training_args = TrainingArguments(
    output_dir='t5-small-wikilarge-newsela-with-domain-adaptation',
    num_train_epochs=20,
    warmup_steps=250,
    per_device_train_batch_size=BATCH_SIZE,
    weight_decay=0.01,
    learning_rate=2e-4,
    # fp16=True,
    optim="adafactor",
)
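The arguments above do not set `lr_scheduler_type`, so the run presumably used the Trainer default, a linear schedule with warmup. The sketch below reproduces that shape in plain Python to show what `warmup_steps=250` and `learning_rate=2e-4` imply over the run. The function name `linear_schedule_lr` is hypothetical, and `total_steps=16060` is taken from the `global_step` reported in the training output; the default scheduler is an assumption, not something the notebook confirms.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=250, total_steps=16060):
    """Learning rate at a given optimizer step under linear warmup then linear decay.

    Assumes the default 'linear' scheduler; total_steps is the global_step
    reported by the training run.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 125, 250, 8000, 16060):
        print(s, linear_schedule_lr(s))
```

Under these assumptions the peak rate of 2e-4 is reached at step 250, well before the first logged loss at step 500, and the rate decays to zero at the final step.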

Step     Training Loss
500      3.313300
1000     2.914200
1500     2.848100
2000     2.811700
2500     2.789700
3000     2.771400
3500     2.761500
4000     2.749600
4500     2.732300
5000     2.729400
5500     2.717600
6000     2.703000
6500     2.699100
7000     2.686200
7500     2.681000
8000     2.679300
8500     2.667100
9000     2.656400
9500     2.656200
10000    2.645100
10500    2.648600
11000    2.638400
11500    2.636200
12000    2.633500
12500    2.632400
13000    2.622300
13500    2.624400
14000    2.618200
14500    2.614300
15000    2.616600
15500    2.610700
16000    2.613600

TrainOutput(global_step=16060, training_loss=2.709684318266948, metrics={'train_runtime': 3869.7584, 'train_samples_per_second': 530.839, 'train_steps_per_second': 4.15, 'total_flos': 0.0, 'train_loss': 2.709684318266948, 'epoch': 20.0})
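A few quantities the log does not state directly can be recovered from the reported metrics. The sketch below is plain arithmetic on the `TrainOutput` values; the variable names are illustrative, and the "samples per step" figure is the effective batch size (per-device batch size times number of devices times any gradient accumulation), which may or may not equal `BATCH_SIZE`.

```python
# Values copied from the TrainOutput above.
global_step = 16060
epochs = 20.0
train_runtime = 3869.7584          # seconds
samples_per_second = 530.839
steps_per_second = 4.15

# Optimizer steps taken in each epoch.
steps_per_epoch = global_step / epochs

# Effective number of training samples consumed per optimizer step.
samples_per_step = samples_per_second / steps_per_second

# Approximate size of the training set (samples seen per epoch).
samples_per_epoch = samples_per_second * train_runtime / epochs

print(f"steps per epoch:   {steps_per_epoch:.0f}")
print(f"samples per step:  {samples_per_step:.1f}")
print(f"samples per epoch: {samples_per_epoch:.0f}")
```

The run works out to 803 steps per epoch and roughly 128 samples per step, which suggests a training set of about 102,700 examples. These are estimates derived from rounded throughput figures, not values reported by the notebook.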

   sari       bleu
0  36.778905  38.143843