# tst-translation-output2
This model is a fine-tuned version of [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25) on a custom dataset. It achieves the following results on the evaluation set:
- Loss: 3.4005
- Bleu: 26.0229
- Gen Len: 15.1659
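
Since the card does not document usage, here is a minimal inference sketch. The repo id below and the language pair `en_XX` → `ro_RO` are assumptions for illustration only; substitute your actual Hub path (or local checkpoint directory) and the language codes of the custom dataset, which are not recorded in this card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id; replace with the real Hub path or local checkpoint dir.
model_name = "tst-translation-output2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# mBART tokenizers take source/target language codes; "en_XX" -> "ro_RO"
# is only an example pair, not known from this card.
tokenizer.src_lang = "en_XX"

inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["ro_RO"],
    max_length=48,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```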
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch reproducing them follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
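
These settings map onto `Seq2SeqTrainingArguments` roughly as below. The `output_dir` and `predict_with_generate` values are assumptions not recorded in the card; the Adam betas and epsilon above are the optimizer defaults, so they need no explicit arguments. The multi-GPU run would be launched with, e.g., `torchrun --nproc_per_node 4`, which gives the total batch size of 16 (4 per device × 4 devices).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tst-translation-output2",  # assumed; not recorded in the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs = 16 total
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    predict_with_generate=True,      # needed so eval can report Bleu / Gen Len
)
```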
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| 4.5769        | 1.15  | 1000  | 4.0805          | 14.9483 | 30.0618 |
| 2.8098        | 2.31  | 2000  | 3.0612          | 19.7963 | 16.7121 |
| 1.7974        | 3.46  | 3000  | 2.8258          | 21.7059 | 15.5179 |
| 1.1474        | 4.62  | 4000  | 2.6951          | 22.4801 | 16.6382 |
| 0.8042        | 5.77  | 5000  | 2.7272          | 22.4419 | 15.1393 |
| 0.5605        | 6.93  | 6000  | 2.8239          | 23.1096 | 15.6457 |
| 0.3857        | 8.08  | 7000  | 2.9448          | 24.2536 | 15.1538 |
| ...           | ...   | ...   | ...             | ...     | ...     |
| 0.0042        | 40.42 | 35000 | 3.3485          | 25.2464 | 15.2387 |
| 0.0029        | 41.57 | 36000 | 3.3744          | 25.2885 | 15.1306 |
| 0.0026        | 42.73 | 37000 | 3.3947          | 25.9359 | 15.1896 |
| 0.0024        | 43.88 | 38000 | 3.3699          | 25.5309 | 15.2671 |
| 0.0022        | 45.03 | 39000 | 3.3947          | 25.2932 | 15.1387 |
| 0.0011        | 46.19 | 40000 | 3.4075          | 25.7551 | 15.1231 |
| 0.001         | 47.34 | 41000 | 3.3918          | 25.6345 | 15.1243 |
| 0.0007        | 48.5  | 42000 | 3.4063          | 25.7209 | 15.111  |
| 0.0006        | 49.65 | 43000 | 3.4003          | 25.9227 | 15.1873 |
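
The Bleu and Gen Len columns are the sacreBLEU score and the mean generated length in tokens. Metrics like these are typically produced by a `compute_metrics` function along the lines of this sketch; the exact function used for this run is not recorded in the card.

```python
import functools

import evaluate
import numpy as np

sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Labels use -100 as padding; swap it back before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    # Gen Len: mean count of non-pad tokens in the generated sequences.
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}

# Bind the tokenizer before handing the function to Seq2SeqTrainer:
# compute_metrics=functools.partial(compute_metrics, tokenizer=tokenizer)
```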
### Framework versions
- Transformers 4.32.1
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.3