# mbartLarge_koja_37p_exp2

This model is a fine-tuned version of [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8988
- Bleu: 6.7577
- Gen Len: 17.8104
## Model description

More information needed
## Intended uses & limitations

More information needed
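The card does not document usage, but the model name suggests a Korean-to-Japanese ("koja") translation fine-tune of mBART-50. Below is a minimal inference sketch under that assumption; the checkpoint id is a placeholder taken from the card's title, and the `ko_KR`/`ja_XX` language codes are the standard mBART-50 codes for that pair.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Placeholder id from the card title; replace with the actual Hub repo id
# or local path where this fine-tuned checkpoint is stored.
checkpoint = "mbartLarge_koja_37p_exp2"
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)

# Assumption: "koja" means Korean -> Japanese.
tokenizer.src_lang = "ko_KR"
inputs = tokenizer("만나서 반갑습니다.", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["ja_XX"],
    max_length=48,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```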
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 350
- num_epochs: 15
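
The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as in the sketch below. This is a reconstruction, not the original training script: the output directory is a placeholder, and `predict_with_generate` is an assumption (it is the usual way to obtain the Bleu and Gen Len metrics reported here). The per-device batch size of 4 across 4 GPUs with 2 accumulation steps yields the listed total train batch size of 32 (eval: 4 × 4 = 16).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mbartLarge_koja_37p_exp2",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
    lr_scheduler_type="linear",
    warmup_steps=350,
    num_train_epochs=15,
    predict_with_generate=True,  # assumption: required for Bleu / Gen Len
)
```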
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:-------:|
| 2.0622        | 0.11  | 1250  | 1.6679          | 1.2834 | 17.8009 |
| 1.5139        | 0.22  | 2500  | 1.4378          | 2.0427 | 17.8496 |
| 1.4121        | 0.33  | 3750  | 1.3116          | 2.7599 | 17.7667 |
| 1.2879        | 0.44  | 5000  | 1.2381          | 3.1444 | 17.8887 |
| 1.2344        | 0.55  | 6250  | 1.1769          | 3.3835 | 17.8323 |
| 1.1778        | 0.66  | 7500  | 1.1382          | 3.9511 | 17.4892 |
| 1.1461        | 0.77  | 8750  | 1.0938          | 3.9402 | 18.0136 |
| 1.1151        | 0.88  | 10000 | 1.0749          | 4.2134 | 18.0537 |
| 1.093         | 0.99  | 11250 | 1.0418          | 3.9587 | 17.8715 |
| 1.0626        | 1.1   | 12500 | 1.0315          | 4.6251 | 17.9406 |
| 1.0192        | 1.21  | 13750 | 1.0132          | 4.9573 | 18.1266 |
| 0.9957        | 1.32  | 15000 | 0.9989          | 4.3068 | 18.0925 |
| 0.9778        | 1.43  | 16250 | 0.9850          | 5.0517 | 17.8783 |
| 0.9446        | 1.54  | 17500 | 0.9748          | 5.0194 | 17.9348 |
| 0.9236        | 1.65  | 18750 | 0.9619          | 4.6011 | 17.7926 |
| 0.9091        | 1.76  | 20000 | 0.9564          | 4.6035 | 17.9399 |
| 0.9072        | 1.87  | 21250 | 0.9533          | 4.8313 | 17.6221 |
| 0.8758        | 1.98  | 22500 | 0.9421          | 5.2707 | 17.5851 |
| 0.8539        | 2.09  | 23750 | 0.9304          | 5.2661 | 17.821  |
| 0.8575        | 2.2   | 25000 | 0.9329          | 4.9143 | 17.8879 |
| 0.8314        | 2.31  | 26250 | 0.9262          | 5.106  | 18.0037 |
| 0.8248        | 2.42  | 27500 | 0.9241          | 5.3073 | 17.6632 |
| 0.8151        | 2.53  | 28750 | 0.9302          | 5.5675 | 17.7676 |
| 0.8093        | 2.64  | 30000 | 0.9149          | 6.2644 | 17.8475 |
| 0.7691        | 2.75  | 31250 | 0.8988          | 6.6682 | 17.7685 |
| 0.771         | 2.86  | 32500 | 0.9189          | 5.7856 | 17.8678 |
| 0.7658        | 2.97  | 33750 | 0.9175          | 6.2468 | 17.7313 |
| 0.7914        | 3.08  | 35000 | 0.9020          | 5.5525 | 17.7627 |
| 0.7264        | 3.19  | 36250 | 0.9046          | 6.2055 | 17.7662 |
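
The headline evaluation loss (0.8988) matches the best checkpoint at step 31250, and training stopped at epoch 3.19 of the configured 15, which suggests early stopping on validation loss, though the card does not say. The card also does not state how Bleu and Gen Len were computed; a common `compute_metrics` for `Seq2SeqTrainer`, using the `evaluate` library's sacrebleu metric, looks like the sketch below. The tokenizer choice and the Gen Len calculation here are assumptions, not taken from the card.

```python
import numpy as np
import evaluate
from transformers import MBart50TokenizerFast

# Assumption: tokenizer inherited from the base mBART-50 checkpoint.
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Labels are padded with -100 by the data collator; restore the pad id
    # so the sequences can be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    # Gen Len: mean count of non-pad tokens in the generated sequences.
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}
```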
### Framework versions
- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1