zh-kr_mid

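This model is a fine-tuned version of facebook/mbart-large-cc25 on an unknown dataset. It achieves the following results on the evaluation set. These are the values logged at step 12000 (epoch 8.98), which is the highest-BLEU row in the training results below: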

Loss (validation): 2.5557
BLEU: 16.6036
Gen Len (average generated length, in tokens): 15.4901
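
The card does not say how BLEU and Gen Len were computed. The Hugging Face translation example script reports sacreBLEU and the mean count of non-padding token ids in the generated sequences, and this card was presumably produced by a script of that kind, though that is not confirmed. The sketch below is a simplified, dependency-free stand-in that shows what the two numbers measure. It assumes whitespace tokenisation, one reference per sentence and no smoothing, so its scores will not match sacreBLEU exactly. `corpus_bleu` and `mean_gen_len` are hypothetical helper names.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale.

    Assumptions (not taken from the card): whitespace tokens, one reference
    per hypothesis, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


def mean_gen_len(predictions, pad_id):
    """Average number of non-padding token ids per generated sequence."""
    return sum(sum(1 for tok in seq if tok != pad_id) for seq in predictions) / len(predictions)
```

A perfect match scores 100. A hypothesis that is a correct but truncated prefix of the reference keeps full n-gram precision but is pulled down by the brevity penalty, which is why BLEU and Gen Len are usually read together.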

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 40
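
The batch-size and learning-rate settings above fit together as follows. This sketch assumes no gradient accumulation, since none is listed. It follows the shape of the linear schedule with warmup that Transformers implements in `get_linear_schedule_with_warmup`: the rate rises from 0 to 5e-05 over the first 500 steps, then decays linearly to 0 at the final step. The card does not state the total step count. `total_steps=53_440` is an estimate: about 1,336 optimizer steps per epoch, inferred from the Epoch and Step columns of the training results, times 40 epochs. The helper names are hypothetical.

```python
def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=53_440):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0.

    total_steps is an assumed estimate (about 1,336 steps/epoch x 40 epochs), not a value from the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(total_batch_size(4, 4))  # 4 per device x 4 GPUs
print(linear_lr(500), linear_lr(16_000))
```

For example, `linear_lr(250)` gives half the peak rate. Under the assumed total, the rate at the last logged step (16000) is still roughly 70% of the peak.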

Training results

Training Loss	Epoch	Step	Validation Loss	BLEU	Gen Len
2.7248	0.75	1000	1.9410	3.2381	48.6095
1.5683	1.50	2000	1.6889	10.2345	20.4433
1.1916	2.25	3000	1.6843	13.4571	18.8854
1.0680	2.99	4000	1.6390	15.6862	15.5054
0.7313	3.74	5000	1.7003	15.2014	16.5938
0.4832	4.49	6000	1.8982	15.0381	16.9068
0.3862	5.24	7000	2.1426	15.5397	15.6451
0.3675	5.99	8000	2.1168	15.8847	15.6926
0.2627	6.74	9000	2.2603	16.3603	15.9671
0.1955	7.49	10000	2.4114	15.7447	15.9790
0.1710	8.23	11000	2.5141	15.7852	15.9244
0.1702	8.98	12000	2.5557	16.6036	15.4901
0.1298	9.73	13000	2.6536	16.1319	15.5492
0.1052	10.48	14000	2.7586	16.1807	15.8884
0.2268	11.23	15000	2.7258	15.1752	15.5346
0.1327	11.98	16000	2.7193	15.8563	15.7971
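
The headline evaluation numbers at the top of this card equal the step-12000 row. The sketch below re-enters the table as Python data and compares that row with the lowest-validation-loss row and the last logged row. The names `LogRow` and `ROWS` are illustrative. Validation loss bottoms out at step 4000 and then climbs while training loss keeps dropping, a pattern usually read as overfitting. Even so, BLEU continues to improve until step 12000. The card does not say how the reported checkpoint was chosen (for example, via `load_best_model_at_end` with `metric_for_best_model` set to BLEU). It also does not say why the log ends at epoch 11.98 when `num_epochs` is 40.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    step: int
    train_loss: float
    val_loss: float
    bleu: float


# Values copied from the training results table (Epoch and Gen Len omitted).
ROWS = [
    LogRow(1000, 2.7248, 1.9410, 3.2381),
    LogRow(2000, 1.5683, 1.6889, 10.2345),
    LogRow(3000, 1.1916, 1.6843, 13.4571),
    LogRow(4000, 1.0680, 1.6390, 15.6862),
    LogRow(5000, 0.7313, 1.7003, 15.2014),
    LogRow(6000, 0.4832, 1.8982, 15.0381),
    LogRow(7000, 0.3862, 2.1426, 15.5397),
    LogRow(8000, 0.3675, 2.1168, 15.8847),
    LogRow(9000, 0.2627, 2.2603, 16.3603),
    LogRow(10000, 0.1955, 2.4114, 15.7447),
    LogRow(11000, 0.1710, 2.5141, 15.7852),
    LogRow(12000, 0.1702, 2.5557, 16.6036),
    LogRow(13000, 0.1298, 2.6536, 16.1319),
    LogRow(14000, 0.1052, 2.7586, 16.1807),
    LogRow(15000, 0.2268, 2.7258, 15.1752),
    LogRow(16000, 0.1327, 2.7193, 15.8563),
]

best_bleu = max(ROWS, key=lambda row: row.bleu)            # highest BLEU
lowest_val_loss = min(ROWS, key=lambda row: row.val_loss)  # lowest validation loss
last = ROWS[-1]                                            # last logged step

for label, row in [("best BLEU", best_bleu), ("lowest val loss", lowest_val_loss), ("last logged", last)]:
    print(f"{label:>16}: step {row.step}, val loss {row.val_loss}, BLEU {row.bleu}")
```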

Framework versions

Transformers 4.34.0
Pytorch 2.1.0+cu121
Datasets 2.14.5
Tokenizers 0.14.1
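
To check a local environment against the versions listed above, here is a minimal sketch that uses only the standard library's `importlib.metadata`. It assumes the pip distribution names `transformers`, `torch` (for PyTorch), `datasets` and `tokenizers`. `CARD_VERSIONS` and `compare_versions` are illustrative names. A mismatch does not necessarily stop the model from loading, but matching versions is the safest starting point when trying to reproduce the reported numbers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; "torch" is the pip distribution name for PyTorch.
CARD_VERSIONS = {
    "transformers": "4.34.0",
    "torch": "2.1.0+cu121",
    "datasets": "2.14.5",
    "tokenizers": "0.14.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Map each package to (installed version or None, version listed in the card)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: installed={installed} card={wanted} [{status}]")
```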