mt5_small_bongsoo_en_ko

This model is a fine-tuned version of chunwoolee0/mt5_small_bongsoo_en_ko on the bongsoo/news_talk_en_ko dataset. It achieves the following results on the evaluation set:

Loss: 2.7805
Rouge1: 0.1932
Rouge2: 0.0394
Rougel: 0.1895
Sacrebleu: 0.4518

Model description

mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages

Intended uses & limitations

Translation from English to Korean

Usage

You can use this model directly with a pipeline for translation language modeling:

>>> from transformers import pipeline
>>> translator = pipeline('translation', model='chunwoolee0/ke_t5_base_bongsoo_en_ko')

>>> translator("Let us go for a walk after lunch.")
[{'translation_text': '식당에 앉아서 밤에 갔다.'}]

>>> translator("Skinner's reward is mostly eye-watering.")
[{'translation_text': '벤더의 선물은 너무 마음이 쏠린다.'}]

Training and evaluation data

The value of max_length is critical to the training. The usual value of 128 used for Indo-European languages causes a greate trouble in gpu usage. Therefore it should be reduced to 64 in order to succeed. Another problem comes from the usual split of data into 80% for train and 20% for validation. By this, the evaluation step takes too much time. Here 99% and 1% split is used without change in the evaluation.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Sacrebleu
3.8338	0.16	500	2.9626	0.1475	0.0184	0.1455	0.4243
3.7865	0.32	1000	2.9305	0.1529	0.0181	0.1508	0.4435
3.7436	0.48	1500	2.9067	0.1572	0.019	0.155	0.4464
3.7207	0.65	2000	2.8924	0.165	0.0233	0.1629	0.4532
3.7022	0.81	2500	2.8825	0.1647	0.0231	0.1627	0.4504
3.69	0.97	3000	2.8778	0.1662	0.0237	0.1647	0.4694

The mT5 model of google cannot be used for Korean although it is trained over 101 languages. Finetuning using very large data set such as bongsoo/news_talk_en_ko still yield garbage. Since GPU memories allowed for free use in colab are greatly limited, repeated fine-tunings for the split datasets are performed to obtain better results. Theoretically, this might give better results. But actual attempts fail to yield better results. Instead, the results become worse. One should use other models like the ke-t5 by KETI(한국전자연구원).

Framework versions

Transformers 4.32.1
Pytorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3