
# ke_t5_base_bongsoo_en_ko

This model is a fine-tuned version of `KETI-AIR/ke-t5-base` on the `bongsoo/news_news_talk_en_ko` dataset. See `translation_ke_t5_base_bongsoo_en_ko.ipynb` for the full training code.

## Model description

KE-T5 is a T5 (text-to-text transfer transformer) model pretrained on Korean and English corpora, developed by KETI (Korea Electronics Technology Institute). The vocabulary used by KE-T5 consists of 64,000 sub-word tokens and was created with Google's SentencePiece. The SentencePiece model was trained to cover 99.95% of a 30 GB corpus with an approximate 7:3 mix of Korean and English.
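
As a quick sanity check, the tokenizer can be loaded and inspected directly (a minimal sketch; the reported size may differ slightly from 64,000 because of added special tokens):

```python
# Minimal sketch: load the KE-T5 tokenizer and inspect its SentencePiece vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KETI-AIR/ke-t5-base")
print(len(tokenizer))                                      # ~64,000 sub-word tokens
print(tokenizer.tokenize("점심을 마치고 산책을 하러 가자."))   # SentencePiece pieces
```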

## Intended uses & limitations

Translation from English to Korean

## Usage

You can use this model directly with a translation pipeline:

```python
>>> from transformers import pipeline
>>> translator = pipeline('translation', model='chunwoolee0/ke_t5_base_bongsoo_en_ko')

>>> translator("Let us go for a walk after lunch.")
[{'translation_text': '점심을 마치고 산책을 하러 가자.'}]

>>> translator("The BRICS countries welcomed six new members from three different continents on Thursday.")
[{'translation_text': '브릭스 국가들은 지난 24일 3개 대륙 6명의 신규 회원을 환영했다.'}]

>>> translator("The BRICS countries welcomed six new members from three different continents on Thursday, marking a historic milestone that underscored the solidarity of BRICS and developing countries and determination to work together for a better future, officials and experts said.", max_length=400)
[{'translation_text': '브렙스 국가는 지난 7일 3개 대륙 6명의 신규 회원을 환영하며 BRICS와 개발도상국의 연대와 더 나은 미래를 위해 함께 노력하겠다는 의지를 재확인한 역사적인 이정표를 장식했다고 관계자들과 전문가들은 전했다.'}]

>>> translator("Biden’s decree zaps lucrative investments in China’s chip and AI sectors")
[{'translation_text': '바이든 장관의 행정명령은 중국 칩과 AI 분야의 고수익 투자를 옥죄는 것이다.'}]

>>> translator("It is most likely that China’s largest chip foundry, a key piece of the puzzle in Beijing’s efforts to achieve greater self-sufficiency in semiconductors, would not have been able to set up its first plant in Shanghai’s suburbs in the early 2000s without funding from American investors such as Walden International and Goldman Sachs.", max_length=400)
[{'translation_text': '반도체의 더 큰 자립성을 이루기 위해 베이징이 애쓰는 퍼즐의 핵심 조각인 중국 최대 칩 파운드리가 월덴인터내셔널, 골드만삭스 등 미국 투자자로부터 자금 지원을 받지 못한 채 2000년대 초 상하이 시내에 첫 공장을 지을 수 없었을 가능성이 크다.'}]
```
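
The pipeline wraps the standard tokenizer and model calls; if you prefer to call `generate` directly, a minimal equivalent sketch (the beam size and `max_length` here are illustrative choices, not values taken from the notebook):

```python
# Sketch: translate without the pipeline helper.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "chunwoolee0/ke_t5_base_bongsoo_en_ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Let us go for a walk after lunch.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128, num_beams=4)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```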

## Training and evaluation data

Because of the resource limits of Google Colab, only one third of the original 1,200,000 training sentence pairs was used.
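
A minimal sketch of how such a subset could be drawn with 🤗 Datasets (the exact subset size, seed, and validation split below are assumptions for illustration, not values recorded in this card):

```python
# Sketch: take roughly one third of the ~1,200,000 training pairs.
from datasets import load_dataset

raw = load_dataset("bongsoo/news_news_talk_en_ko")             # dataset id as given in this card
subset = raw["train"].shuffle(seed=42).select(range(400_000))  # ~1/3 of 1,200,000 (assumed size)
splits = subset.train_test_split(test_size=0.1, seed=42)       # hypothetical train/eval split
```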

## Training procedure

Because of Google Colab's limitations, the model was trained for only one epoch. Even so, the result is quite satisfactory and the translation quality is reasonably good.

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
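
These settings correspond roughly to the following `Seq2SeqTrainingArguments` (a sketch for orientation: the output directory and `predict_with_generate` flag are assumptions, the rest mirrors the list above, and the Adam betas/epsilon are the library defaults):

```python
# Sketch: Seq2SeqTrainingArguments mirroring the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="ke_t5_base_bongsoo_en_ko",  # hypothetical output directory
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,          # effective (total) train batch size 64
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    predict_with_generate=True,             # assumed, needed to compute BLEU during evaluation
)
```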

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log        | 1.0   | 5625 | 2.4075          | 8.2272 |

- CPU memory usage: 4.8 / 12.7 GB
- GPU memory usage: 13.0 / 15.0 GB
- Running time: about 3 hours
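
The BLEU score above is presumably computed with sacreBLEU on the decoded generations, as in the standard translation fine-tuning recipe. A minimal sketch of such a `compute_metrics` function (the function itself is an assumption, not code taken from the notebook):

```python
# Sketch: sacreBLEU-based compute_metrics for Seq2SeqTrainer evaluation.
import evaluate
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KETI-AIR/ke-t5-base")
bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)  # undo label masking
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(predictions=decoded_preds,
                          references=[[label] for label in decoded_labels])
    return {"bleu": result["score"]}
```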

### Framework versions

- Transformers 4.32.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3