TakoMT
This is a translation model built with Marian-NMT. For more details, please see my repository.
In addition to the data listed in the repository, I also used ParaCrawl for training.
- source languages: de, en, es, fr, it, ru, uk
- target language: ja
How to use
This model requires the `transformers` and `sentencepiece` libraries:

```bash
pip install transformers sentencepiece
```
You can use this model directly with a pipeline:
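```python
from transformers import pipeline

# Downloads the model from the Hugging Face Hub on first use
tako_translator = pipeline('translation', model='staka/takomt')
result = tako_translator('This is a cat.')  # a list with one {'translation_text': ...} dict
print(result[0]['translation_text'])
```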
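The pipeline also accepts a list of sentences and returns one `{'translation_text': ...}` dict per input, in order. The helper below is a minimal sketch for translating many sentences in chunks and collecting plain strings. It assumes, as the English example above suggests, that no source-language tag is needed, so sentences in any of the supported source languages can be mixed in one call. `translate_all` and `batch_size` are hypothetical names chosen for this sketch, not part of the model or of `transformers`.

```python
def translate_all(translator, texts, batch_size=16):
    """Translate `texts` in chunks and return the Japanese strings in input order.

    `translator` is any callable that, like a transformers translation
    pipeline, takes a list of strings and returns one
    {"translation_text": ...} dict per input (hypothetical helper).
    """
    outputs = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        results = translator(chunk)
        # Keep only the translated text from each result dict
        outputs.extend(r["translation_text"] for r in results)
    return outputs
```

For example, with the pipeline created above: `translate_all(tako_translator, ["Das ist eine Katze.", "C'est un chat.", "Это кошка."])`. Passing the translator in as an argument keeps the helper independent of how the model is loaded.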
Eval results
Evaluation results on Tatoeba (500 randomly selected sentences) are as follows:
source | target | BLEU(*1) |
---|---|---|
de | ja | 27.8 |
en | ja | 28.4 |
es | ja | 32.0 |
fr | ja | 27.9 |
it | ja | 24.3 |
ru | ja | 27.3 |
uk | ja | 29.8 |
(*1) Scores computed with `sacrebleu --tokenize ja-mecab`.
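The card does not describe how the 500 Tatoeba sentences were drawn. As a rough sketch for running a comparable evaluation, assuming the Tatoeba data has already been loaded as a list of `(source, Japanese reference)` pairs, the code below samples a fixed-size subset without replacement and writes line-aligned source and reference files. The helper names, the fixed `seed`, and the file layout are illustrative assumptions, not the procedure used to produce the scores above.

```python
import random


def sample_pairs(pairs, k=500, seed=0):
    """Return k (source, reference) pairs drawn uniformly without replacement.

    `seed` is a hypothetical fixed value so the same subset can be re-drawn.
    """
    if k > len(pairs):
        raise ValueError(f"asked for {k} pairs but only {len(pairs)} are available")
    rng = random.Random(seed)
    return rng.sample(pairs, k)


def write_eval_files(pairs, src_path, ref_path):
    """Write sources and references as parallel files, one sentence per line."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(ref_path, "w", encoding="utf-8") as ref:
        for source, reference in pairs:
            # Flatten embedded newlines so line i of each file stays aligned
            src.write(source.replace("\n", " ") + "\n")
            ref.write(reference.replace("\n", " ") + "\n")
```

After translating the source file line by line (for example with `tako_translator`) into a hypothesis file, the score can be computed with the same tokenizer as in the table, e.g. `sacrebleu ref.ja -i hyp.ja --tokenize ja-mecab` (file names hypothetical).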