Subtitle Translation Model
This is a text translation model for English and Spanish. It was obtained by fine-tuning Helsinki-NLP/opus-mt-en-mul on English and Spanish TED Talks transcriptions from the ted_talks_iwslt dataset.
Intended Use
The model was trained to serve as the basis of a subtitle translation tool.
Data
The dataset is split into train, validation and test sets with the following structure:
DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
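The row counts above (2454 / 307 / 307, 3068 in total) are consistent with an 80/10/10 split. The card does not say how the split was produced, so the following is only a minimal sketch of one way to arrive at the same sizes; the function name `split_80_10_10`, the shuffling, and the seed are assumptions made for illustration.

```python
import random

def split_80_10_10(rows, seed=42):
    """Shuffle rows and cut them into 80% train, 10% validation, 10% test.

    Hypothetical helper: the seed and the shuffling step are assumptions,
    not the procedure documented for this model.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * 0.8)
    # Split what is left evenly; any odd remainder goes to the test set.
    n_val = (len(rows) - n_train) // 2
    return {
        "train": rows[:n_train],
        "validation": rows[n_train:n_train + n_val],
        "test": rows[n_train + n_val:],
    }

# 3068 is the sum of the three num_rows values reported above.
splits = split_80_10_10(range(3068))
print({name: len(part) for name, part in splits.items()})
# {'train': 2454, 'validation': 307, 'test': 307}
```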
Note: The evaluation numbers were obtained using 50 samples from the test set.
Relevant Training Arguments
evaluation_strategy="epoch"
learning_rate=2e-5
per_device_train_batch_size=4
per_device_eval_batch_size=4
weight_decay=0.01
save_total_limit=3
num_train_epochs=1
predict_with_generate=True
fp16=False
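To give a sense of how short this training run is, the arithmetic below derives the number of optimizer steps from the batch size, the epoch count and the 2454 training rows. It is a sketch under two assumptions that the card does not state: training on a single device and no gradient accumulation. The helper name `optimizer_steps` is hypothetical.

```python
import math

def optimizer_steps(train_rows, per_device_batch_size, epochs,
                    num_devices=1, grad_accum_steps=1):
    """Number of optimizer updates for a run.

    num_devices=1 and grad_accum_steps=1 are assumptions; the model card
    does not report either value.
    """
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    # The last, partially filled batch still counts as one step.
    steps_per_epoch = math.ceil(train_rows / effective_batch)
    return steps_per_epoch * epochs

print(optimizer_steps(train_rows=2454, per_device_batch_size=4, epochs=1))
# 614
```

With evaluation_strategy="epoch" and a single epoch, the validation set is evaluated once, at the end of training.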
Evaluation Results
The following results show the ROUGE metrics obtained on the validation set during training (used to assess the hyperparameters) and the ROUGE metrics of the final model on the test set.
- Eval metrics
{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
- Test set evaluation (50 transcriptions)
{'rouge1': 65.54, 'rouge2': 41.45, 'rougeL': 62.72, 'rougeLsum': 62.83}
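For readers unfamiliar with the metric, rouge1 and rouge2 measure unigram and bigram overlap between the generated translation and the reference. The following is a simplified sketch of that idea on a 0 to 100 scale. It is not the implementation used to produce the numbers above: the whitespace tokenization, the lowercasing, and the function name `rouge_n_f1` are assumptions, and real ROUGE packages apply their own tokenization and optional stemming.

```python
from collections import Counter

def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 score on whitespace tokens, scaled to 0-100."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)

# An exact match scores 100; a partial match scores lower.
print(rouge_n_f1("Hola a todos", "Hola a todos"))
print(round(rouge_n_f1("Hola a todos", "Hola a todos ustedes", n=2), 2))
```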
Using the model
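The model can be loaded and used through the transformers pipeline API:
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")

# Translate a sentence; the result is a list with one dict per input
print(pipe("Hi everyone!"))
# [{'translation_text': 'Hola a todos!'}]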
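Since the intended use is subtitle translation, the sketch below shows one way the model could be applied to an SRT subtitle file: cue numbers, timestamps and blank lines are kept as they are, and only the text lines are passed to a translation function. The helper `translate_srt` and the SRT handling are assumptions made for illustration and are not part of this model or of transformers. The translator is passed in as a plain callable so the example runs without downloading the model.

```python
import re
from typing import Callable

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the text lines of an SRT document (hypothetical helper).

    Caveat: a subtitle line made up only of digits is treated as a cue
    number and left untranslated.
    """
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMESTAMP.match(stripped):
            out.append(line)  # structural line: keep unchanged
        else:
            out.append(translate(stripped))
    return "\n".join(out)

srt = "1\n00:00:01,000 --> 00:00:02,500\nHi everyone!"
# str.upper stands in for the model here so the sketch runs offline.
print(translate_srt(srt, str.upper))
```

To use the model itself, the stand-in can be replaced with a small wrapper around the pipeline, for example `lambda s: pipe(s)[0]["translation_text"]`, where `pipe` is the pipeline object created in the snippet above.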