Subtitle Translation Model
This model translates English text into Spanish. It was obtained by fine-tuning the Helsinki-NLP/opus-mt-en-mul model on English and Spanish TED Talks transcriptions from the ted_talks_iwslt dataset.
Intended Use
The model was trained as a building block for a subtitle translation tool.
Data
The dataset has been split into the following structure:
DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
Note: The evaluation numbers were obtained using 50 samples from the test set.
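The row counts above are consistent with an 80/10/10 partition of 3,068 sentence pairs. The splitting code is not part of this card, so the following is only a minimal sketch in plain Python of one way to arrive at those sizes, assuming a shuffled 20% hold-out that is then halved into validation and test. The function name `split_indices` and the seed are hypothetical.

```python
import math
import random

def split_indices(n_rows, holdout_fraction=0.2, seed=42):
    """Shuffle row indices and return (train, validation, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility only
    n_holdout = math.ceil(n_rows * holdout_fraction)  # rows kept out of training
    n_validation = n_holdout // 2                     # half of the hold-out
    holdout, train = indices[:n_holdout], indices[n_holdout:]
    return train, holdout[:n_validation], holdout[n_validation:]

# Total rows taken from the split sizes listed in the card.
train_idx, validation_idx, test_idx = split_indices(2454 + 307 + 307)
print(len(train_idx), len(validation_idx), len(test_idx))
```

The three index lists are disjoint, so no sentence pair appears in more than one split.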
Relevant Training Arguments
    evaluation_strategy="epoch"
    learning_rate=2e-5
    per_device_train_batch_size=4
    per_device_eval_batch_size=4
    weight_decay=0.01
    save_total_limit=3
    num_train_epochs=1
    predict_with_generate=True
    fp16=False
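To put these arguments in perspective, the number of optimizer steps in the run can be estimated from the batch size and the size of the train split. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in this card.

```python
import math

train_rows = 2454          # size of the train split
per_device_batch_size = 4  # per_device_train_batch_size
num_train_epochs = 1
num_devices = 1            # assumption: a single GPU or CPU
grad_accum_steps = 1       # assumption: no gradient accumulation

effective_batch = per_device_batch_size * num_devices * grad_accum_steps
steps_per_epoch = math.ceil(train_rows / effective_batch)  # last batch may be partial
total_steps = steps_per_epoch * num_train_epochs
print(total_steps)  # 614
```

Under these assumptions the model sees each training pair once, in 614 updates, with a single evaluation pass at the end of the epoch.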
Evaluation Results
The following results show the ROUGE metrics obtained during training (used to evaluate the hyperparameters) and the evaluation of the final model on the test set.
- Eval metrics
{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
- Test set evaluation (50 transcriptions)
{'rouge1': 65.54, 'rouge2': 41.45, 'rougeL': 62.72, 'rougeLsum': 62.83}
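For readers unfamiliar with ROUGE, the sketch below illustrates the idea behind the rouge1 score: an F1 measure over the unigrams shared by a prediction and its reference, reported here on a 0-100 scale like the numbers above. It is a simplified illustration that assumes lowercasing and whitespace tokenization. It is not the implementation used to produce the reported results, and standard ROUGE packages tokenize differently, so values will not match exactly.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings, on a 0-100 scale."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100 * 2 * precision * recall / (precision + recall)

score = rouge1_f1("Hola a todos", "Hola a todos ustedes")
print(round(score, 2))  # 85.71
```

rouge2 applies the same idea to bigrams, while rougeL and rougeLsum are based on the longest common subsequence.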
Using the model
The model can be used through the transformers pipeline API:
from transformers import pipeline
# Load the fine-tuned checkpoint from the Hugging Face Hub.
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")
pipe("Hi everyone!")
# Output: [{'translation_text': 'Hola a todos!'}]
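Since the intended use is subtitle translation, it may help to see how a translation function could be applied to a subtitle file. The following is a minimal sketch, assuming SubRip (SRT) input in which each cue has an index line, a timing line, and one or more text lines. The helper `translate_srt` and its `translate` parameter are hypothetical and not part of this model or of transformers.

```python
import re

def translate_srt(srt_text, translate):
    """Translate the text of each SRT cue, leaving cue numbers and timestamps untouched.

    `translate` is any callable that maps one source string to one translated string.
    """
    translated_blocks = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        header, text_lines = lines[:2], lines[2:]  # index + timing, then text
        # Join wrapped lines so the translator sees a whole sentence fragment.
        text = " ".join(line.strip() for line in text_lines)
        new_lines = header + ([translate(text)] if text else [])
        translated_blocks.append("\n".join(new_lines))
    return "\n\n".join(translated_blocks) + "\n"

# Stand-in translator (upper-casing) so the sketch runs without downloading the model.
demo = (
    "1\n00:00:01,000 --> 00:00:02,500\nHi everyone!\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nThank you\nvery much.\n"
)
result = translate_srt(demo, lambda s: s.upper())
print(result)
```

With the pipeline loaded as in the usage snippet, the stand-in could be replaced by `lambda s: pipe(s)[0]["translation_text"]`. Note that this sketch merges wrapped subtitle lines into one line per cue and does not re-wrap long translations.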