Subtitle Translation Model

This model translates text from English to Spanish. It was obtained by fine-tuning the Helsinki-NLP/opus-mt-en-mul model on Spanish and English TED Talks transcriptions from the ted_talks_iwslt dataset.

Intended Use

This model was trained as the basis for a subtitle translation tool.

Data

The dataset was split into train, validation, and test sets with the following structure:

DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
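
The row counts above (2454 / 307 / 307 out of 3068) are consistent with an 80/10/10 split. The card does not say how the split was produced, so the following is only a minimal sketch under that assumption; the function name `split_indices` and the seed are hypothetical.

```python
import random

def split_indices(n_rows, seed=42):
    """Shuffle row indices and cut them 80/10/10 (assumed ratio)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * 0.8)        # 80% of the rows for training
    n_val = (n_rows - n_train) // 2    # half of the remainder for validation
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # whatever is left goes to test
    return train, validation, test

train, validation, test = split_indices(2454 + 307 + 307)
print(len(train), len(validation), len(test))  # 2454 307 307
```

The resulting index lists could then be used to select rows from the full dataset.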

Note: The evaluation numbers were obtained using 50 samples from the test set.

Relevant Training Arguments

    evaluation_strategy="epoch"
    learning_rate=2e-5
    per_device_train_batch_size=4
    per_device_eval_batch_size=4
    weight_decay=0.01
    save_total_limit=3
    num_train_epochs=1
    predict_with_generate=True
    fp16=False
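
As a rough sketch, the arguments above could be collected and passed to `Seq2SeqTrainingArguments` as shown below. The `output_dir` value is a hypothetical placeholder, and all other settings are assumed to stay at the library defaults. Newer releases of transformers name the first key `eval_strategy`, so the key may need adjusting for your installed version.

```python
import math

# Values copied from the list above; nothing else is known about the run.
training_kwargs = {
    "evaluation_strategy": "epoch",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "weight_decay": 0.01,
    "save_total_limit": 3,
    "num_train_epochs": 1,
    "predict_with_generate": True,
    "fp16": False,
}

def build_training_args(output_dir="opus-mt-en-sp-finetuned"):
    """Build the arguments object; output_dir is a hypothetical name."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments
    return Seq2SeqTrainingArguments(output_dir=output_dir, **training_kwargs)

# Optimizer steps in the single epoch, assuming one device and no
# gradient accumulation: 2454 training rows in batches of 4.
steps_per_epoch = math.ceil(2454 / training_kwargs["per_device_train_batch_size"])
print(steps_per_epoch)  # 614
```

With one epoch and a batch size of 4, the model sees each training row once, in about 614 updates under these assumptions.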

Evaluation Results

The following results show the ROUGE metrics obtained during training (used to evaluate the hyperparameters) and, on the second line, the evaluation of the final model on the test set.

{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
{'rouge1': 65.54,'rouge2': 41.45,'rougeL': 62.72,'rougeLsum': 62.83}
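
To make the rouge1 figure more concrete, here is a simplified illustration of a unigram-overlap F-measure. It is not the scorer used for the numbers above (the card does not name one). It assumes whitespace tokenization, no stemming, and scores scaled to 0-100.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F-measure on whitespace tokens, scaled to 0-100."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)

score = rouge1_f1("hola a todos", "hola a todos ustedes")
print(round(score, 2))  # 85.71
```

Here all three reference words are recovered (recall 1.0), but one of the four candidate words is extra (precision 0.75), which gives the F-measure shown.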

Using the model

The model can be loaded and run with the transformers pipeline API:

from transformers import pipeline
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")
pipe("Hi everyone!")

>>[{'translation_text': 'Hola a todos!'}]
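
Since the intended use is subtitle translation, the pipeline call above could be wrapped to process a subtitle file. The sketch below assumes SRT-style input (index line, timing line with `-->`, then text lines); the helper names `translate_srt` and `pipeline_translate` are hypothetical and not part of the model or the library. As a simplification, lines consisting only of digits are treated as cue indices and left untouched.

```python
from typing import Callable, List

def translate_srt(srt_text: str, translate: Callable[[List[str]], List[str]]) -> str:
    """Translate only the text lines of an SRT document, keeping cues intact."""
    lines = srt_text.splitlines()
    # Text lines are those that are not blank, not a cue index, and not a timing line.
    text_idx = [
        i for i, line in enumerate(lines)
        if line.strip() and not line.strip().isdigit() and "-->" not in line
    ]
    translated = translate([lines[i] for i in text_idx])
    for i, new_text in zip(text_idx, translated):
        lines[i] = new_text
    return "\n".join(lines)

def pipeline_translate(texts: List[str]) -> List[str]:
    """Translate a batch with the model; downloads the weights on first call."""
    from transformers import pipeline
    pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")
    return [out["translation_text"] for out in pipe(texts)]

# Demo with a stand-in translator (upper-casing) so no model download is needed.
sample = "1\n00:00:01,000 --> 00:00:02,000\nHi everyone!\n\n2\n00:00:03,000 --> 00:00:04,000\nThank you."
result = translate_srt(sample, lambda texts: [t.upper() for t in texts])
print(result)
```

To use the real model, pass `pipeline_translate` as the second argument instead of the stand-in. Sending all text lines in one batch keeps the pipeline from being reloaded per line.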