mt5-small-finetuned-amazon-en-es
This model is a fine-tuned version of google/mt5-small on the amazon_reviews_multi dataset. It achieves the following results on the evaluation set:
- Loss: 3.0205
- Rouge1: 16.4636
- Rouge2: 8.2233
- Rougel: 16.3489
- Rougelsum: 16.3382
Model Description
This model is a fine-tuned version of mT5-small, a multilingual Transformer-based model pretrained on a large multilingual corpus. It has been further fine-tuned on a dataset of book reviews and their titles, making it particularly well suited to summarizing book reviews and similar short texts.
Intended Uses & Limitations
Intended Uses:
- Text Summarization: The model is intended for generating concise and coherent summaries from longer texts in English and Spanish, with a specific focus on book reviews (a usage sketch follows this list).
Limitations:
- Length Constraints: Generated summaries are short and may not capture all details of the source text.
- Quality Variance: The quality of generated summaries may vary depending on the complexity of the source text and the quality of training data.
- Bilingual Considerations: While the model supports both English and Spanish, summary quality can differ between the two languages; performance may be less robust in one than the other.
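A minimal usage sketch with the Transformers pipeline API. The checkpoint name below is a placeholder, and the example review and generation settings are illustrative, not taken from the card:

```python
from transformers import pipeline

# Placeholder checkpoint name; substitute your own Hub namespace or a local path.
checkpoint = "your-username/mt5-small-finetuned-amazon-en-es"
summarizer = pipeline("summarization", model=checkpoint)

review = (
    "I absolutely loved this book. The characters are vivid and the plot kept "
    "me hooked until the very last page. Highly recommended for fans of the genre."
)
print(summarizer(review, max_length=30)[0]["summary_text"])
```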
Training and Evaluation Data
The training and evaluation data used for fine-tuning consisted of product reviews and their corresponding titles from the amazon_reviews_multi dataset. This dataset serves as the foundation for the model's text summarization capabilities. The key aspects of the dataset preparation are:
- Size and Splits: _______.
- Main Domain Selection: The focus was placed on summarizing book reviews, consistent with Amazon's origins as a bookseller. Within the dataset, two product categories are relevant for this purpose: "book" and "digital_ebook_purchase". The English and Spanish datasets were therefore filtered to retain only examples from these categories, ensuring the model's specialization (see the filtering sketch after this list).
- Filtering for Quality: To help the model generate meaningful summaries, examples with very short titles were filtered out, improving its ability to produce informative and contextually relevant summaries. The heuristic splits each title on whitespace and uses the Dataset.filter() method to retain only examples that meet the length criterion.
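A sketch of the filtering steps above, assuming the amazon_reviews_multi column names (product_category, review_title) and an illustrative "more than two words" title threshold:

```python
from datasets import load_dataset

def is_book_review(example):
    # Keep only the two product categories relevant to book reviews.
    return example["product_category"] in ("book", "digital_ebook_purchase")

def has_informative_title(example):
    # Heuristic: keep examples whose titles contain more than two whitespace-separated words.
    return len(example["review_title"].split()) > 2

english_dataset = load_dataset("amazon_reviews_multi", "en")
spanish_dataset = load_dataset("amazon_reviews_multi", "es")

english_books = english_dataset.filter(is_book_review).filter(has_informative_title)
spanish_books = spanish_dataset.filter(is_book_review).filter(has_informative_title)
```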
Baseline: Lead-3 Summarization
A common baseline for text summarization tasks is the "Lead-3" baseline, which simply extracts the first three sentences of the source text as the summary. This baseline provides a reference point for evaluating the model's performance. On the validation set, the Lead-3 baseline achieved the following ROUGE scores (a sketch of computing this baseline follows the list):
- ROUGE-1: 16.75
- ROUGE-2: 8.81
- ROUGE-L: 15.61
- ROUGE-Lsum: 15.96
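A sketch of the Lead-3 baseline using NLTK sentence splitting and the evaluate implementation of ROUGE; for brevity it runs on the raw English validation split rather than the filtered set described above, and scores are scaled to percentages to match the card:

```python
import nltk
import evaluate
from datasets import load_dataset

nltk.download("punkt")
rouge = evaluate.load("rouge")

def lead_3(text):
    # Use the first three sentences of the review body as the summary.
    return "\n".join(nltk.tokenize.sent_tokenize(text)[:3])

# review_body is the source text and review_title the reference summary.
validation = load_dataset("amazon_reviews_multi", "en")["validation"]
predictions = [lead_3(body) for body in validation["review_body"]]
scores = rouge.compute(predictions=predictions, references=validation["review_title"])
print({name: round(value * 100, 2) for name, value in scores.items()})
```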
Training procedure
Training hyperparameters
The following hyperparameters were used during training; a Seq2SeqTrainingArguments sketch follows the list:
- learning_rate: 5.6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
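A sketch of how these hyperparameters map onto Seq2SeqTrainingArguments. The Adam betas and epsilon listed above are the Transformers defaults, so they are not set explicitly; output_dir, the per-epoch evaluation/save strategies, and predict_with_generate are assumptions rather than values stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-finetuned-amazon-en-es",  # assumed output directory
    learning_rate=5.6e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="epoch",   # assumed: matches the per-epoch results table
    save_strategy="epoch",         # assumed
    predict_with_generate=True,    # assumed: needed to compute ROUGE during evaluation
)
```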
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|
| 3.4071 | 1.0 | 1209 | 3.1603 | 17.3175 | 8.3009 | 16.7074 | 16.755 |
| 3.0542 | 2.0 | 2418 | 3.1411 | 18.3538 | 9.0086 | 17.8745 | 17.8275 |
| 3.3216 | 3.0 | 3627 | 3.0424 | 15.7882 | 7.908 | 15.5215 | 15.5397 |
| 3.2157 | 4.0 | 4836 | 3.0497 | 15.6788 | 7.7739 | 15.3788 | 15.4032 |
| 3.1488 | 5.0 | 6045 | 3.0347 | 15.8221 | 7.8918 | 15.6714 | 15.6797 |
| 3.0838 | 6.0 | 7254 | 3.0254 | 16.2869 | 8.2442 | 16.1594 | 16.1527 |
| 3.0639 | 7.0 | 8463 | 3.0197 | 17.1527 | 8.4248 | 16.9826 | 16.9533 |
| 3.0388 | 8.0 | 9672 | 3.0205 | 16.4636 | 8.2233 | 16.3489 | 16.3382 |
Framework versions
- Transformers 4.32.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3