Seq2seq abstractive question answering

This model is a fine-tuned version of MBART-large, a multilingual text-to-text encoder-decoder transformer. It is trained on lfqa-spanish, an automatically translated dataset originally created in English in this repository. For more details about the dataset, check its dataset card.
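
As a rough loading sketch, the model and dataset could be pulled from the Hugging Face Hub as shown below; the Hub identifiers used here are placeholders, not the actual published names of this model or of the lfqa-spanish dataset.

```python
# Loading sketch; the Hub identifiers below are placeholders, not the actual
# published names of this model or of the lfqa-spanish dataset.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-org/mbart-large-lfqa-es"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dataset = load_dataset("your-org/lfqa-spanish")  # placeholder identifier
```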

To optimize the model we used the Adafactor optimizer, which is better suited for t5-class models than the more commonly used AdamW. We used linear learning-rate decay, and the full hyperparameters for this model were:

```json
{
  "learning_rate": 2e-4,
  "num_train_epochs": 4,
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "adam_epsilon": 1e-8,
  "total_train_batch_size": 64,
  "warmup_ratio": 0.06
}
```
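
A minimal sketch of how these hyperparameters could be mapped onto the transformers Trainer API is shown below; the output directory is a placeholder, and the total train batch size of 64 is expressed here as per-device batch size times gradient accumulation steps.

```python
# Sketch of mapping the listed hyperparameters onto Seq2SeqTrainingArguments;
# the output directory is a placeholder and the batch-size split is an example.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-large-lfqa-es",    # placeholder output path
    learning_rate=2e-4,
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,       # 8 * 8 = 64 total train batch size
    warmup_ratio=0.06,
    lr_scheduler_type="linear",          # linear decay after warmup
    optim="adafactor",                   # Adafactor instead of AdamW
    predict_with_generate=True,
)
```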

This model is trained to provide long-form answers to open-domain questions, given context paragraphs that can be used to answer them. The main task it performs is therefore abstractive question answering.
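
As a minimal usage sketch, reusing the model and tokenizer loaded above, a question and its context could be passed as follows. The "question: ... context: ..." input template is an assumption and should match the preprocessing used during training; for MBART-family tokenizers you may also need to set the Spanish source/target language codes.

```python
# Inference sketch; the "question: ... context: ..." template is an assumption
# and should match whatever format was used to preprocess the training data.
question = "¿Por qué el cielo es azul?"
context = "La luz del sol se dispersa en las moléculas de la atmósfera..."

inputs = tokenizer(
    f"question: {question} context: {context}",
    return_tensors="pt",
    truncation=True,
)
# Generation settings reported for evaluation below: 8 beams, max length 360.
output_ids = model.generate(**inputs, num_beams=8, max_length=360)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```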

The results it obtains on the validation set of this dataset (it does not have a test set), with num_beams = 8 and a maximum target sequence length of 360, are:

{"rouge1": 0.5107, "rouge2": 0.0042, "rougeL": 0.5108, "rougeLsum": 0.5106, "gen_len": 201.7371}

Contributions

Thanks to @avacaondata, @alborotis, @albarji, @Dabs, @GuillemGSubies for adding this model.