pytorch t5 question-generation seq2seq

T5 Question Generation and Question Answering

Model description

This model is a T5 Transformers model (airklizz/t5-base-multi-fr-wiki-news) that was fine-tuned in french on 3 different tasks

It obtains quite good results on FQuAD validation dataset.

Intended uses & limitations

This model functions for the 3 tasks mentionned earlier and was not tested on other tasks.

from transformers import T5ForConditionalGeneration, T5Tokenizer
model = T5ForConditionalGeneration.from_pretrained("JDBN/t5-base-fr-qg-fquad")
tokenizer = T5Tokenizer.from_pretrained("JDBN/t5-base-fr-qg-fquad")

Training data

The initial model used was https://huggingface.co/airKlizz/t5-base-multi-fr-wiki-news. This model was finetuned on a dataset composed of FQuAD and PIAF on the 3 tasks mentioned previously.

The data were preprocessed like this

The preprocessing we used was implemented in https://github.com/patil-suraj/question_generation

Eval results

On FQuAD validation set

BLEU_1 BLEU_2 BLEU_3 BLEU_4 METEOR ROUGE_L CIDEr
0.290 0.203 0.149 0.111 0.197 0.284 1.038

Question Answering metrics

For these metrics, the performance of this question answering model (https://huggingface.co/illuin/camembert-base-fquad) on FQuAD original question and on T5 generated questions are compared.

Questions Exact Match F1 Score
Original FQuAD 54.015 77.466
Generated 45.765 67.306

BibTeX entry and citation info

@misc{githubPatil,
author = {Patil Suraj},
title = {question generation GitHub repository},
year = {2020},
howpublished={\url{https://github.com/patil-suraj/question_generation}}
}

@article{T5,
    title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
    author={Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
    year={2019},
    eprint={1910.10683},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@misc{dhoffschmidt2020fquad,
      title={FQuAD: French Question Answering Dataset}, 
      author={Martin d'Hoffschmidt and Wacim Belblidia and Tom Brendlé and Quentin Heinrich and Maxime Vidal},
      year={2020},
      eprint={2002.06071},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}