
BART-base Question Generation

This model is a fine-tuned version of facebook/bart-base on several question answering datasets. It was trained to generate questions using two different approaches, <b> Casual-Generation </b> and <b> Context-based-Generation </b>.

Model description

The model takes a context as its input sequence and generates a full question sentence as the output sequence. There are two ways the model can be queried to produce questions:

The input sequence can then be encoded and passed as the input_ids argument in the model's generate() method.

Limitations

The model was trained on a limited amount of data, so the generated questions may be of poor quality. In addition, the generated questions have a style similar to that of the training data.

Training and evaluation data

The dataset used to train the model comprises the training splits of the following datasets:

After preprocessing the data from the datasets listed above, we had 408,372 examples for training the model, 25k for development, and 18k for testing.

Training procedure

The model was fine-tuned for 5 epochs with the hyperparameters listed below:

Training hyperparameters

The following hyperparameters were used during training:

At the end of 5 epochs, the evaluation loss was 1.64 and the training loss was 0.9671.

Framework versions