
<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

**Important Note:**

I created a combined metric (55% F1 score + 45% exact match score) and load the checkpoint with the best combined score at the end of training. These are the relevant settings in the `TrainingArguments`:

```python
load_best_model_at_end=True,
metric_for_best_model='combined',
greater_is_better=True,
```
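For context, here is a minimal sketch (not the exact training code) of how such a `combined` score can be computed from the standard SQuAD 2.0 metrics so the `Trainer` has a `combined` key to select on; it assumes the `evaluate` library, and the function name is illustrative:

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

def compute_combined_metrics(predictions, references):
    # Standard SQuAD 2.0 metrics; the result includes the 'exact' and 'f1' keys.
    metrics = squad_v2_metric.compute(predictions=predictions, references=references)
    # Weighted combination used for checkpoint selection: 55% F1 + 45% exact match.
    metrics["combined"] = 0.55 * metrics["f1"] + 0.45 * metrics["exact"]
    return metrics
```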

# DSPFirst-Finetuning-5

This model is a fine-tuned version of [ahotrod/electra_large_discriminator_squad2_512](https://huggingface.co/ahotrod/electra_large_discriminator_squad2_512) on a Questions and Answers dataset generated from the DSPFirst textbook in the SQuAD 2.0 format.<br /> It achieves the following results on the evaluation set:

More detailed metrics:

Before fine-tuning:

```python
{'HasAns_exact': 54.71817606079797,
 'HasAns_f1': 61.08672724332754,
 'HasAns_total': 1579,
 'NoAns_exact': 88.78048780487805,
 'NoAns_f1': 88.78048780487805,
 'NoAns_total': 205,
 'best_exact': 58.63228699551569,
 'best_exact_thresh': 0.0,
 'best_f1': 64.26902596256402,
 'best_f1_thresh': 0.0,
 'exact': 58.63228699551569,
 'f1': 64.26902596256404,
 'total': 1784}
```

After fine-tuning:

```python
{'HasAns_exact': 67.57441418619379,
 'HasAns_f1': 75.92137683558988,
 'HasAns_total': 1579,
 'NoAns_exact': 63.41463414634146,
 'NoAns_f1': 63.41463414634146,
 'NoAns_total': 205,
 'best_exact': 67.0964125560538,
 'best_exact_thresh': 0.0,
 'best_f1': 74.48422310728503,
 'best_f1_thresh': 0.0,
 'exact': 67.0964125560538,
 'f1': 74.48422310728503,
 'total': 1784}
```
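The `HasAns_*`, `NoAns_*`, and `best_*` keys above match the breakdown produced by the SQuAD 2.0 metric in the `evaluate`/`datasets` libraries. As an illustration only (not necessarily the exact evaluation code used here), this is the prediction/reference format that metric expects; the ids and texts below are made up:

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Made-up example pair. 'no_answer_probability' is what lets the metric score
# unanswerable (NoAns) questions and search for the best_* thresholds.
predictions = [{
    "id": "q1",
    "prediction_text": "the discrete Fourier transform",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "q1",
    "answers": {"text": ["the discrete Fourier transform"], "answer_start": [42]},
}]

print(squad_v2_metric.compute(predictions=predictions, references=references))
```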

## Dataset

A visualization of the dataset can be found here.<br /> The split between train and test is 70% and 30%, respectively.

```python
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 4160
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 1784
    })
})
```
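As a rough sketch, a flattened SQuAD-style dataset with these fields can be split 70/30 with the `datasets` library; the file name and seed below are assumptions, not the actual values used:

```python
from datasets import load_dataset

# Assumed file: one flattened SQuAD-style example per line with the fields
# 'id', 'title', 'context', 'question', 'answers'.
raw = load_dataset("json", data_files="dspfirst_qa.jsonl")["train"]

# 70% train / 30% test split, as described above.
dataset = raw.train_test_split(test_size=0.3, seed=42)
print(dataset)
```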

## Intended uses & limitations

This model is fine-tuned to answer questions from the DSPFirst textbook. I am still learning, so please review the answers before relying on them.<br /> The dataset could also be improved, either by using a better question-and-answer generation model (currently https://github.com/patil-suraj/question_generation) or by applying data augmentation to increase its size.
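A quick usage sketch with the `transformers` pipeline; the model id below is a placeholder for wherever this model is hosted, and the question/context are made up:

```python
from transformers import pipeline

# Placeholder hub id; replace with the actual repository name of this model.
qa = pipeline("question-answering", model="<user>/DSPFirst-Finetuning-5")

answer = qa(
    question="What is aliasing?",
    context="Paste a passage from the DSPFirst textbook here.",
    handle_impossible_answer=True,  # the dataset follows SQuAD 2.0, so some questions are unanswerable
)
print(answer)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```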

## Training and evaluation data

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

### Model hyperparameters

### Training results

| Training Loss | Epoch | Step | Validation Loss | Exact   | F1      | Combined |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:--------:|
| 2.3222        | 0.81  | 20   | 1.0363          | 60.3139 | 68.8586 | 65.0135  |
| 1.6149        | 1.65  | 40   | 0.9702          | 64.7422 | 72.5555 | 69.0395  |
| 1.2375        | 2.49  | 60   | 1.0007          | 64.6861 | 72.6306 | 69.0556  |
| 1.0417        | 3.32  | 80   | 0.9963          | 66.0874 | 73.8634 | 70.3642  |
| 0.9401        | 4.16  | 100  | 0.8803          | 67.0964 | 74.4842 | 71.1597  |
| 0.8799        | 4.97  | 120  | 0.8652          | 66.7040 | 74.1267 | 70.7865  |
| 0.8712        | 5.81  | 140  | 0.8921          | 66.3677 | 73.7213 | 70.4122  |
| 0.8311        | 6.65  | 160  | 0.8529          | 66.3117 | 73.4039 | 70.2124  |

### Framework versions