**Important note:** `load_best_model_at_end` is not working properly (I also specified `metric_for_best_model` in another training run and it still did not work), but the training results still show a valid trend.
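For reference, a minimal sketch of how these two arguments fit together in `TrainingArguments` (illustrative only; the exact arguments used for this run are listed under "Training hyperparameters" below):

```python
from transformers import TrainingArguments

# Illustrative sketch: best-model tracking needs evaluation to run and a
# metric name the Trainer can compare across checkpoints.
training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",   # evaluation must run for best-model tracking
    save_strategy="steps",         # must line up with evaluation_strategy
    load_best_model_at_end=True,
    metric_for_best_model="f1",    # must match a key returned by compute_metrics
    greater_is_better=True,
)
```

A common pitfall (not necessarily the cause here) is a `metric_for_best_model` name that does not match the keys produced by `compute_metrics`, or evaluation and save strategies that do not line up.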
# DSPFirst-Finetuning-4
This model is a fine-tuned version of [ahotrod/electra_large_discriminator_squad2_512](https://huggingface.co/ahotrod/electra_large_discriminator_squad2_512) on a question-and-answer dataset generated from the DSPFirst textbook, following the SQuAD 2.0 format.<br /> It achieves the following results on the evaluation set:
- Loss: 0.9028
- Exact: 66.9843
- F1: 74.2286
More accurate metrics:

Before fine-tuning:
```
"exact": 57.006726457399104,
"f1": 61.997705120754276
```
After fine-tuning:
```
"exact": 66.98430493273543,
"f1": 74.2285867775556
```
## Dataset
A visualization of the dataset can be found here.<br /> The split between train and test is 70% and 30% respectively.
```
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 4160
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 1784
    })
})
```
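A 70/30 split like this can be produced with the `datasets` library, for example as in the sketch below (the file name and layout of the generated Q&A data are assumptions, since the loading code is not part of this card):

```python
from datasets import load_dataset

# Assumed file name and layout: one flat JSON record per line with the
# 'id', 'title', 'context', 'question' and 'answers' columns shown above.
raw = load_dataset("json", data_files="dspfirst_qa.jsonl")["train"]

# 70% train / 30% test, with a fixed seed for reproducibility.
dataset = raw.train_test_split(test_size=0.3, seed=42)
print(dataset)  # DatasetDict({'train': ..., 'test': ...}) as shown above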
## Intended uses & limitations
This model is fine-tuned to answer questions from the DSPFirst textbook. I'm still learning, so you should review the model's answers before relying on them.<br /> The dataset could also be improved, either by using a better question-and-answer generation model (currently https://github.com/patil-suraj/question_generation) or by applying data augmentation to increase the dataset size. A quick usage example is shown below.
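A quick way to try the model with the `transformers` question-answering pipeline (replace the model id/path with wherever this checkpoint lives; the context and question are made-up examples):

```python
from transformers import pipeline

# Replace with the actual Hub id or local directory of this fine-tuned checkpoint.
qa = pipeline("question-answering", model="path/to/DSPFirst-Finetuning-4")

# Made-up example; in practice the context would be a passage from DSPFirst.
context = "Aliasing occurs when a signal is sampled below the Nyquist rate."
result = qa(question="When does aliasing occur?", context=context)
print(result["answer"], result["score"])
```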
## Training and evaluation data
- A `batch_size` of 6 uses 14.82 GB of VRAM.
- `gradient_accumulation_steps` is used to bring the total batch size to 516 (6 × 86; the batch size should be at least 256), using 4.52 GB of RAM.
- 30% of the total questions are set aside for evaluation.
## Training procedure
- The model was trained on Google Colab.
- A Tesla P100 16 GB GPU was used; training took 6.3 hours.
- `load_best_model_at_end` is enabled in `TrainingArguments`.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- gradient_accumulation_steps: 86
- total_train_batch_size: 516
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
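Expressed as code, these settings roughly correspond to the `TrainingArguments` below (a sketch; the output directory and the 20-step evaluation/save cadence are assumptions inferred from the results table):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="DSPFirst-Finetuning-4",
    learning_rate=2e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=86,   # 6 * 86 = 516 total train batch size
    num_train_epochs=10,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",      # assumption: the table below logs every 20 steps
    eval_steps=20,
    save_strategy="steps",
    save_steps=20,
    load_best_model_at_end=True,
)
```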
### Model hyperparameters
- hidden_dropout_prob: 0.36
- attention_probs_dropout_prob: 0.36
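These dropout values can be applied by overriding the base model's config before fine-tuning, for example as in this sketch (the original training script is not included in this card):

```python
from transformers import AutoConfig, AutoModelForQuestionAnswering

# Override the base model's dropout probabilities before fine-tuning.
config = AutoConfig.from_pretrained(
    "ahotrod/electra_large_discriminator_squad2_512",
    hidden_dropout_prob=0.36,
    attention_probs_dropout_prob=0.36,
)
model = AutoModelForQuestionAnswering.from_pretrained(
    "ahotrod/electra_large_discriminator_squad2_512",
    config=config,
)
```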
### Training results
| Training Loss | Epoch | Step | Validation Loss | Exact   | F1      |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|
| 2.4411        | 0.81  | 20   | 1.4556          | 62.0516 | 71.1082 |
| 2.2027        | 1.64  | 40   | 1.1508          | 65.0224 | 73.8669 |
| 1.2827        | 2.48  | 60   | 1.0030          | 65.8632 | 74.3959 |
| 1.0925        | 3.32  | 80   | 1.0155          | 66.8722 | 75.2204 |
| 1.03          | 4.16  | 100  | 0.8863          | 66.1996 | 73.8166 |
| 0.9085        | 4.97  | 120  | 0.9675          | 67.9372 | 75.7764 |
| 0.8968        | 5.81  | 140  | 0.8635          | 67.2085 | 74.3725 |
| 0.8867        | 6.64  | 160  | 0.9035          | 65.9753 | 73.4569 |
| 0.8456        | 7.48  | 180  | 0.9098          | 67.2085 | 74.6798 |
| 0.8506        | 8.32  | 200  | 0.8807          | 66.6480 | 74.2903 |
| 0.7972        | 9.16  | 220  | 0.8711          | 66.6480 | 73.5801 |
| 0.7795        | 9.97  | 240  | 0.9028          | 66.9843 | 74.2286 |
### Framework versions
- Transformers 4.18.0
- Pytorch 1.10.0+cu111
- Datasets 2.1.0
- Tokenizers 0.12.1