xtremedistil-l6-h256-uncased fine-tuned on SQuAD
This model was developed as part of a project for the Deep Learning for NLP (DL4NLP) lecture at Technische Universität Darmstadt (2022). It uses xtremedistil-l6-h256-uncased as a base model and was fine-tuned on the SQuAD dataset for Question Answering. It makes no distinction between uppercase and lowercase words.
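For illustration, the fine-tuned model can be loaded with the transformers question-answering pipeline roughly as follows; the model identifier below is a placeholder and should be replaced with the actual repository name of this model.

```python
from transformers import pipeline

# Placeholder repository name; replace with the actual model id of this fine-tuned checkpoint.
qa_pipeline = pipeline('question-answering', model='<user>/xtremedistil-l6-h256-uncased-squad')

result = qa_pipeline(
    question='What was the model fine-tuned on?',
    context='The model uses xtremedistil-l6-h256-uncased as a base model and was fine-tuned on the SQuAD dataset.'
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```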
Dataset
As mentioned previously, the SQuAD dataset was used to train and evaluate the model. It was downloaded from GitHub and is divided into the following splits.
| Split      | Number of examples |
|------------|--------------------|
| Training   | 86 588             |
| Evaluation | 10 507             |
The following script was used to download, prepare and load the dataset so that it could be used by the model. Although it was not downloaded directly from Hugging Face, the dataset was formatted in exactly the same way as the version available on Hugging Face.
```python
import json
import os

from datasets import load_dataset

dataset_directory = 'dataset'
train_file = 'train.json'
dev_file = 'dev.json'

if not os.path.exists(dataset_directory):
    print('Creating dataset directory\n')
    os.makedirs(dataset_directory)

    # download train and dev splits from the dataset
    !wget https://s3.us-east-2.amazonaws.com/mrqa/release/v2/train/SQuAD.jsonl.gz -O dataset/train.jsonl.gz
    !wget https://s3.us-east-2.amazonaws.com/mrqa/release/v2/dev/SQuAD.jsonl.gz -O dataset/dev.jsonl.gz

    # unpack the files
    !gzip -d dataset/train.jsonl.gz
    !gzip -d dataset/dev.jsonl.gz

def prepare_data(dir, file_name):
    data = []

    # the downloaded files are JSON Lines files, so f'{file_name}l' refers to
    # e.g. 'train.jsonl', while f'{file_name}' is the prepared 'train.json' output
    with open(f'{dir}/{file_name}l', 'r') as f:
        # skip header
        next(f)
        for line in f:
            entry = json.loads(line)
            for qas in entry['qas']:
                answer_start = []
                for answer in qas['detected_answers']:
                    answer_start.append(answer['char_spans'][0][0])
                data.append({
                    'id': qas['id'],
                    'context': entry['context'],
                    'question': qas['question'],
                    'answers': {
                        'text': qas['answers'],
                        'answer_start': answer_start
                    }
                })

    with open(f'{dir}/{file_name}', 'w') as f:
        for entry in data:
            json.dump(entry, f)
            f.write('\n')

    os.remove(f'{dir}/{file_name}l')

prepare_data(dataset_directory, train_file)
prepare_data(dataset_directory, dev_file)

data_files = {'train': train_file, 'validation': dev_file}
dataset = load_dataset(dataset_directory, data_files=data_files)
```
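Once loaded, each example follows the same schema as the Hugging Face version of SQuAD, which can be checked with a quick inspection such as the one below.

```python
print(dataset)

# Each example contains an 'id', a 'context', a 'question' and an 'answers'
# field holding parallel lists of answer texts and character start offsets.
example = dataset['train'][0]
print(example['question'])
print(example['answers'])
```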
Hyperparameters
The hyperparameters used to fine-tune the model are listed below; a short sketch of the corresponding optimizer and scheduler setup follows the list.
- epochs: 2
- train_batch_size: 16
- eval_batch_size: 32
- optimizer: AdamW
- lr: 5e-5
- weight_decay: 0.01
- lr_scheduler: linear
- num_warmup_steps: 0
- max_length: 512
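As a minimal sketch (assuming a standard PyTorch training loop, which is not shown here), these settings could be wired up roughly as follows; the number of training steps is only approximated from the split size and batch size listed above.

```python
import torch
from transformers import AutoModelForQuestionAnswering, get_scheduler

model = AutoModelForQuestionAnswering.from_pretrained('microsoft/xtremedistil-l6-h256-uncased')

num_epochs = 2
train_batch_size = 16

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# Approximate number of optimization steps: 2 epochs over 86,588 training examples at batch size 16.
num_training_steps = num_epochs * (86588 // train_batch_size)

lr_scheduler = get_scheduler(
    'linear',
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
```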
Fine-Tuning and Evaluation
Most of the code used to pre-process the dataset, define a training loop and post-process the predictions generated by the model was adapted from the Question Answering course from Hugging Face.
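As an illustration of that pre-processing, the tokenization step might look roughly like the sketch below (the answer-span labelling and the training loop itself are omitted, and `preprocess` is a hypothetical helper name):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/xtremedistil-l6-h256-uncased')

def preprocess(examples):
    # Tokenize question/context pairs, truncating only the context to the
    # 512-token maximum listed in the hyperparameters above.
    return tokenizer(
        examples['question'],
        examples['context'],
        max_length=512,
        truncation='only_second',
        return_offsets_mapping=True,
        padding='max_length',
    )

# The offset mappings returned above are what allow the character-level
# 'answer_start' values to be mapped to start/end token positions.
tokenized_train = dataset['train'].map(preprocess, batched=True)
```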
The model was fine-tuned using GPU acceleration on Google Colab. The entire training and evaluation process took approximately 1h10min. More specifically, for each epoch the training step completed in 17-18 minutes, while the evaluation took about 16-18 minutes.
After fine-tuning, the following results were achieved on the evaluation set (using the squad metric):
| Metric           | Value             |
|------------------|-------------------|
| Exact Match (EM) | 61.91110688112687 |
| F1-Score         | 77.2232806051733  |
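For reference, the squad metric can be computed with the evaluate library; the snippet below is only a sketch of the expected input format, with made-up predictions and references.

```python
import evaluate

squad_metric = evaluate.load('squad')

# Made-up example purely to illustrate the input format expected by the metric.
predictions = [{'id': 'example-1', 'prediction_text': 'Darmstadt'}]
references = [{'id': 'example-1', 'answers': {'text': ['Darmstadt'], 'answer_start': [52]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```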