
bert-finetuned-squad-v1

This model is a fine-tuned version of bert-base-cased on the Stanford Question Answering Dataset (SQuAD).

Model description:

The bert-finetuned-squad-v1 model is built upon the BERT (Bidirectional Encoder Representations from Transformers) architecture and has been fine-tuned specifically for the task of question-answering on the SQuAD dataset. It takes a passage of text (context) and a question as input and predicts the start and end positions of the answer within the context.
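
For a quick look at what the model does, it can be loaded through the transformers question-answering pipeline. The repository id below is a placeholder; substitute the actual Hub id or local path of the fine-tuned checkpoint:

```python
from transformers import pipeline

# Placeholder repository id; replace with the actual Hub id or local path
# of the fine-tuned checkpoint.
qa_pipeline = pipeline("question-answering", model="bert-finetuned-squad-v1")

context = (
    "The Stanford Question Answering Dataset (SQuAD) is a reading comprehension "
    "dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles."
)
question = "What kind of dataset is SQuAD?"

print(qa_pipeline(question=question, context=context))
# {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```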

Intended uses & limitations:

Intended Uses:

The model is intended for extractive question answering: given a context passage and a question, it returns the span of the passage most likely to contain the answer. It can be used directly through the question-answering pipeline or as the reader component of a retrieval-based QA system.

Limitations:

The model can only extract answers that appear verbatim in the supplied context; it cannot generate free-form answers. Because it was fine-tuned on SQuAD v1.1, which contains only answerable questions, it will return a span even when the context does not actually contain the answer. It is limited to English, and performance may degrade on text that differs substantially from the Wikipedia passages in SQuAD.

Training and evaluation data:

The model was trained on the SQuAD v1.1 dataset, which consists of two main splits:

  1. A training split of roughly 88,000 question-answer pairs over Wikipedia passages, used for fine-tuning.

  2. A validation split of roughly 10,600 question-answer pairs, used for evaluation.

Training procedure:

The training process involved several key steps:

  1. Preprocessing: The training data was preprocessed with a BERT tokenizer to convert the question and context text into input IDs, and the start and end token positions of each answer span were generated as labels (a code sketch of steps 1 and 2 follows this list).

  2. Sliding Window: To handle long contexts, a sliding window approach was employed. Long contexts were split into multiple input features with overlapping tokens.

  3. Fine-tuning: The model was fine-tuned on the SQuAD training features, minimizing the cross-entropy loss over the predicted start and end positions of the answer span (a sketch of this step follows the list).

  4. Post-processing: During inference, the model predicts start and end logits for each feature; the highest-scoring valid start/end pair is selected and mapped back to a text span in the original context using the token-to-character offsets (a sketch of this step follows the list).

  5. Evaluation: The model's performance was evaluated on the SQuAD validation set using exact match (EM) and F1, which measure how closely the predicted answers match the reference answers (a sketch of this step follows the list).
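
The sketch below illustrates steps 1 and 2, following the standard transformers approach for SQuAD-style preprocessing. The checkpoint name, max_length, and stride values are assumptions for illustration, not necessarily the exact settings used:

```python
from transformers import AutoTokenizer

# Checkpoint, max_length, and stride are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
max_length = 384  # maximum length of a (question, context) feature
stride = 128      # overlap between consecutive windows of a long context

def preprocess_training_examples(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",        # truncate only the context, never the question
        stride=stride,
        return_overflowing_tokens=True,  # long contexts become several overlapping features
        return_offsets_mapping=True,     # character offsets used to locate the answer span
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    sample_map = inputs.pop("overflow_to_sample_mapping")
    answers = examples["answers"]
    start_positions, end_positions = [], []

    for i, offsets in enumerate(offset_mapping):
        answer = answers[sample_map[i]]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find where the context starts and ends inside this feature.
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
            # The answer is not fully contained in this window: label it (0, 0).
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Move to the first/last tokens whose offsets cover the answer characters.
            idx = context_start
            while idx <= context_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
```

In practice this function is applied to the raw SQuAD split with `Dataset.map(..., batched=True)`, so that a long context can expand into several training features.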
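
Step 3 is typically launched with the transformers Trainer. The hyperparameter values below are commonly used placeholders rather than the values actually used for this checkpoint, and the sketch reuses the tokenizer and preprocessing function from the previous block:

```python
from datasets import load_dataset
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

raw_datasets = load_dataset("squad")

# Apply the preprocessing function from the previous sketch to the training split.
train_dataset = raw_datasets["train"].map(
    preprocess_training_examples,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-cased")

# Placeholder hyperparameters typical for SQuAD fine-tuning.
args = TrainingArguments(
    output_dir="bert-finetuned-squad-v1",
    learning_rate=3e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # tokenizer from the preprocessing sketch above
)
trainer.train()
```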
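
Step 4 can be sketched as follows. The n_best and max_answer_length values are illustrative, and the offsets list is assumed to contain None for tokens that are not part of the context (the usual convention when preparing validation features):

```python
import numpy as np

n_best = 20              # assumed number of candidate start/end indices to keep
max_answer_length = 30   # assumed maximum answer length in tokens

def extract_answer(start_logits, end_logits, offsets, context):
    """Turn the start/end logits of a single feature back into a text answer."""
    start_indexes = np.argsort(start_logits)[-n_best:][::-1]
    end_indexes = np.argsort(end_logits)[-n_best:][::-1]
    candidates = []

    for start_index in start_indexes:
        for end_index in end_indexes:
            # Skip spans outside the context, inverted spans, and overly long spans.
            if offsets[start_index] is None or offsets[end_index] is None:
                continue
            if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                continue
            candidates.append({
                "text": context[offsets[start_index][0] : offsets[end_index][1]],
                "score": start_logits[start_index] + end_logits[end_index],
            })

    if not candidates:
        return {"text": "", "score": 0.0}
    # The highest-scoring valid span is the predicted answer.
    return max(candidates, key=lambda c: c["score"])
```

When a context was split into several features by the sliding window, the same selection is applied to each feature and the best-scoring span overall is kept.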
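
Step 5 is commonly done with the squad metric from the evaluate library, which computes exact match and F1 from predicted and reference answers. The ids and texts below are placeholders:

```python
import evaluate

squad_metric = evaluate.load("squad")

# Placeholder prediction and reference in the format expected by the metric.
predicted_answers = [
    {"id": "0001", "prediction_text": "a reading comprehension dataset"}
]
reference_answers = [
    {
        "id": "0001",
        "answers": {"text": ["a reading comprehension dataset"], "answer_start": [45]},
    }
]

results = squad_metric.compute(predictions=predicted_answers, references=reference_answers)
print(results)  # {'exact_match': ..., 'f1': ...}
```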

Training hyperparameters

The following hyperparameters were used during training:

Validation results

Framework versions