bert-base-finnish-cased-v1 for QA
This is the bert-base-finnish-cased-v1 model, fine-tuned using an automatically translated Finnish version of the SQuAD2.0 dataset in combination with the Finnish partition of the TyDi-QA dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of question answering.
When the model classifies a question as unanswerable, it outputs the "[CLS]" token as the answer. There is also a QA model available that does not try to identify unanswerable questions, bert-base-finnish-cased-squad1-fi.
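As a rough sketch of how this behaves in practice (not from the original card): with the transformers question-answering pipeline, passing handle_impossible_answer=True allows the null ([CLS]) prediction to win, which surfaces as an empty answer string. The question and context strings below are made-up examples.

from transformers import pipeline

model_name = "ilmariky/bert-base-finnish-cased-squad2-fi"
qa = pipeline("question-answering", model=model_name, tokenizer=model_name)

# Hypothetical example: the context does not contain the answer.
result = qa(
    question="Mikä on Ranskan pääkaupunki?",  # "What is the capital of France?"
    context="Tämä on testi.",                 # "This is a test."
    handle_impossible_answer=True,            # allow the null ([CLS]) prediction
)

# When the null prediction scores highest, the pipeline returns an empty answer string.
if result["answer"] == "":
    print("Judged unanswerable")
else:
    print(result["answer"])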
Overview
Language model: bert-base-finnish-cased-v1
Language: Finnish
Downstream-task: Extractive QA
Training data: Finnish SQuAD 2.0 + Finnish partition of TyDi-QA
Eval data: Finnish SQuAD 2.0 + Finnish partition of TyDi-QA
Usage
In Transformers
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
model_name = "ilmariky/bert-base-finnish-cased-squad2-fi"
# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Mikä tämä on?',   # "What is this?"
    'context': 'Tämä on testi.'    # "This is a test."
}
res = nlp(QA_input)
# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
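For completeness, here is a minimal sketch (assuming the model and tokenizer loaded in step b above) of running the model directly and decoding the predicted span from the start/end logits:

import torch

# Uses the `model` and `tokenizer` objects loaded in step b) above.
inputs = tokenizer("Mikä tämä on?", "Tämä on testi.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end positions and decode the answer span.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])

# A span that collapses onto position 0 (the [CLS] token) signals "no answer".
print(answer)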
Performance
Evaluated with a slightly modified version of the official SQuAD 2.0 evaluation script.
{
"exact": 55.53157042633567,
"f1": 61.869335312255835,
"total": 7412,
"HasAns_exact": 51.26503525508088,
"HasAns_f1": 61.006950090095565,
"HasAns_total": 4822,
"NoAns_exact": 63.47490347490348,
"NoAns_f1": 63.47490347490348,
"NoAns_total": 2590
}
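The overall scores are simply the count-weighted averages of the HasAns and NoAns subsets; a quick sanity check (not part of the card) using the numbers above:

# Overall "exact" is the count-weighted average of the two subsets.
has_ans_exact, has_ans_total = 51.26503525508088, 4822
no_ans_exact, no_ans_total = 63.47490347490348, 2590

overall = (has_ans_exact * has_ans_total + no_ans_exact * no_ans_total) / (has_ans_total + no_ans_total)
print(round(overall, 5))  # 55.53157, matching "exact" above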