
language: si




This is an mT5-based question answering model for the Sinhala language. It was trained on a translated version of the SQuAD dataset containing 8k questions. The translation was done with the Google Translate API.

Training was done in a Google Colab TPU environment using parallel training techniques, on around 9k data points, each consisting of a context, question, and answer triple in Sinhala. Evaluation was done with the standard SQuAD evaluation script on around 1k data points, which gave the following results for the best parameter setting. The evaluation metrics used are exact match (EM) and F1 score.

Evaluation - {'EM': 39.413680781758956, 'f1': 66.16331104953571}
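
For readers unfamiliar with the two metrics, the following is a minimal sketch of how EM and F1 are computed in the style of the SQuAD evaluation script. It is not the script used for the numbers above. It assumes a single reference answer per question and whitespace tokenization, and it only strips ASCII punctuation; the official script also removes English articles and takes the best score over multiple reference answers. The function names `normalize`, `exact_match`, `f1_score`, and `evaluate` are illustrative.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and F1 over paired lists, scaled to percentages."""
    n = len(predictions)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(f1_score(p, r) for p, r in zip(predictions, references)) / n
    return {"EM": 100.0 * em, "f1": 100.0 * f1}
```

Because F1 gives partial credit for overlapping tokens while EM requires an exact string match after normalization, F1 is always at least as high as EM on the same predictions.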