

# distilgpt2-nepali-patrakar-qa

This model is a fine-tuned version of [Sakonii/distilgpt2-nepali](https://huggingface.co/Sakonii/distilgpt2-nepali) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 3.9077

## Model description

Refer to the original [distilgpt2](https://huggingface.co/distilgpt2) model card for a description of the model architecture.

## Intended uses & limitations

This marginally fine-tuned model can be used for Nepali text generation and, potentially, question answering, and is intended to be further fine-tuned on Nepali-language generative downstream tasks. Because the language model was trained on text grouped into blocks of 512 tokens, it handles sequences of up to 512 tokens.
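
A minimal text-generation sketch using the `transformers` pipeline API. The hub ID `Sakonii/distilgpt2-nepali-patrakar-qa` and the example prompt are assumptions; adjust them to the checkpoint you are actually loading.

```python
from transformers import pipeline

# Assumption: the checkpoint is published as "Sakonii/distilgpt2-nepali-patrakar-qa";
# replace with the actual hub ID or a local path if it differs.
generator = pipeline("text-generation", model="Sakonii/distilgpt2-nepali-patrakar-qa")

# The model handles sequences of up to 512 tokens (its training block size).
outputs = generator(
    "नेपालको राजधानी",  # example Nepali prompt (assumption)
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
    num_return_sequences=2,
)

for out in outputs:
    print(out["generated_text"])
```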

## Training procedure

The model was trained with the same configuration as the original distilgpt2, but with 512 tokens per instance, 72 instances per batch, and around 34.14K training steps (excluding the earlier pre-training with the CLM objective).
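
A rough sketch of how such a causal-LM fine-tuning run is typically wired up with the `transformers` Trainer. Only the block size (512), batch size (72), and epoch count (5) come from this card; the dataset handling, text-grouping helper, and all other hyperparameters below are assumptions for illustration, not the values actually used.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Sakonii/distilgpt2-nepali")
model = AutoModelForCausalLM.from_pretrained("Sakonii/distilgpt2-nepali")

block_size = 512  # texts are grouped into 512-token blocks (from this card)

def group_texts(examples):
    # Assumed helper: concatenate tokenized texts and split into fixed-size blocks.
    concatenated = sum(examples["input_ids"], [])
    total_length = (len(concatenated) // block_size) * block_size
    return {
        "input_ids": [
            concatenated[i : i + block_size]
            for i in range(0, total_length, block_size)
        ]
    }

training_args = TrainingArguments(
    output_dir="distilgpt2-nepali-patrakar-qa",
    per_device_train_batch_size=72,  # 72 instances per batch (from this card)
    num_train_epochs=5,              # matches the 5 epochs in the results table
    evaluation_strategy="epoch",
)

# `train_dataset` / `eval_dataset` would be the tokenized, block-grouped datasets.
# trainer = Trainer(
#     model=model,
#     args=training_args,
#     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```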

### Training hyperparameters

The following hyperparameters were used during training:

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.1278        | 1.0   | 6829  | 4.0184          |
| 3.9461        | 2.0   | 13658 | 3.9630          |
| 3.8268        | 3.0   | 20487 | 3.9319          |
| 3.7978        | 4.0   | 27316 | 3.9140          |
| 3.7949        | 5.0   | 34145 | 3.9077          |
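
For a rough sense of scale, the cross-entropy losses above can be converted to perplexity. A minimal sketch of the conversion (not a metric reported by the training run itself):

```python
import math

final_validation_loss = 3.9077  # epoch 5, from the table above

# Perplexity of a causal language model is exp(cross-entropy loss).
perplexity = math.exp(final_validation_loss)
print(f"Approximate validation perplexity: {perplexity:.2f}")  # ~49.78
```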

### Framework versions