
distilBERT-Nepali

This model is a fine-tuned version of raygx/distilBERT-Nepali, revision b35360e0cffb71ae18aaf4ea00ff8369964243a2.

It achieves the following results on the evaluation set:

Perplexity:

  • lowest: 17.31
  • average: 19.12

(Lowest and average values are reported because, due to limited resources, training was carried out in several rounds on separate batches of data, so the evaluation perplexity varies from round to round.)

Loss:

  • loss: 3.2503
  • val_loss: 3.0674
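
Perplexity for a masked language model is typically the exponential of the mean masked-token cross-entropy loss. The snippet below is a minimal sketch of how such an evaluation could be run against this model with the Hugging Face transformers library; the Nepali validation sentences are placeholders, and the availability of TensorFlow weights under raygx/distilBERT-Nepali is assumed rather than confirmed here.

```python
import math

import tensorflow as tf
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TFAutoModelForMaskedLM,
)

model_id = "raygx/distilBERT-Nepali"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForMaskedLM.from_pretrained(model_id)

# Placeholder held-out sentences; replace with a real Nepali validation set.
texts = [
    "नेपाल एक सुन्दर देश हो।",
    "काठमाडौं नेपालको राजधानी हो।",
]

# Tokenize, then let the MLM collator mask 15% of tokens and build labels.
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)
features = [{k: v[i] for k, v in encodings.items()} for i in range(len(texts))]
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15, return_tensors="tf"
)
batch = collator(features)

# With labels present, the model returns the masked-token cross-entropy loss.
outputs = model(batch)
loss = float(tf.reduce_mean(outputs.loss))
print(f"loss = {loss:.4f}, perplexity = {math.exp(loss):.2f}")
```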

Model description

This model is trained on the raygx/Nepali-Extended-Text-Corpus dataset, a mixture of cc100 and raygx/Nepali-Text-Corpus. As a result, it is trained on roughly 10 times more data than its previous version. It also uses a different tokenizer, so it should be treated as an entirely new model rather than a drop-in update of the old one.
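
For quick experimentation, the model can be used with a fill-mask pipeline. This is a minimal usage sketch, assuming the fine-tuned weights and tokenizer are published under raygx/distilBERT-Nepali; the example sentence is purely illustrative.

```python
from transformers import pipeline

# Load a fill-mask pipeline with this model's weights and tokenizer.
fill_mask = pipeline("fill-mask", model="raygx/distilBERT-Nepali")

# Build an example Nepali sentence using whichever mask token the tokenizer defines.
sentence = f"काठमाडौं नेपालको {fill_mask.tokenizer.mask_token} हो।"
for prediction in fill_mask(sentence):
    print(prediction["token_str"], round(prediction["score"], 4))
```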

Training procedure

Training was done by running one epoch at a time on one batch (shard) of the data. With 3 data batches and 2 epochs per batch, this amounted to 6 training rounds in total. A sketch of this procedure is shown below.
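
The following is a minimal sketch of this round-based schedule with Keras, not the exact script used for this model. The sharding, the ordering of epochs and shards, the batch size, the optimizer settings, and the assumption that the corpus exposes a `text` column are all illustrative.

```python
import tensorflow as tf
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling, TFAutoModelForMaskedLM

model_id = "raygx/distilBERT-Nepali"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForMaskedLM.from_pretrained(model_id)
# No explicit loss: the model's internal masked-LM loss is used by Keras.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))  # illustrative settings

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, return_tensors="tf")

# Assumes the corpus has a "text" column; tokenize the whole training split once.
raw = load_dataset("raygx/Nepali-Extended-Text-Corpus", split="train")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=raw.column_names,
)

NUM_SHARDS, NUM_EPOCHS = 3, 2  # 3 data batches x 2 epochs = 6 rounds

for epoch in range(NUM_EPOCHS):
    for index in range(NUM_SHARDS):
        # Train for one epoch at a time on one shard of the corpus.
        shard = tokenized.shard(num_shards=NUM_SHARDS, index=index)
        train_ds = shard.to_tf_dataset(batch_size=16, shuffle=True, collate_fn=collator)
        model.fit(train_ds, epochs=1)
```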

Training hyperparameters

The following hyperparameters were used during training:

Training results

The perplexity and loss figures are summarized in the evaluation results at the top of this card.

Framework versions