Pre-trained ELECTRA small for Tigrinya Language

We pre-train ELECTRA small on the TLMD dataset, which contains over 40 million tokens.

The trained models are provided in both Flax and PyTorch formats.
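As a minimal sketch of how the PyTorch weights could be loaded with the Hugging Face `transformers` library: the helper names `load_model` and `embed` are hypothetical, and `model_id` is a placeholder, since this card does not state the repository identifier. Pass either the Hub repository name or a local directory containing the downloaded files.

```python
def load_model(model_id: str):
    """Load the tokenizer and PyTorch weights from a Hub repo or local directory.

    `model_id` is a placeholder argument; this card does not state the identifier.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the final-layer hidden states for a single input string."""
    import torch

    # Truncate to the maximum sequence length listed for this model (512).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, sequence_length, hidden_size)
    return outputs.last_hidden_state
```

Calling `tokenizer, model = load_model(model_id)` followed by `embed("...", tokenizer, model)` should yield one vector per input token. The Flax weights can presumably be loaded the same way through `FlaxAutoModel`, assuming Flax is installed.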


The hyperparameters of the SMALL model are as follows:

Model Size   L    AH   HS    FFN    P     Seq
SMALL        12   4    256   1024   14M   512

(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters; Seq = maximum sequence length.)
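To see how the listed hyperparameters relate to the parameter count, here is a rough sketch that tallies the weights of an ELECTRA-style encoder. The vocabulary size (30,522) and embedding size (128) are assumptions borrowed from the original English ELECTRA-small configuration; this card does not state them, so the result is indicative only. The function name `electra_encoder_params` is hypothetical.

```python
def electra_encoder_params(layers, hidden, ffn, max_seq,
                           vocab_size, embed_size, type_vocab=2):
    """Approximate parameter count of an ELECTRA-style encoder (no task heads)."""
    # Embedding tables plus their LayerNorm (weight and bias).
    embeddings = (vocab_size + max_seq + type_vocab) * embed_size + 2 * embed_size
    # ELECTRA projects embeddings up to the hidden size when the two differ.
    if embed_size != hidden:
        embeddings += embed_size * hidden + hidden

    # Self-attention: query, key, value, and output projections, each with bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> ffn -> hidden, each with bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer (after attention and after the feed-forward block).
    layer_norms = 2 * (2 * hidden)

    return embeddings + layers * (attention + feed_forward + layer_norms)


# Values from the table above; vocab_size and embed_size are assumed, not sourced.
total = electra_encoder_params(layers=12, hidden=256, ffn=1024, max_seq=512,
                               vocab_size=30522, embed_size=128)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the encoder comes to about 13.5M parameters, in line with the 14M reported in the table. The number of attention heads does not enter the count, because the heads split the hidden dimension rather than adding to it.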

Framework versions


If you use this model in your product or research, please cite as follows:

  author={Fitsum Gaim and Wonsuk Yang and Jong C. Park},
  title={Monolingual Pre-trained Language Models for Tigrinya},
  publisher={WiNLP 2021 at EMNLP 2021}