Taglish-Electra

Our Taglish-Electra model was pretrained on two Filipino datasets and one English dataset to improve performance on Filipino text that contains English, where speakers may code-switch between the two languages.

  1. OpenWebText (English)
  2. WikiText-TL-39 (Filipino)
  3. TLUnified Large Scale Corpus (Filipino)

This is the discriminator model, the main Transformer intended for fine-tuning on downstream tasks. For generation, mask-filling, and retraining, refer to the Generator models.
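
For example, the discriminator can be loaded with the Hugging Face `transformers` library and fine-tuned on a classification task. The sketch below is a minimal illustration, assuming a local or Hub checkpoint; the model ID shown is a hypothetical placeholder, not the published name.

```python
# Minimal fine-tuning setup sketch for the Taglish-Electra discriminator.
# The model ID below is a hypothetical placeholder, not the published checkpoint name.
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

model_id = "path/to/taglish-electra-discriminator"  # placeholder

tokenizer = ElectraTokenizerFast.from_pretrained(model_id)
model = ElectraForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Tokenize a code-switched Taglish sentence and run a forward pass.
inputs = tokenizer("Ang ganda ng weather today!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```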