xlnet-large-bahasa-cased
Pretrained XLNet large language model for Malay.
Pretraining Corpus
The xlnet-large-bahasa-cased model was pretrained on ~1.4 billion words. Below is the list of data we trained on:
Pretraining details
- All steps can be reproduced from here: Malaya/pretrained-model/xlnet.
Load Pretrained Model
You can use this model by installing torch or tensorflow together with the Hugging Face transformers library. You can then use it directly by initializing it like this:
from transformers import XLNetModel, XLNetTokenizer

model = XLNetModel.from_pretrained('malay-huggingface/xlnet-large-bahasa-cased')
tokenizer = XLNetTokenizer.from_pretrained(
    'malay-huggingface/xlnet-large-bahasa-cased',
    do_lower_case=False,
)
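Once the model and tokenizer are loaded as above, a typical next step is to turn a sentence into a fixed-size vector. The following is a minimal sketch of one way to do that with the PyTorch backend. The helper names `mean_pool` and `embed`, the example sentence, and the choice of mask-weighted mean pooling are illustrative assumptions, not something prescribed by this model card. It also assumes the `sentencepiece` package is installed, which `XLNetTokenizer` requires.

```python
import torch
from transformers import XLNetModel, XLNetTokenizer

MODEL_NAME = 'malay-huggingface/xlnet-large-bahasa-cased'


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padded positions (hypothetical helper).

    last_hidden_state: float tensor of shape (batch, seq_len, hidden)
    attention_mask:    tensor of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for rows that are entirely padding.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(texts, model, tokenizer):
    """Return one pooled vector per input string (hypothetical helper)."""
    inputs = tokenizer(texts, return_tensors='pt', padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs['attention_mask'])


if __name__ == '__main__':
    # Downloads the weights on first use.
    tokenizer = XLNetTokenizer.from_pretrained(MODEL_NAME, do_lower_case=False)
    model = XLNetModel.from_pretrained(MODEL_NAME)
    model.eval()  # disable dropout for deterministic outputs

    vectors = embed(['Saya suka makan nasi lemak.'], model, tokenizer)
    print(vectors.shape)  # (number of sentences, hidden size)
```

Mean pooling is only one option. Depending on the downstream task, using a single token's hidden state or fine-tuning with a task-specific head may work better.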