Finetuned bert-base-multilingual-cased model on Thai sequence and token classification datasets

Finetuned bert-base-multilingual-cased (mBERT) model on Thai sequence and token classification datasets. The scripts and documentation can be found at this repository.

Model description

We use the pretrained cross-lingual BERT model (mBERT) as proposed by [Devlin et al., 2018]. We download the pretrained PyTorch model via HuggingFace's Model Hub (https://huggingface.co/bert-base-multilingual-cased).
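
As a minimal sketch (assuming the standard transformers auto-class API; this is not the repository's exact training script), the pretrained checkpoint can be loaded as follows:

```python
# Minimal sketch: load the pretrained mBERT checkpoint and its tokenizer
# from the Hugging Face Model Hub.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
```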

Intended uses & limitations

You can use the finetuned models for the following multiclass/multilabel text classification and token classification tasks (see the loading sketch after this list).

- Multiclass text classification
- Multilabel text classification
- Token classification
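
The sketch below shows how each task maps onto a transformers head; "path/to/finetuned-checkpoint" is a hypothetical placeholder for the actual finetuned model ID or local directory, not a name from this repository:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

checkpoint = "path/to/finetuned-checkpoint"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Multiclass text classification: one label per text, softmax over classes.
multiclass_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Multilabel text classification: an independent sigmoid per label.
multilabel_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, problem_type="multi_label_classification"
)

# Token classification: one label per token (e.g. NER or POS tagging).
token_model = AutoModelForTokenClassification.from_pretrained(checkpoint)
```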

How to use

An example notebook demonstrating how to use the finetuned model for inference can be found in this Colab notebook.
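
For reference, here is a hedged sketch of what inference typically looks like with the transformers pipeline API; the model IDs are hypothetical placeholders, and the Colab notebook remains the authoritative example:

```python
from transformers import pipeline

# Sequence classification over a Thai input sentence.
classifier = pipeline(
    "text-classification",
    model="path/to/finetuned-checkpoint",  # hypothetical placeholder
)
print(classifier("อาหารร้านนี้อร่อยมาก"))  # "The food at this restaurant is very tasty."

# Token classification (e.g. named-entity recognition) over the same input.
tagger = pipeline(
    "token-classification",
    model="path/to/finetuned-token-checkpoint",  # hypothetical placeholder
    aggregation_strategy="simple",  # merge subword pieces into word-level groups
)
print(tagger("อาหารร้านนี้อร่อยมาก"))
```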

BibTeX entry and citation info

@misc{lowphansirikul2021wangchanberta,
      title={WangchanBERTa: Pretraining transformer-based Thai Language Models}, 
      author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
      year={2021},
      eprint={2101.09635},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}