Tags: bert, roberta, xlmroberta, vietnam, vietnamese, wiki

Vi-XLM-RoBERTa base model (uncased)

Logging:

Epochs 0/40. Running Loss: 6.4104: 1%| | 25149/2442585 [7:18:31<639:12:16]

NOTE: The model is not fully trained yet (training is currently on hold).

Model description

This is a Vietnamese XLM-RoBERTa model: base architecture, uncased.

Intended uses & limitations

You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task.

How to use

Here is how to use this model to get the features of a given text in PyTorch:

```python
from transformers import AutoTokenizer, XLMRobertaModel

tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-large')
model = XLMRobertaModel.from_pretrained('anhdungitvn/vi-xlm-roberta-large')

text = "Câu bằng tiếng Việt."  # "A sentence in Vietnamese."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)  # output.last_hidden_state holds the per-token features
```

and in TensorFlow:

```python
from transformers import AutoTokenizer, TFXLMRobertaModel

tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-large')
model = TFXLMRobertaModel.from_pretrained('anhdungitvn/vi-xlm-roberta-large')

text = "Câu bằng tiếng Việt."  # "A sentence in Vietnamese."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)  # output.last_hidden_state holds the per-token features
```
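The features returned above are per-token hidden states. A common way to collapse them into a single sentence vector is attention-masked mean pooling; here is a minimal NumPy sketch (shapes and names are illustrative, not part of the model card):

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) token features
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.maximum(mask.sum(axis=1), 1.0)                         # avoid division by zero
    return summed / counts

# Toy example: batch of 1, 4 positions (last one is padding), hidden size 2.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [99.0, 99.0]]])
mask = np.array([[1, 1, 1, 0]])
print(mean_pool(hidden, mask))  # [[3. 4.]]
```

With real model outputs you would pass `output.last_hidden_state` and the tokenizer's `attention_mask` (converted to arrays) instead of the toy tensors.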

Limitations and bias

Even though the training data used for this model could be characterized as fairly neutral, the model can still produce biased predictions.

Training data

- Vietnamese Wikipedia (2022, 1 GB)
- Vietnamese News (Add Reference Here)
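The card does not describe its preprocessing. For an uncased model, corpus cleanup typically includes Unicode normalization (important for Vietnamese diacritics), lowercasing, and whitespace collapsing; the sketch below is illustrative, not the card's actual pipeline:

```python
import re
import unicodedata

def normalize_uncased(text: str) -> str:
    """Illustrative cleanup for an uncased Vietnamese corpus:
    NFC-normalize (composes diacritics into single code points),
    lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(normalize_uncased("  Câu   bằng Tiếng Việt. "))  # "câu bằng tiếng việt."
```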

Training procedure

Model was pretrained:
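The card does not detail the procedure; XLM-RoBERTa models are pretrained with masked language modeling using dynamic masking. A pure-Python sketch of the standard RoBERTa masking recipe (the 15%/80/10/10 ratios and the `MASK_ID` value are the usual defaults, assumed here rather than confirmed by the card):

```python
import random

MASK_ID = 4  # hypothetical <mask> token id, for illustration only

def mask_tokens(ids, vocab_size, mask_prob=0.15, rng=None):
    """RoBERTa-style dynamic masking: select ~15% of positions;
    of those, 80% become <mask>, 10% become a random token,
    and 10% are left unchanged. Returns (masked inputs, labels),
    where labels hold the original token at selected positions
    and -100 (ignored by the loss) elsewhere."""
    rng = rng or random.Random()
    inputs, labels = list(ids), []
    for i, tok in enumerate(ids):
        if rng.random() < mask_prob:
            labels.append(tok)  # this position contributes to the loss
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)
            # else: keep the original token
        else:
            labels.append(-100)  # ignored by the loss
    return inputs, labels
```

"Dynamic" means this masking is re-sampled every epoch, so the model sees different masked positions on each pass over the corpus.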

Evaluation results

Pretraining metrics and results:

When fine-tuned on downstream tasks, this model achieves the following results:

| Task | SC | CN | NC | VSLP_2016_ASC |
|------|----|----|----|---------------|
|      | T  | T  | T  | T             |
|      | x  | x  | x  | x             |

Downstream Task Dataset:


SC: Sentiment Classification (classifying the sentiment of comments)

<a href=""> </a>

Metrics:
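The card does not specify which metrics are reported. Classification tasks like SC are commonly scored with accuracy and macro-averaged F1; a self-contained, illustrative implementation of macro-F1:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average
    with equal weight per class (robust to class imbalance)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # ≈ 0.7333
```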

BibTeX entry and citation info

@article{2022,
  title={x},
  author={x},
  journal={ArXiv},
  year={2022},
  volume={x}
}

<a href="https://huggingface.co/exbert/?model=anhdungitvn/vi-xlm-roberta-base"> <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png"> </a>