# Vi-XLM-RoBERTa base model (uncased)
**Model is currently being trained (in progress).**
- Progress: ▓▓░░░░░░░░ 25.00%
- Epochs: 10/40 (as of 2022-12-06)
## Logs
<a href="https://wandb.ai/anhdungitvn/test/reports/perplexity-22-10-21-21-03-10---VmlldzoyODMwOTQ3?accessToken=lk98cqcsbmcdl4ftjw8vu2bfoc6ifigal1p2db3zo27cc7xwzedfivk3o4aeei3h">Perplexity</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/Training-loss-22-10-21-21-04-12---VmlldzoyODMwOTU4?accessToken=8kfl35d92xxa1eockvrf08p8r65zhlrm1dxfl7vhu5avc3cfozl897n030t1q1f6">Train Loss</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/eval_loss-22-10-21-21-04-24---VmlldzoyODMwOTYx?accessToken=3efesxgcurawkzl93anh4w4hz28c1fl5oexrnwb0mo0pz118fz6mappkpr434tp9">Eval Loss</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/lr-22-10-21-21-03-57---VmlldzoyODMwOTU1?accessToken=ggggf7fe4c4rvhexuonybf3m6mhrdn8tz7o4qjqwbml6k0h9f0953zpsbrympktq">Learning Rate</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/global_step-22-10-21-21-04-36---VmlldzoyODMwOTYy?accessToken=cjd65vgz6uviswqqi5tbud14g9v4bxhjpuccal4hqmrm1h4vqfxithrf4wg0u7xq">Global Step</a>
## Gradients by layer
<a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-embeddings-LayerNorm-weight-22-12-06-20-31-20---VmlldzozMDk0NTkx?accessToken=vepursxxx3s66rvrnwtiep9jhx8vjkoid1oyk3gd83b6ertq6r70qhotjg9ne36s">Embedding</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-0-attention-output-LayerNorm-weight-22-12-06-20-30-40---VmlldzozMDk0NTg4?accessToken=x975ir3hd7smhqm64dahbgtv76uwklmgf87j7izkkjbcgu41ed9zcfkk3rzb1nmm">L0-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-1-attention-output-LayerNorm-weight-22-12-06-20-30-24---VmlldzozMDk0NTg2?accessToken=gwqxn1asm08k2jrdhnpa2oiprley2cmpqgnazjxctorq53yfb0zq5yqq8p46ubzq">L1-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-2-attention-output-LayerNorm-weight-22-12-06-20-30-09---VmlldzozMDk0NTgz?accessToken=b6kr19k4qutc12i1hsfki4jryr8ou5pzbmlkvyq6zwxj25tw1txe5vsd8t4li4i3">L2-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-3-attention-output-LayerNorm-weight-22-12-06-20-29-49---VmlldzozMDk0NTgx?accessToken=12klldmz2xpfjcw0bi6a5ggqe3ebtcdq9yismzaz562xajxb696zqcgvsdxp3ext">L3-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-4-attention-output-LayerNorm-weight-22-12-06-20-29-28---VmlldzozMDk0NTc2?accessToken=qgbz1njt7vwi0aspw7bh4n6ad4i0bctctgedq7qmsd4lds9l758ibbm5g4ndfs9f">L4-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-5-attention-output-LayerNorm-weight-22-12-06-20-29-11---VmlldzozMDk0NTc1?accessToken=wytlftyuujtgpmxxwk6q3kd30zjre3qiloa6w6h1aw6eb2169udu3lwtgbbyhd8x">L5-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-6-attention-output-LayerNorm-weight-22-12-06-20-28-53---VmlldzozMDk0NTcx?accessToken=k2ni3ba1xx50vpeto3xpnn1hc0ypvxd7im5mrk1pq47krl64fc765pjr1zt14eyc">L6-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-7-attention-output-LayerNorm-weight-22-12-06-20-28-37---VmlldzozMDk0NTY3?accessToken=kwsawb1wu60ecq1pnvtiim7sumqqca9uqn7rcv2o0ize794h49cgvchv680nju1z">L7-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-8-attention-output-LayerNorm-weight-22-12-06-20-28-20---VmlldzozMDk0NTY0?accessToken=uaqu2tqx39ki634m6j3fywcs76tzyp3q9i1riw9af52pehym9p3ukyamsyldq6te">L8-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-9-attention-output-LayerNorm-weight-22-12-06-20-27-57---VmlldzozMDk0NTYz?accessToken=8fvpijkrfvaywa909woxz38qa0dj8jsm6wyjd3tpubemoakzf3u84p3yel3g5b4a">L9-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-10-attention-output-LayerNorm-weight-22-12-06-20-26-45---VmlldzozMDk0NTU3?accessToken=qv18nauts75orcu6b3lpt227g3aa88wfnpy2lnckwn771pp5j4fn270lrnrxvaie">L10-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-roberta-encoder-layer-11-attention-output-LayerNorm-weight-22-12-06-20-25-42---VmlldzozMDk0NTUz?accessToken=6xc2nhzbbctc7o0h8ztaspy8m1ha9r1qoldav2e1rilvwl1yu48frels9134ittl">L11-Attention</a> | <a href="https://wandb.ai/anhdungitvn/test/reports/gradients-lm_head-layer_norm-weight-22-12-06-20-32-36---VmlldzozMDk0NTk4?accessToken=t2zroq4rqqu12sak0v8hohmkkqgjuqfgni6apqtearwxa01d3yppglhtux91y4ga">LM Head</a>
## Model description

This is a Vietnamese XLM-RoBERTa base model (uncased).
## Intended uses & limitations
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task.
### How to use
Here is how to run this model on a given text in PyTorch:
```python
from transformers import AutoTokenizer, XLMRobertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-base')
model = XLMRobertaForMaskedLM.from_pretrained('anhdungitvn/vi-xlm-roberta-base')

text = "Câu bằng tiếng Việt."  # "A sentence in Vietnamese."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in TensorFlow:
```python
from transformers import AutoTokenizer, TFXLMRobertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-base')
model = TFXLMRobertaForMaskedLM.from_pretrained('anhdungitvn/vi-xlm-roberta-base')

text = "Câu bằng tiếng Việt."  # "A sentence in Vietnamese."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
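The checkpoint can also be queried directly for masked-token predictions with the `fill-mask` pipeline (XLM-RoBERTa tokenizers use `<mask>` as the mask token); the example sentence below is a stand-in:

```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='anhdungitvn/vi-xlm-roberta-base')

# "Hà Nội là thủ đô của <mask>." -- "Hanoi is the capital of <mask>."
for prediction in unmasker("Hà Nội là thủ đô của <mask>."):
    print(prediction['token_str'], prediction['score'])
```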
### Limitations and bias
Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions.
## Training data
- Vietnamese Wiki (2022, 1GB).
- Vietnamese News (2022, 17.2GB).
## Training procedure
The model was pretrained with the following settings (a rough sketch of this setup follows the list):
- Tokenizer: SentencePiece (BPE) with a vocabulary of 32,000 tokens.
- Model type: xlmroberta
- Optimizer: AdamW
- Learning rate: 1e-4
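The exact training script is not published here; the following is a minimal sketch of an equivalent MLM pretraining setup with the Hugging Face `Trainer` under the settings above. The corpus path, sequence length, and batch size are assumptions:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments, XLMRobertaConfig, XLMRobertaForMaskedLM)

# SentencePiece (BPE) tokenizer with a 32,000-token vocabulary.
tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-base')

# Freshly initialized base-sized model matched to the tokenizer's vocabulary.
config = XLMRobertaConfig(vocab_size=tokenizer.vocab_size)
model = XLMRobertaForMaskedLM(config)

# Hypothetical corpus file: one Vietnamese sentence per line.
dataset = load_dataset('text', data_files={'train': 'vi_corpus.txt'})['train']
dataset = dataset.map(
    lambda batch: tokenizer(batch['text'], truncation=True, max_length=512),
    batched=True, remove_columns=['text'])

# Standard dynamic masking for masked language modeling.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir='vi-xlm-roberta-base',
    num_train_epochs=40,             # 40 planned epochs, per the status above
    learning_rate=1e-4,              # AdamW is the Trainer's default optimizer
    per_device_train_batch_size=32,  # assumption: batch size is not stated
)

Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```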
## Evaluation results

Pretraining metrics and results: see the training logs linked above.
When fine-tuned on downstream tasks, this model achieves the following results:
Task | SC | CN | NC | VSLP_2016_ASC | T | T | T | T
---|---|---|---|---|---|---|---|---
Accuracy (%) | 92.67 | x | x | x | x | x | x | x
Downstream task datasets (a fine-tuning sketch follows this list):
- <a href="https://www.aivivn.com/contests/6">SC: Sentiment Classification</a> (<a href="https://huggingface.co/datasets/anhdungitvn/sccr">a copy version</a>)
- <a href="https://huggingface.co/datasets/truongpdd/Covid-19-ner-lowercased">CN: Covid-19 NER</a>
- <a href="https://huggingface.co/datasets/truongpdd/new_categorical_dataset">NC: News Classification</a>
- <a href="https://huggingface.co/datasets/truongpdd/VSLP_2016_ASC">VSLP_2016_ASC</a>
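The model is mostly intended to be fine-tuned on tasks like these. Here is a minimal sequence-classification sketch on the SC dataset copy linked above; the split names, column names (`text`, `label`), and hyperparameters are assumptions, so check the dataset card for the actual schema:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('anhdungitvn/vi-xlm-roberta-base')
model = AutoModelForSequenceClassification.from_pretrained(
    'anhdungitvn/vi-xlm-roberta-base', num_labels=2)  # SC treated as binary here

# Assumed splits and column names; check the dataset card for the real schema.
dataset = load_dataset('anhdungitvn/sccr')
dataset = dataset.map(
    lambda batch: tokenizer(batch['text'], truncation=True, max_length=256),
    batched=True)

args = TrainingArguments(output_dir='vi-xlm-roberta-base-sc',
                         num_train_epochs=3, learning_rate=2e-5,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args,
        train_dataset=dataset['train'],
        eval_dataset=dataset['test']).train()  # assumption: a 'test' split exists
```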
### SC: Sentiment Classification (classifying the sentiment of comments)
<img width="1024px" src="https://i.postimg.cc/02fLqVDP/sc-cosine-schedule-with-warmup.png">
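The plot above shows the learning rate following a cosine schedule with warmup. A minimal sketch of building that schedule shape with `transformers`; the model, base learning rate, and step counts are stand-ins:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(4, 2)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Linear warmup, then cosine decay toward zero, matching the plotted shape.
# Step counts are stand-ins loosely based on the ~90 global steps below.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=9, num_training_steps=90)

for step in range(90):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr()[0])
```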
global_step | train_loss | mcc | tp | tn | fp | fn | auroc | auprc | eval_loss | acc |
---|---|---|---|---|---|---|---|---|---|---|
9 | 0.3401549756526947 | 0.7379583642283261 | 576 | 827 | 115 | 91 | 0.9352091470188472 | 0.8836813825582118 | 0.307791425122155 | 0.8719701678060907 |
10 | 0.3009944260120392 | 0.7153561686108502 | 644 | 714 | 228 | 23 | 0.9341794071117306 | 0.8855559953803023 | 0.4522255692217085 | 0.8440024860161591 |
18 | 0.2707692086696625 | 0.7568798071058903 | 553 | 867 | 75 | 114 | 0.9546882609650588 | 0.9305472414572102 | 0.2917362087302738 | 0.8825357364822871 |
20 | 0.324040949344635 | 0.7954276426347437 | 621 | 823 | 119 | 46 | 0.9585541624092412 | 0.9350586373986622 | 0.2703523271613651 | 0.8974518334369174 |
27 | 0.2098093926906585 | 0.8167227165176854 | 619 | 844 | 98 | 48 | 0.9641707808516125 | 0.9433042047286796 | 0.2318740271859699 | 0.9092604101926662 |
30 | 0.1790707111358642 | 0.8255523946160684 | 608 | 864 | 78 | 59 | 0.9659978927733586 | 0.9451571077189844 | 0.2376453859938515 | 0.9148539465506526 |
36 | 0.1825853586196899 | 0.8376375135092212 | 617 | 864 | 78 | 50 | 0.9686303345142716 | 0.950031878654048 | 0.2207422753175099 | 0.9204474829086389 |
40 | 0.1000976711511612 | 0.8314993418206894 | 616 | 860 | 82 | 51 | 0.967654707678008 | 0.948305555504866 | 0.2241402003500196 | 0.9173399627097576 |
45 | 0.1039206981658935 | 0.8475960829767908 | 627 | 861 | 81 | 40 | 0.9687847159222936 | 0.9501524466016228 | 0.2368797047270668 | 0.9247980111870727 |
50 | 0.0745943784713745 | 0.8403425520503286 | 626 | 856 | 86 | 41 | 0.9692637757554344 | 0.9509443104477632 | 0.2345909766025013 | 0.9210689869484152 |
54 | 0.2125148624181747 | 0.8452359275481012 | 627 | 859 | 83 | 40 | 0.9685555311516216 | 0.9488296967445168 | 0.2353654123014874 | 0.9235550031075201 |
60 | 0.1117694526910781 | 0.850232290376995 | 621 | 870 | 72 | 46 | 0.9680144004430906 | 0.9471278564716886 | 0.2481025093131595 | 0.9266625233064015 |
63 | 0.0778111666440963 | 0.8399036864239123 | 616 | 867 | 75 | 51 | 0.9673109305220002 | 0.9452656759041156 | 0.2551250010728836 | 0.9216904909881914 |
70 | 0.0746646523475647 | 0.8413858446807108 | 618 | 866 | 76 | 49 | 0.9675878621198954 | 0.9460594635799092 | 0.2637980712784661 | 0.9223119950279677 |
72 | 0.0180709399282932 | 0.8423208614267836 | 616 | 869 | 73 | 51 | 0.9677056376270464 | 0.9463613907615576 | 0.2657667969663937 | 0.9229334990677439 |
80 | 0.043884091079235 | 0.8380734551435709 | 611 | 871 | 71 | 56 | 0.9675926368026178 | 0.9460924400848304 | 0.2679461878206994 | 0.9210689869484152 |
81 | 0.0459724143147468 | 0.8395225086880644 | 613 | 870 | 72 | 54 | 0.967546481536302 | 0.9459560641294614 | 0.2676228193773163 | 0.9216904909881914 |
90 | 0.0371172726154327 | 0.8397713078008345 | 615 | 868 | 74 | 52 | 0.967408015737354 | 0.94555620631723 | 0.2671518574158351 | 0.9216904909881914 |
90 | 0.0517170429229736 | 0.8397713078008345 | 615 | 868 | 74 | 52 | 0.967408015737354 | 0.94555620631723 | 0.2671518574158351 | 0.9216904909881914 |
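For reference, `acc` and `mcc` in the table can be recomputed directly from the confusion counts; a quick check against the global_step 90 row:

```python
from math import sqrt

# Confusion counts from the global_step 90 row above.
tp, tn, fp, fn = 615, 868, 74, 52

# Accuracy: fraction of correct predictions.
acc = (tp + tn) / (tp + tn + fp + fn)

# Matthews correlation coefficient from the confusion matrix.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"acc={acc:.6f}")  # 0.921690, matching the acc column
print(f"mcc={mcc:.6f}")  # 0.839771, matching the mcc column
```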
### Logs
<a href="https://wandb.ai/anhdungitvn/test_cls/reports/-tp-tn-tp-tn-fp-fn-22-10-21-09-50-27---VmlldzoyODI4MDcz?accessToken=axp3jb8pw01a7susb3n6pu0i5mfbay66v21oeae1dkx93km41rcuhvrkkqfqpar5">Confusion counts (tp/tn/fp/fn)</a>
## BibTeX entry and citation info

```bibtex
@article{2022,
  title={x},
  author={x},
  journal={ArXiv},
  year={2022},
  volume={x}
}
```
<a href="https://huggingface.co/exbert/?model=anhdungitvn/vi-xlm-roberta-base"> <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png"> </a>