## Model Description

This model is a fine-tuned version of the DistilBERT base multilingual model, adapted for token classification where the input tokens are ASCII characters and the labels are the corresponding Vietnamese characters. The code for building and training this model can be found here.
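Concretely, each ASCII input character is classified into the Vietnamese character it should be restored to. A minimal sketch of this framing (the pairing below is illustrative, not the model's actual preprocessing code):

```python
def char_labels(ascii_text: str, target_text: str) -> list[tuple[str, str]]:
    # Pair each ASCII input character with the Vietnamese character
    # it should become; assumes the two strings align one-to-one.
    assert len(ascii_text) == len(target_text)
    return list(zip(ascii_text, target_text))

print(char_labels("Viet", "Việt"))  # [('V', 'V'), ('i', 'i'), ('e', 'ệ'), ('t', 't')]
```

Each pair is one training example for the classifier head: the ASCII character on the left, the target Vietnamese character on the right.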

The model is trained on the Vietnamese Wikipedia data here.

We encourage potential users of this model to check out the BERT base multilingual model card to learn more about usage, limitations, and potential biases.

## Direct Use

You can use the raw model to restore diacritics for ASCII-ified Vietnamese text.
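The model expects ASCII-ified input, i.e. Vietnamese text with its diacritics stripped. One common way to produce such input from accented text is Unicode decomposition; the helper name `asciify` below is illustrative, not part of the model's API:

```python
import unicodedata

def asciify(text: str) -> str:
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # đ/Đ carry no combining mark and do not decompose, so map them explicitly.
    return stripped.replace("đ", "d").replace("Đ", "D")

print(asciify("Việt Nam"))  # Viet Nam
```

Feeding the `asciify` output to the model and reading off the predicted label for each character yields the diacritic-restored text.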

## Evaluation

The model developers report the following accuracies for restoring diacritics on ASCII-ified Vietnamese text. All metrics only consider syllables that contain just alphabetic characters.
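Under one plausible reading of these metrics — treating whitespace-separated tokens as syllables, which is an assumption and not the developers' published evaluation code — they could be computed as:

```python
def accuracies(predicted: str, reference: str):
    # Keep only syllables (whitespace-separated tokens) made up purely of
    # alphabetic characters, as the reported metrics do.
    pairs = [(p, r) for p, r in zip(predicted.split(), reference.split())
             if r.isalpha()]
    char_total = sum(len(r) for _, r in pairs)
    char_hits = sum(pc == rc for p, r in pairs for pc, rc in zip(p, r))
    syll_hits = sum(p == r for p, r in pairs)
    sent_hit = all(p == r for p, r in pairs)
    return char_hits / char_total, syll_hits / len(pairs), sent_hit

print(accuracies("tieng Việt", "tiếng Việt"))
```

Here a sentence counts as correct only if every considered syllable matches, which is why sentence accuracy is far lower than character or syllable accuracy.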

| Character Accuracy (%) | Syllable Accuracy (%) | Sentence Accuracy (%) |
| :---: | :---: | :---: |
| 98.75 | 96.10 | 50.26 |