text-classification language-identification

OpenLID

Model description

fastText is a library for efficient learning of word representations and sentence classification. fastText is designed to be simple to use for developers, domain experts, and students. It's dedicated to text classification and learning word representations, and was designed to allow for quick model iteration and refinement without specialized hardware. fastText models can be trained on more than a billion words on any multicore CPU in less than a few minutes.

Intended uses & limitations

You can use pre-trained word vectors for text classification or language identification. See the tutorials and resources on its official website to look for tasks that interest you.

How to use

Here is how to use this model to detect the language of a given text:

>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="davanstrien/OpenLID", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")

(('__label__eng_Latn',), array([0.81148803]))

>>> model.predict("Hello, world!", k=5)

(('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'), 
 array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))

Limitations and bias

[More Information needed]

Training data

[More Information needed]

Training procedure

License

[More Information needed]

Evaluation datasets

[More Information needed]

BibTeX entry and citation info

[More Information needed]