Soraberta: Unsupervised Language Model Pre-training for Wolof
Soraberta is a RoBERTa-base model pretrained on the Wolof language. RoBERTa was introduced in the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (Liu et al., 2019).
Soraberta models
Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
---|---|---|---|---|
soraberta-base | 6 | 12 | 514 | 83 M |
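If you want to check these hyperparameters against the released checkpoint, you can read them from the model config. A minimal sketch; the mapping between the table columns and the standard RoBERTa config fields is my assumption, not stated in the model card:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters from the published checkpoint.
config = AutoConfig.from_pretrained("abdouaziiz/soraberta")
print(config.num_hidden_layers)       # number of layers
print(config.num_attention_heads)     # attention heads
print(config.max_position_embeddings) # position embeddings
print(config.hidden_size)             # hidden dimension
```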
Using Soraberta with Hugging Face's Transformers
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```
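If you prefer to work below the pipeline abstraction, the same predictions can be reproduced with the tokenizer and model classes directly. A minimal sketch; the top-5 extraction logic is illustrative and not part of the model card:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abdouaziiz/soraberta")
model = AutoModelForMaskedLM.from_pretrained("abdouaziiz/soraberta")

text = "juroom naari jullit man nanoo boole jend aw nag walla <mask>."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and take the five highest-scoring tokens.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```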
Training data
The data sources are the Bible (Old Testament) and WOLOF-ONLINE.
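The model card does not include the original pre-training script. For readers who want to see what RoBERTa-style unsupervised pre-training looks like end to end, here is a minimal, illustrative sketch using the transformers Trainer with the hyperparameters from the table above; the one-line placeholder corpus and the training arguments are assumptions, not the authors' actual setup:

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

# Tokenizer from the released checkpoint; the corpus line is a placeholder.
tokenizer = AutoTokenizer.from_pretrained("abdouaziiz/soraberta")
raw = Dataset.from_dict({"text": ["juroom naari jullit man nanoo boole jend aw nag."]})
tokenized = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

# Architecture per the table above: 6 layers, 12 heads, 514 position embeddings.
config = RobertaConfig(
    vocab_size=len(tokenizer),
    num_hidden_layers=6,
    num_attention_heads=12,
    max_position_embeddings=514,
)
model = RobertaForMaskedLM(config=config)

# RoBERTa-style objective: 15% of tokens are masked dynamically at batch time.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="soraberta-base", num_train_epochs=1),
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()
```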
Contact
Please contact abdouaziz@gmail.com for any questions, feedback, or requests.