NER Darija

darija-ner


This is the first model for Named Entity Recognition (NER) in the Moroccan dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data at https://data.mendeley.com/datasets/286sss4k9v/4.

Model Description


Model Sources


Metrics

The model is evaluated with the F1 score, reported on each test set.
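
The card does not say which F1 variant is reported. As a minimal sketch, assuming the common entity-level, micro-averaged F1 over BIO-tagged sequences (an entity counts as correct only if both its span and its type match), the metric could be computed as follows. The function names and the tag labels in the usage example are illustrative and are not taken from the project's code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, entity type)


def extract_entities(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    Simplification (assumption): an I- tag that does not continue an open
    span of the same type is ignored rather than starting a new entity.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_entities(gold_tags), extract_entities(pred_tags)
        tp += len(g & p)       # exact span-and-type matches
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy usage with placeholder tags: one of two gold entities is recovered.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Token-level or macro-averaged F1 would give different numbers on the same predictions, so scores are only comparable when computed with the same variant.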

Results

DarNERcorp_test: F1 = 66.06%

MixedNERcorp_test: F1 = 70.06%

Environmental Impact


Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
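
No training hardware, duration, or region is given here, so no estimate for this model can be stated. As a rough sketch of the kind of arithmetic such a calculator performs (energy drawn by the hardware multiplied by the carbon intensity of the local grid), the function below may help. The function name, its parameters, and every number in the usage example are hypothetical placeholders, not measurements of this model's training run.

```python
def estimate_emissions_kg(
    power_watts: float,
    hours: float,
    carbon_intensity_kg_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Estimate CO2-equivalent emissions in kilograms.

    power_watts: average power draw of the training hardware.
    hours: total training time.
    carbon_intensity_kg_per_kwh: kg CO2eq emitted per kWh on the local grid.
    pue: data-center power usage effectiveness (1.0 means no overhead).
    """
    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder inputs only: a 250 W accelerator running for 10 hours
# on a grid emitting 0.5 kg CO2eq per kWh.
example_kg = estimate_emissions_kg(250, 10, 0.5)
example_grams = example_kg * 1000  # model cards usually report grams
```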

Citation

If you use the DarNERcorp dataset to train your models, please cite the following paper:

Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief, Volume 48, 2023, 109234, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.109234. (https://www.sciencedirect.com/science/article/pii/S2352340923003530)

GitHub Repo:

Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner