NER

Model description

mbert-base-uncased-ner-pcm is based on a Multilingual BERT (base, uncased) model that was previously fine-tuned for Named Entity Recognition on 10 high-resource languages. It has been trained to recognize four types of entities: dates & times (DATE), locations (LOC), organizations (ORG), and persons (PER).
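
One quick way to confirm the tag set is to inspect the model's label mapping (using the model identifier from the Usage section below):

from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("arnolfokam/mbert-base-uncased-ner-pcm")
# id2label maps each class index to its BIO tag, e.g. O, B-PER, I-PER, B-DATE, ...
print(model.config.id2label)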

Intended Use

Training Data

This model was fine-tuned on the Nigerian Pidgin corpus (pcm) of the MasakhaNER dataset. However, we capped the number of entity groups per sentence in this dataset at 10.
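
The preprocessing code is not published; the sketch below is one plausible reading of that cap, using the datasets library (counting B- tags as entity groups and dropping sentences over the cap are both assumptions):

from datasets import load_dataset

# MasakhaNER's Nigerian Pidgin split; ner_tags are integer-encoded BIO labels.
dataset = load_dataset("masakhaner", "pcm")
label_names = dataset["train"].features["ner_tags"].feature.names

def entity_groups(example):
    # In the BIO scheme, each B- tag starts a new entity group.
    return sum(label_names[tag].startswith("B-") for tag in example["ner_tags"])

# Assumed interpretation: keep only sentences with at most 10 entity groups.
train = dataset["train"].filter(lambda ex: entity_groups(ex) <= 10)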

Training procedure

This model was trained on a single NVIDIA P5000 GPU from Paperspace.

Hyperparameters

Evaluation Data

We evaluated this model on the test split of the Nigerian Pidgin corpus (pcm) of the MasakhaNER dataset, with no thresholding.

Metrics

Limitations

Caveats and Recommendations

Results

Model Name                   Precision   Recall   F1-score
mbert-base-uncased-ner-pcm   90.38       82.44    86.23
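
These are entity-level scores of the kind computed by the seqeval library (the exact evaluation script is not published, so seqeval here is an assumption; the tag sequences below are made up for illustration):

from seqeval.metrics import precision_score, recall_score, f1_score

# Toy gold and predicted BIO tag sequences (one sentence each).
gold = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]]

# Entity-level scoring: a prediction counts as correct only if both the
# span boundaries and the entity type match exactly.
print(precision_score(gold, pred), recall_score(gold, pred), f1_score(gold, pred))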

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("arnolfokam/mbert-base-uncased-ner-pcm")
model = AutoModelForTokenClassification.from_pretrained("arnolfokam/mbert-base-uncased-ner-pcm")

# Build a token-classification pipeline around the model.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Mixed Martial Arts joinbodi, Ultimate Fighting Championship, UFC don decide say dem go enta back di octagon on Saturday, 9 May, for Jacksonville, Florida."

ner_results = nlp(example)
print(ner_results)
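
The pipeline above returns one prediction per sub-word token. Passing aggregation_strategy="simple" (supported in recent transformers versions) merges sub-words into whole entity spans, which is usually more convenient for display:

# Same model, but with sub-word tokens merged into complete entity spans.
nlp_grouped = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(nlp_grouped(example))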