gunghio/xlm-roberta-base-finetuned-panx-ner
This model was trained starting from xlm-roberta-base on a subset of xtreme dataset.
xtreme
datasets subsets used are: PAN-X.{lang}. Language used for training/validation are: italian, english, german, french and spanish.
Only 75% of the whole dataset was used.
Intended uses & limitations
Fine-tuned model can be used for Named Entity Recognition in it, en, de, fr, and es.
Training and evaluation data
Training dataset: xtreme
Training results
It achieves the following results on the evaluation set:
- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613
Details:
Label | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
PER | 0.922 | 0.908 | 0.915 | 26639 |
LOC | 0.880 | 0.906 | 0.892 | 37623 |
ORG | 0.821 | 0.816 | 0.818 | 28045 |
Overall | 0.874 | 0.879 | 0.877 | 92307 |
Usage
Set aggregation stragey according to documentation.
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)