NER-fine-tuned-BETO: a model fine-tuned from BETO for the NER task.

Language: es
Datasets: CoNLL-2002, WikiNEuRal


NER-fine-tuned-BETO is a named entity recognition (NER) model fine-tuned from BETO on the Spanish portions of the CoNLL-2002 and WikiNEuRal datasets. The model was trained on the CoNLL-2002 train split (~8320 sentences) and on a bootstrapped subset of WikiNEuRal: we re-evaluated that dataset with the model and kept only the sentences in which every label matched the model's prediction. The model was evaluated on the CoNLL-2002 test split.
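
The bootstrapping filter described above can be sketched roughly as follows. This is a minimal illustration and not the authors' actual code: `keep_agreeing_sentences`, `toy_predict` and the sample data are hypothetical, and the toy predictor only stands in for a real model.

```python
from typing import Callable, List, Tuple

# A sentence is a list of tokens paired with its dataset (gold) tags.
Sentence = Tuple[List[str], List[str]]


def keep_agreeing_sentences(
    sentences: List[Sentence],
    predict: Callable[[List[str]], List[str]],
) -> List[Sentence]:
    """Keep only the sentences whose dataset tags all match the predicted tags."""
    kept = []
    for tokens, gold_tags in sentences:
        if predict(tokens) == gold_tags:
            kept.append((tokens, gold_tags))
    return kept


# Hypothetical stand-in for a model: tags two known names, everything else "O".
def toy_predict(tokens: List[str]) -> List[str]:
    lookup = {"Ignacio": "B-PER", "Madrid": "B-LOC"}
    return [lookup.get(tok, "O") for tok in tokens]


data = [
    (["Ignacio", "viaja"], ["B-PER", "O"]),  # agrees with the predictor
    (["Madrid", "duerme"], ["B-ORG", "O"]),  # disagrees on the first tag
]
filtered = keep_agreeing_sentences(data, toy_predict)
print(len(filtered), "of", len(data), "sentences kept")
```

A single mismatching token discards the whole sentence, which trades dataset size for label consistency.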

Training data

The training data was labeled as follows:

Abbreviation   Description
O              Outside of NE
PER            Person’s name
ORG            Organization
LOC            Location
MISC           Miscellaneous

Combined with the IOB format, this gives the following tags: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC.
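
As a rough sketch of how the entity types combine with IOB prefixes, the snippet below builds the tag set and converts a tag sequence into entity spans. The names `IOB_TAGS` and `iob_to_spans` are hypothetical, and the order of `IOB_TAGS` is illustrative only; it is not necessarily the label-id order stored in the model configuration.

```python
from typing import List, Tuple

ENTITY_TYPES = ["PER", "ORG", "LOC", "MISC"]

# "O" plus a B- (begin) and an I- (inside) tag for every entity type.
IOB_TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # An I- tag of the same type extends the open span.
        if tag.startswith("I-") and current == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if current is not None:
            spans.append((current, start, i))
            start, current = None, None
        # A B- tag (or a stray I- tag) opens a new span.
        if tag.startswith(("B-", "I-")):
            start, current = i, tag[2:]
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


tags = ["B-PER", "O", "O", "O", "O", "O", "B-LOC", "I-LOC"]
print(IOB_TAGS)
print(iob_to_spans(tags))
```

Treating a stray I- tag as the start of a new span is one possible design choice; stricter decoders may discard such tags instead.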

How to use NER-fine-tuned-BETO with HuggingFace

Load the model and its tokenizer, then build a NER pipeline:

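from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# use_auth_token=True is only needed if the repository requires authentication;
# newer transformers releases use the `token` argument instead.
tokenizer = AutoTokenizer.from_pretrained("NazaGara/NER-fine-tuned-BETO", use_auth_token=True)
model = AutoModelForTokenClassification.from_pretrained("NazaGara/NER-fine-tuned-BETO", use_auth_token=True)

# aggregation_strategy="simple" merges word pieces into whole entities
nlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
nlp('Ignacio se fue de viaje por Buenos aires')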

[{'entity_group': 'PER',
  'score': 0.9997764,
  'word': 'Ignacio',
  'start': 0,
  'end': 7},
 {'entity_group': 'LOC',
  'score': 0.9997932,
  'word': 'Buenos aires',
  'start': 28,
  'end': 40}]
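
The returned list can be post-processed with plain Python. The sketch below groups entity words by type and uses the `start`/`end` character offsets to recover the text. The helper `group_by_type` and its `min_score` parameter are hypothetical, and the `entities` list is copied from the sample output above so that the snippet runs without downloading the model.

```python
from collections import defaultdict

text = 'Ignacio se fue de viaje por Buenos aires'

# Sample pipeline output, copied from the example above.
entities = [
    {'entity_group': 'PER', 'score': 0.9997764, 'word': 'Ignacio', 'start': 0, 'end': 7},
    {'entity_group': 'LOC', 'score': 0.9997932, 'word': 'Buenos aires', 'start': 28, 'end': 40},
]


def group_by_type(entities, min_score=0.0):
    """Group entity words by entity type, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent['score'] >= min_score:
            grouped[ent['entity_group']].append(ent['word'])
    return dict(grouped)


print(group_by_type(entities, min_score=0.9))

# start/end are character offsets into the input string (end exclusive).
for ent in entities:
    print(ent['entity_group'], repr(text[ent['start']:ent['end']]))
```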

Model Performance


precision   recall   f1-score
0.9833      0.8950   0.8998

By class

class    precision   recall   f1-score
O        0.9958      0.9965   0.990
B-PER    0.9572      0.9741   0.9654
I-PER    0.9487      0.9921   0.9699
B-ORG    0.8823      0.9264   0.9038
I-ORG    0.9253      0.9264   0.9117
B-LOC    0.8967      0.8736   0.8850
I-LOC    0.8870      0.8215   0.8530
B-MISC   0.7541      0.7964   0.7747
I-MISC   0.9026      0.7827   0.8384
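
For reference, a per-class F1 score is the harmonic mean of precision and recall. The sketch below recomputes it for two rows of the table; `f1_score` is a hypothetical helper rather than the evaluation script used for this model, and because the table values are rounded, a recomputed score may differ slightly in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "B-PER": (0.9572, 0.9741),
    "I-LOC": (0.8870, 0.8215),
}
for label, (p, r) in rows.items():
    print(label, round(f1_score(p, r), 4))
```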