Introduction
This model is a fine-tuned version of xlm-roberta-large for named-entity recognition (NER) in the domain of tourism related to the Way of St. James (Camino de Santiago). It recognizes four entity types: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).
Usage
You can use this model with the Transformers `pipeline` API for NER:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and the fine-tuned token-classification model.
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_xlm-large")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_xlm-large")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
for ent in ner_pipe(example):
    print(ent)
```
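With `aggregation_strategy="simple"`, each item returned by the pipeline is a dictionary with keys such as `entity_group`, `score`, `word`, `start`, and `end`. As a minimal sketch of post-processing, the helper below groups recognized spans by label and drops low-confidence ones. The `group_entities` function, the `min_score` threshold, and the sample predictions are illustrative assumptions; they are not actual output from this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group aggregated NER predictions by label, keeping scores >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical predictions, hand-written for illustration only.
# Real pipeline items also carry "start" and "end" character offsets.
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro"},
    {"entity_group": "MISC", "score": 0.95, "word": "Camino de Santiago"},
    {"entity_group": "LOC", "score": 0.42, "word": "Franco"},
]

grouped = group_entities(sample, min_score=0.9)
print(grouped)
```

In practice you would pass `ner_pipe(example)` instead of `sample`; a suitable threshold depends on the application.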
Dataset
ToDo
Model performance
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| LOC | 0.973 | 0.983 | 0.978 | 
| MISC | 0.760 | 0.788 | 0.773 | 
| ORG | 0.885 | 0.701 | 0.783 | 
| PER | 0.937 | 0.878 | 0.906 | 
| micro avg | 0.953 | 0.958 | 0.955 | 
| macro avg | 0.889 | 0.838 | 0.860 | 
| weighted avg | 0.953 | 0.958 | 0.955 | 
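The F1 column is the harmonic mean of precision and recall, and the macro average is the unweighted mean over the four entity types. The sketch below recomputes both from the table as a consistency check. Differences in the third decimal are expected because the inputs are already rounded. The variable and function names are illustrative.

```python
# Per-entity (precision, recall) pairs copied from the table above.
scores = {
    "LOC": (0.973, 0.983),
    "MISC": (0.760, 0.788),
    "ORG": (0.885, 0.701),
    "PER": (0.937, 0.878),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each entity type.
f1_by_entity = {label: f1_score(p, r) for label, (p, r) in scores.items()}

# Macro averages: unweighted means over the four entity types.
n = len(scores)
macro_precision = sum(p for p, _ in scores.values()) / n
macro_recall = sum(r for _, r in scores.values()) / n
macro_f1 = sum(f1_by_entity.values()) / n

for label, value in f1_by_entity.items():
    print(f"{label}: F1 = {value:.3f}")
print(f"macro avg: P = {macro_precision:.3f}, R = {macro_recall:.3f}, F1 = {macro_f1:.3f}")
```

The micro and weighted averages cannot be recomputed this way, since they depend on per-entity support counts that the table does not report.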
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
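As a rough sketch, these values map onto keyword arguments of the Transformers `TrainingArguments` class, and the linear scheduler decays the learning rate from its initial value to zero over training. The mapping of `train_batch_size` to `per_device_train_batch_size` assumes single-device training, the absence of warmup is an assumption (no warmup setting is listed), and `steps_per_epoch = 100` is a hypothetical number because the dataset size is not documented.

```python
# Hyperparameters from the list above, expressed as TrainingArguments keywords.
# Assumption: single-device training, so batch sizes are per-device values.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def linear_lr(step, total_steps, base_lr=5e-5):
    """Linear decay from base_lr to 0 with no warmup (assumed)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


# Hypothetical number of optimizer steps per epoch; the real value
# depends on the undocumented dataset size.
steps_per_epoch = 100
total_steps = int(steps_per_epoch * training_kwargs["num_train_epochs"])

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

To reproduce a comparable run, the dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, with an output directory of your choice.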
 
Framework versions
- Transformers 4.28.1
- PyTorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3