Introduction

This model is a fine-tuned version of xlm-roberta-large for Named Entity Recognition (NER) in the tourism domain, specifically tourism related to the Way of Saint James (Camino de Santiago). It recognizes four entity types: locations (LOC), organizations (ORG), persons (PER), and miscellaneous (MISC).

Usage

You can use this model with the Transformers NER pipeline:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_xlm-large")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_xlm-large")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."
# aggregation_strategy="simple" groups sub-word tokens into whole entity spans
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Each result is a dict with the keys entity_group, score, word, start and end
for ent in ner_pipe(example):
    print(ent)
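If you want entities organized by type rather than printed one by one, you can post-process the pipeline output. The sketch below assumes only the dict shape that an aggregated NER pipeline returns (entity_group, score, word, start, end); the helper name group_entities, the 0.5 score threshold, and the sample items are hypothetical and hand-written for illustration, not real output from this model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_entities(entities: Iterable[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group aggregated NER pipeline results by entity type.

    Spans whose score is below min_score are dropped. Each item is expected to
    carry the keys "entity_group", "score" and "word" (as produced with
    aggregation_strategy="simple").
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical, hand-written items (NOT real model output), shaped like pipeline results
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro", "start": 22, "end": 30},
    {"entity_group": "LOC", "score": 0.41, "word": "Franco", "start": 76, "end": 82},
    {"entity_group": "LOC", "score": 0.97, "word": "Catedral", "start": 114, "end": 122},
]

print(group_entities(sample))  # {'LOC': ['Sigüeiro', 'Catedral']}
```

With the pipeline from the usage example, the call would be group_entities(ner_pipe(example)). A fixed threshold is a simple design choice; you may prefer per-type thresholds given the lower scores reported for MISC and ORG.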

Dataset

ToDo

Model performance

Entity         Precision   Recall   F1
LOC            0.973       0.983    0.978
MISC           0.760       0.788    0.773
ORG            0.885       0.701    0.783
PER            0.937       0.878    0.906
micro avg      0.953       0.958    0.955
macro avg      0.889       0.838    0.860
weighted avg   0.953       0.958    0.955
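F1 is the harmonic mean of precision and recall, and the macro average is the unweighted mean over the four entity types. The short check below recomputes the per-entity F1 and the macro averages from the rounded table values; because the inputs are rounded, a recomputed value may differ from the reported one in the third decimal (MISC comes out near 0.774 rather than 0.773).

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-entity (precision, recall) copied from the performance table
scores = {
    "LOC": (0.973, 0.983),
    "MISC": (0.760, 0.788),
    "ORG": (0.885, 0.701),
    "PER": (0.937, 0.878),
}

per_entity_f1 = {name: f1(p, r) for name, (p, r) in scores.items()}

# Macro average: every entity type counts equally, regardless of frequency
macro_p = sum(p for p, _ in scores.values()) / len(scores)
macro_r = sum(r for _, r in scores.values()) / len(scores)
macro_f1 = sum(per_entity_f1.values()) / len(per_entity_f1)

for name, value in per_entity_f1.items():
    print(f"{name}: f1={value:.3f}")
print(f"macro avg: precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

The micro and weighted averages cannot be recomputed this way, since they depend on the number of gold entities per type, which the card does not report. Their closeness to the LOC row suggests that LOC entities dominate the evaluation data.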

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
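With lr_scheduler_type set to linear, the learning rate starts at 5e-05 and decays linearly to zero over the training run. The sketch below illustrates that shape; it assumes zero warmup steps (none are listed above), a single device with no gradient accumulation, and a dataloader that keeps the last partial batch. The helper names and the 1,000-example training set are hypothetical, since the card does not report the dataset size.

```python
import math


def total_update_steps(num_examples: int, batch_size: int = 32, epochs: int = 3) -> int:
    """Optimizer steps for a run, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-set size; the real one is not given in this card
steps = total_update_steps(1000)  # ceil(1000 / 32) * 3 = 96 steps
for step in (0, steps // 2, steps):
    print(step, linear_lr(step, steps))
```

The warmup_steps parameter is included only to show where a warmup phase would fit; with the default of 0 the schedule is a straight line from 5e-05 down to 0.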

Framework versions

Transformers 4.28.1
Pytorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3
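To check whether a local environment matches the versions listed above, you can query the installed package metadata with the standard library. This is a small sketch: it uses the pip distribution names (torch for PyTorch) and an exact string comparison, so a build without the +cu117 local suffix will be reported as differing even if it is otherwise the same release. Treat a mismatch as a hint, not an error.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by pip distribution name
EXPECTED = {
    "transformers": "4.28.1",
    "torch": "2.0.1+cu117",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected, installed); installed is None if absent."""
    report: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = version(name)
        except PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report


for name, (want, have) in check_versions(EXPECTED).items():
    status = "ok" if have == want else "differs"
    print(f"{name}: expected {want}, installed {have} ({status})")
```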