Introduction
This model is a fine-tuned version of roberta-base-bne for named-entity recognition (NER) in Spanish text from the tourism domain, specifically text about the Way of St. James (Camino de Santiago). It recognizes four entity types: locations (LOC), organizations (ORG), persons (PER) and miscellaneous (MISC).
Usage
You can use this model for NER through the Transformers pipeline:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_bne-base")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_bne-base")
example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."
ner_pipe = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
for ent in ner_pipe(example):
    print(ent)
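With `aggregation_strategy="simple"`, each item the pipeline returns is a dictionary with keys such as `entity_group`, `score`, `word`, `start` and `end`. As a minimal sketch of post-processing that output, the helper `group_entities` below groups entity texts by label and drops low-confidence predictions. The helper is hypothetical (it is not part of Transformers), and the sample predictions and the 0.5 threshold are illustrative values, not real output from this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group pipeline predictions by label, skipping low-confidence ones."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written sample in the shape the pipeline returns.
# Labels and scores are illustrative, not actual model predictions.
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro", "start": 22, "end": 30},
    {"entity_group": "MISC", "score": 0.95, "word": "Camino de Santiago", "start": 38, "end": 56},
    {"entity_group": "PER", "score": 0.30, "word": "Santiago", "start": 147, "end": 155},
]

print(group_entities(sample))
```

In practice, `sample` would be replaced by the result of `ner_pipe(example)`.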
Dataset
ToDo
Model performance
entity | precision | recall | f1 |
---|---|---|---|
LOC | 0.986 | 0.982 | 0.984 |
MISC | 0.800 | 0.911 | 0.852 |
ORG | 0.896 | 0.779 | 0.833 |
PER | 0.953 | 0.937 | 0.945 |
micro avg | 0.967 | 0.971 | 0.969 |
macro avg | 0.909 | 0.902 | 0.903 |
weighted avg | 0.968 | 0.971 | 0.969 |
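As a rough consistency check on the table, the sketch below recomputes each F1 score as the harmonic mean of precision and recall, and the macro averages as unweighted means over the four entity types. The micro and weighted averages are not recomputed because they depend on per-entity support counts, which the table does not list. The function and variable names are chosen for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-entity (precision, recall) copied from the table above.
per_entity = {
    "LOC": (0.986, 0.982),
    "MISC": (0.800, 0.911),
    "ORG": (0.896, 0.779),
    "PER": (0.953, 0.937),
}

f1_by_entity = {label: f1_score(p, r) for label, (p, r) in per_entity.items()}

# Macro averages weight every entity type equally, regardless of frequency.
macro_precision = sum(p for p, _ in per_entity.values()) / len(per_entity)
macro_recall = sum(r for _, r in per_entity.values()) / len(per_entity)

for label, f1 in f1_by_entity.items():
    print(f"{label}: f1={f1:.3f}")
print(f"macro precision={macro_precision:.3f}, macro recall={macro_recall:.3f}")
```

Because the inputs are already rounded to three decimals, the recomputed values can differ from the reported ones in the last digit.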
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
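To make these settings more concrete, the sketch below writes them as keyword arguments named after the fields of `transformers.TrainingArguments` and illustrates what a linear learning-rate schedule does. Several things here are assumptions rather than facts from this card: that training ran on a single device (so `train_batch_size` maps to `per_device_train_batch_size`), that no warmup was used, and the total step count, which is a made-up number because the dataset size is not stated. `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
# Assumption: a single device, so train_batch_size maps to per_device_train_batch_size.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate under a linear schedule: optional warmup, then decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length: the number of optimizer steps is not stated in this card.
total_steps = 300
for step in (0, 150, 300):
    print(step, linear_lr(step, total_steps))
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(output_dir="some_dir", **training_kwargs)`, where the output directory name is a placeholder.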
Framework versions
- Transformers 4.28.1
- PyTorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3
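To compare a local environment against these versions, something like the following could be used. It is only a sketch: `check_versions` is a hypothetical helper built on the standard-library `importlib.metadata`, it assumes the packages are installed under their usual PyPI names (`torch` for PyTorch), and it ignores local build tags such as `+cu117` when comparing.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.28.1",
    "torch": "2.0.1",  # the card lists 2.0.1+cu117; the CUDA suffix is ignored here
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(expected=EXPECTED):
    """Map each package to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop any local build tag such as "+cu117" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


for name, (installed, ok) in check_versions().items():
    print(f"{name}: installed={installed}, matches card={ok}")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones used during training.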