Table of Contents

- Model description
- Training procedure
- Limitations and biases
- BibTeX entry and citation info

Model description

robbert-base-v2-NER-NL-legislation-refs is a fine-tuned RobBERT model that was trained to recognize the entity type 'legislation references' (REF) in Dutch case law.

Specifically, this model is a pdelobelle/robbert-v2-dutch-base model (RoBERTa architecture) that was fine-tuned on the robbert-base-v2-NER-NL-legislation-refs-data dataset.
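
A minimal loading sketch using the transformers library; the repo ID shown is illustrative and may need the actual user/organisation namespace prefixed:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Illustrative repo ID; prepend the namespace under which the model is published.
model_id = "robbert-base-v2-NER-NL-legislation-refs"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
```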

Training procedure

Dataset

This model was fine-tuned on the robbert-base-v2-NER-NL-legislation-refs-data dataset. The dataset consists of 512-token examples, each containing one or more legislation references. The examples were created from a weakly labelled corpus of Dutch case law scraped from Linked Data Overheid: the texts were pre-tokenized and labelled with spaCy (using biluo_tags_from_offsets) and then further tokenized with Hugging Face's AutoTokenizer.from_pretrained() for the pdelobelle/robbert-v2-dutch-base tokenizer.
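
A rough sketch of that preprocessing, assuming spaCy v3 (where biluo_tags_from_offsets is named offsets_to_biluo_tags) and character-offset annotations per reference; the sentence, offsets, and variable names are illustrative, not taken from the actual dataset:

```python
import spacy
from spacy.training import offsets_to_biluo_tags  # named biluo_tags_from_offsets in spaCy v2
from transformers import AutoTokenizer

nlp = spacy.blank("nl")
tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")

# Illustrative sentence and weak character-offset annotation for one legislation reference.
text = "De rechtbank verwijst naar artikel 6:162 BW en wijst de vordering toe."
ref = "artikel 6:162 BW"
start = text.index(ref)
entities = [(start, start + len(ref), "REF")]

# Pre-tokenize and turn the character offsets into BILOU tags.
doc = nlp(text)
biluo_tags = offsets_to_biluo_tags(doc, entities)  # e.g. ..., "B-REF", "I-REF", "L-REF", ...

# Subword-tokenize the pre-tokenized words; word_ids() can then be used to align
# the BILOU tags with RobBERT's subword tokens.
encoding = tokenizer([t.text for t in doc], is_split_into_words=True,
                     truncation=True, max_length=512)
word_ids = encoding.word_ids()
```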

Results

| Model   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| RobBERT | 0.874     | 0.903  | 0.888    |
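
These are entity-level scores. A hedged sketch of how such entity-level precision, recall and F1 can be computed with the seqeval package (the exact evaluation script used for this model is not documented here, so this is only an illustration):

```python
from seqeval.metrics import precision_score, recall_score, f1_score
from seqeval.scheme import BILOU

# Illustrative gold and predicted BILOU tag sequences for two sentences.
y_true = [["O", "B-REF", "I-REF", "L-REF", "O"], ["O", "U-REF", "O"]]
y_pred = [["O", "B-REF", "I-REF", "L-REF", "O"], ["O", "O", "O"]]

print(precision_score(y_true, y_pred, mode="strict", scheme=BILOU))  # 1.0
print(recall_score(y_true, y_pred, mode="strict", scheme=BILOU))     # 0.5
print(f1_score(y_true, y_pred, mode="strict", scheme=BILOU))         # ~0.67
```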

This model can be tested quickly on the provided examples via Hugging Face's hosted inference API widget. Note that the widget incorrectly presents the last token of a legislation reference as a separate entity due to the workings of its 'simple' aggregation_strategy: while this model was fine-tuned on training data labelled in accordance with the BILOU scheme, the hosted inference API groups entities by merging adjacent B- and I- tags of the same type, and therefore leaves the L- tags ungrouped.
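
For local use, a sketch of running inference with the transformers token-classification pipeline; the repo ID is illustrative (prepend the actual namespace), and the aggregation_strategy value mirrors the widget behaviour described above:

```python
from transformers import pipeline

# Illustrative repo ID; prepend the namespace under which the model is published.
ner = pipeline(
    "token-classification",
    model="robbert-base-v2-NER-NL-legislation-refs",
    aggregation_strategy="simple",  # the grouping the hosted widget uses
)

print(ner("De rechtbank verwijst naar artikel 6:162 BW en wijst de vordering toe."))

# With aggregation_strategy="none" the raw per-token B-/I-/L-/U- predictions are
# returned, so the pieces of one reference can be merged manually instead.
```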

Limitations and biases

More information needed

BibTeX entry and citation info

More information needed