Model description

mbert-base-cased-NER-NL-legislation-refs is a fine-tuned BERT model that was trained to recognize the entity type 'legislation references' (REF) in Dutch case law.

Specifically, this model is a bert-base-multilingual-cased model that was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset.
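A minimal sketch of running the model for inference with the transformers pipeline API. The model identifier below is assumed to match this card's name; replace it with the full Hub path (including the user or organization namespace) under which the model is actually hosted, and note that the example sentence is purely illustrative.

```python
from transformers import pipeline

# Model id is an assumption here; use the full Hub path, e.g. "<namespace>/mbert-base-cased-NER-NL-legislation-refs".
ner = pipeline(
    "token-classification",
    model="mbert-base-cased-NER-NL-legislation-refs",
    aggregation_strategy="simple",  # see the note under Results on how this handles L- tags
)

text = "Gelet op artikel 6:162 van het Burgerlijk Wetboek is de gedaagde aansprakelijk."
for entity in ner(text):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))
```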

Training procedure

Dataset

This model was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset. The dataset consists of 512-token examples, each containing one or more legislation references. These examples were created from a weakly labelled corpus of Dutch case law scraped from Linked Data Overheid. The texts were pre-tokenized and labelled with spaCy (using biluo_tags_from_offsets) and then further tokenized by applying Hugging Face's AutoTokenizer.from_pretrained() with the bert-base-multilingual-cased tokenizer.
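To illustrate the labelling step, here is a simplified pure-Python sketch of how character-offset entity spans map to BILOU tags per token. This is not spaCy's actual biluo_tags_from_offsets implementation, only a demonstration of the scheme; the function name, token offsets, and example sentence are all hypothetical.

```python
def biluo_tags(token_offsets, spans):
    """Assign BILOU tags to tokens given character-offset entity spans.

    token_offsets: list of (start, end) character offsets, one per token
    spans:         list of (start, end, label) entity spans
    """
    tags = ["O"] * len(token_offsets)
    for start, end, label in spans:
        # Tokens fully covered by the entity span
        covered = [i for i, (ts, te) in enumerate(token_offsets)
                   if ts >= start and te <= end]
        if not covered:
            continue
        if len(covered) == 1:
            tags[covered[0]] = f"U-{label}"      # Unit-length entity
        else:
            tags[covered[0]] = f"B-{label}"      # Begin
            for i in covered[1:-1]:
                tags[i] = f"I-{label}"           # Inside
            tags[covered[-1]] = f"L-{label}"     # Last

    return tags

# Example: "Zie artikel 6:162 BW" with one REF span over "artikel 6:162 BW"
token_offsets = [(0, 3), (4, 11), (12, 17), (18, 20)]
print(biluo_tags(token_offsets, [(4, 20, "REF")]))
# -> ['O', 'B-REF', 'I-REF', 'L-REF']
```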

Results

| Model | Precision | Recall | F1-score |
|-------|-----------|--------|----------|
| mBERT | 0.891     | 0.919  | 0.905    |

Using Hugging Face's hosted inference API widget, this model can be quickly tested on the provided examples. Note that the widget incorrectly presents the last token of a legislation reference as a separate entity due to the workings of its 'simple' aggregation_strategy. While this model was fine-tuned on training data labelled in accordance with the BILOU scheme, the hosted inference API groups entities only by merging B- and I- tags that share the same label, thereby omitting the L- tags.
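When working around this limitation outside the widget, the raw per-token predictions can be grouped so that L- tagged tokens stay attached to the entity they close. The following is a minimal sketch of such post-processing on word-level BILOU tags; the function name and example inputs are assumptions, not part of the model's API.

```python
def merge_biluo(tokens, tags):
    """Group per-token BILOU predictions into complete entity strings,
    keeping L- tagged tokens attached to the entity they close."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith(("B-", "U-")):
            if current:                       # close any unfinished entity
                entities.append(" ".join(current))
            current = [token]
            if tag.startswith("U-"):          # single-token entity
                entities.append(" ".join(current))
                current = []
        elif tag.startswith(("I-", "L-")) and current:
            current.append(token)
            if tag.startswith("L-"):          # last token closes the entity
                entities.append(" ".join(current))
                current = []
        else:                                 # O tag, or stray I-/L- tag
            if current:
                entities.append(" ".join(current))
                current = []
    if current:                               # entity running to end of input
        entities.append(" ".join(current))
    return entities

tokens = ["Zie", "artikel", "6:162", "BW", "."]
tags   = ["O", "B-REF", "I-REF", "L-REF", "O"]
print(merge_biluo(tokens, tags))
# -> ['artikel 6:162 BW']
```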

Limitations and biases

More information needed

BibTeX entry and citation info

More information needed