Danish Legal LM
This model was pre-trained on a combination of the Danish portion of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings.
It achieves the following results on the evaluation set:
- Loss: 0.7302 (sequences up to 128 tokens)
- Loss: 0.7847 (sequences up to 512 tokens)
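
Interpreted as masked-language-modelling cross-entropy, these losses correspond to pseudo-perplexities of roughly exp(0.7302) ≈ 2.08 and exp(0.7847) ≈ 2.19.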
 
Model description
This is a RoBERTa (Liu et al., 2019) model pre-trained on Danish legal corpora. It follows a base configuration with 12 Transformer layers, each with 768 hidden units and 12 attention heads.
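
As a standard RoBERTa checkpoint, the model can be queried for masked-token prediction with the `transformers` library. The snippet below is a minimal sketch: the repository identifier is a placeholder to be replaced with this model's actual Hugging Face Hub id, and the example sentence is illustrative only.

```python
from transformers import pipeline

# NOTE: placeholder identifier -- replace with this model's actual Hub id.
fill_mask = pipeline("fill-mask", model="path/to/danish-legal-lm")

# RoBERTa uses "<mask>" as its mask token.
predictions = fill_mask("Sagen blev afgjort af <mask>.")
for p in predictions:
    print(p["token_str"], p["score"])
```
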
Intended uses & limitations
More information needed
Training and evaluation data
This model was pre-trained on a combination of the Danish portion of the MultiEURLEX dataset and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus.
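
For orientation, a corpus along these lines could be assembled with the `datasets` library as sketched below. The MultiEURLEX Hub identifier and its `da` configuration follow the dataset release accompanying Chalkidis et al. (2021); the Danish Gigaword identifier and the `source`/`text` fields used to select its legal subsets are assumptions for illustration and may not match how the corpus was actually built.

```python
from datasets import load_dataset, concatenate_datasets

# Danish portion of MultiEURLEX (EU legislation); keep only the raw text.
eurlex_da = load_dataset("multi_eurlex", "da", split="train")
eurlex_da = eurlex_da.remove_columns(
    [c for c in eurlex_da.column_names if c != "text"]
)

# Hypothetical identifier for the Danish Gigaword Corpus, filtered to the two
# legal subsets; the identifier and field names are assumptions.
gigaword = load_dataset("danish-gigaword", split="train")
gigaword_legal = gigaword.filter(
    lambda ex: ex["source"] in {"retsinformationdk", "retspraksis"}
)
gigaword_legal = gigaword_legal.remove_columns(
    [c for c in gigaword_legal.column_names if c != "text"]
)

# Single raw-text corpus for masked-language-model pre-training.
corpus = concatenate_datasets([eurlex_da, gigaword_legal])
```
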
Training procedure
The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and pre-training then continued for an additional 100k steps with sequences of up to 512 tokens.
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: tpu
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 500000 (128-token phase) + 100000 (512-token phase)
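
A minimal sketch of how the first (128-token) phase maps onto `TrainingArguments` in the `transformers` Trainer API; the output directory is a placeholder, and the Adam betas and epsilon above are the Trainer defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above for the 128-token phase. With 8 TPU
# devices, a per-device batch size of 16 and 2 gradient-accumulation steps, the
# effective training batch size is 8 * 16 * 2 = 256.
training_args = TrainingArguments(
    output_dir="danish-legal-lm-128",  # placeholder output directory
    max_steps=500_000,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=42,
)
```

The 512-token continuation phase would reuse these arguments with `max_steps=100_000` and inputs tokenized to sequences of up to 512 tokens.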
 
Training results
First phase (sequences up to 128 tokens):

| Training Loss | Seq. Length | Step | Validation Loss |
|---|---|---|---|
| 1.4648 | 128 | 50000 | 1.2920 | 
| 1.2165 | 128 | 100000 | 1.0625 | 
| 1.0952 | 128 | 150000 | 0.9611 | 
| 1.0233 | 128 | 200000 | 0.8931 | 
| 0.963 | 128 | 250000 | 0.8477 | 
| 0.9122 | 128 | 300000 | 0.8168 | 
| 0.8697 | 128 | 350000 | 0.7836 | 
| 0.8397 | 128 | 400000 | 0.7560 | 
| 0.8231 | 128 | 450000 | 0.7476 | 
| 0.8207 | 128 | 500000 | 0.7243 | 

Second phase (sequences up to 512 tokens; steps are counted from the end of the first phase):

| Training Loss | Seq. Length | Step | Validation Loss |
|---|---|---|---|
| 0.7045 | 512 | +50000 | 0.8318 | 
| 0.6432 | 512 | +100000 | 0.7913 | 
Framework versions
- Transformers 4.18.0
- Pytorch 1.12.0+cu102
- Datasets 2.0.0
- Tokenizers 0.12.0