Danish Legal LM
This model was pre-trained on a combination of the Danish portion of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings.
It achieves the following results on the evaluation set:
- Loss: 0.7302 (sequences up to 128 tokens)
- Loss: 0.7847 (sequences up to 512 tokens)
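
Interpreted as masked-language-modelling cross-entropy, these losses correspond to pseudo-perplexities of roughly exp(0.7302) ≈ 2.08 and exp(0.7847) ≈ 2.19.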
 
Model description
This is a RoBERTa (Liu et al., 2019) model pre-trained on Danish legal corpora. It follows a base configuration with 12 Transformer layers, each with 768 hidden units and 12 attention heads.
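
As a standard RoBERTa checkpoint, the model can be queried for masked-token prediction with the `transformers` library. The snippet below is a minimal sketch: the repository identifier is a placeholder to be replaced with this model's actual Hugging Face Hub id, and the example sentence is illustrative only.

```python
from transformers import pipeline

# NOTE: placeholder identifier -- replace with this model's actual Hub id.
fill_mask = pipeline("fill-mask", model="path/to/danish-legal-lm")

# RoBERTa uses "<mask>" as its mask token.
predictions = fill_mask("Sagen blev afgjort af <mask>.")
for p in predictions:
    print(p["token_str"], p["score"])
```
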
Intended uses & limitations
More information needed
Training and evaluation data
This model was pre-trained on a combination of the Danish portion of the MultiEURLEX dataset and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus.
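
For orientation, a corpus along these lines could be assembled with the `datasets` library as sketched below. The MultiEURLEX Hub identifier and its `da` configuration follow the dataset release accompanying Chalkidis et al. (2021); the Danish Gigaword identifier and the `source`/`text` fields used to select its legal subsets are assumptions for illustration and may not match how the corpus was actually built.

```python
from datasets import load_dataset, concatenate_datasets

# Danish portion of MultiEURLEX (EU legislation); keep only the raw text.
eurlex_da = load_dataset("multi_eurlex", "da", split="train")
eurlex_da = eurlex_da.remove_columns(
    [c for c in eurlex_da.column_names if c != "text"]
)

# Hypothetical identifier for the Danish Gigaword Corpus, filtered to the two
# legal subsets; the identifier and field names are assumptions.
gigaword = load_dataset("danish-gigaword", split="train")
gigaword_legal = gigaword.filter(
    lambda ex: ex["source"] in {"retsinformationdk", "retspraksis"}
)
gigaword_legal = gigaword_legal.remove_columns(
    [c for c in gigaword_legal.column_names if c != "text"]
)

# Single raw-text corpus for masked-language-model pre-training.
corpus = concatenate_datasets([eurlex_da, gigaword_legal])
```
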
Training procedure
The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and pre-training then continued for an additional 100k steps with sequences of up to 512 tokens.
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: tpu
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 500000 (128-token phase) + 100000 (512-token phase)
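
A minimal sketch of how the first (128-token) phase maps onto `TrainingArguments` in the `transformers` Trainer API; the output directory is a placeholder, and the Adam betas and epsilon above are the Trainer defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above for the 128-token phase. With 8 TPU
# devices, a per-device batch size of 16 and 2 gradient-accumulation steps, the
# effective training batch size is 8 * 16 * 2 = 256.
training_args = TrainingArguments(
    output_dir="danish-legal-lm-128",  # placeholder output directory
    max_steps=500_000,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=42,
)
```

The 512-token continuation phase would reuse these arguments with `max_steps=100_000` and inputs tokenized to sequences of up to 512 tokens.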
 
Training results
First phase (sequences up to 128 tokens):

| Training Loss | Seq. Length | Step | Validation Loss |
|---|---|---|---|
| 1.4648 | 128 | 50000 | 1.2920 | 
| 1.2165 | 128 | 100000 | 1.0625 | 
| 1.0952 | 128 | 150000 | 0.9611 | 
| 1.0233 | 128 | 200000 | 0.8931 | 
| 0.963 | 128 | 250000 | 0.8477 | 
| 0.9122 | 128 | 300000 | 0.8168 | 
| 0.8697 | 128 | 350000 | 0.7836 | 
| 0.8397 | 128 | 400000 | 0.7560 | 
| 0.8231 | 128 | 450000 | 0.7476 | 
| 0.8207 | 128 | 500000 | 0.7243 | 

Second phase (sequences up to 512 tokens; steps are counted from the end of the first phase):

| Training Loss | Seq. Length | Step | Validation Loss |
|---|---|---|---|
| 0.7045 | 512 | +50000 | 0.8318 | 
| 0.6432 | 512 | +100000 | 0.7913 | 
Framework versions
- Transformers 4.18.0
- Pytorch 1.12.0+cu102
- Datasets 2.0.0
- Tokenizers 0.12.0