XLM-RoBERTa (base) Middle High German Charter Masked Language Model
This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Model description
For additional information on the underlying model, please refer to the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al.
Intended uses & limitations
This model can be used for masked language modeling, i.e., fill-mask prediction on Middle High German charter text.
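A minimal usage sketch with the Hugging Face transformers fill-mask pipeline; the model id is taken from the citation below, and the example sentence is purely illustrative, not from the training data:

```python
from transformers import pipeline

# Load the fill-mask pipeline for this model.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/xlm-roberta-base-mhg-charter-mlm",
)

# XLM-RoBERTa uses "<mask>" as its mask token; the sentence below is an illustrative
# charter-style opening, not an example from the Monasterium corpus.
results = fill_mask("Wir <mask> von gotes gnaden")
for r in results:
    print(r["token_str"], round(r["score"], 4))
```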
Training and evaluation data
The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.
Training hyperparameters
The following hyperparameters were used during training (a sketch of a matching setup follows the list):
- num_train_epochs: 15
- learning_rate: 2e-5
- weight_decay: 0.01
- train_batch_size: 16
- eval_batch_size: 16
- num_proc: 4
- block_size: 256
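A minimal sketch of a fine-tuning setup consistent with these hyperparameters, assuming the standard Hugging Face Trainer API with DataCollatorForLanguageModeling; the local data files, the "text" column name, and the train/validation split are placeholders, not the published training script:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical local files; the actual Monasterium charter extraction is not bundled here.
dataset = load_dataset(
    "text",
    data_files={"train": "charters_train.txt", "validation": "charters_val.txt"},
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

block_size = 256

def tokenize(batch):
    # Truncate each charter text to the block size used during training.
    return tokenizer(batch["text"], truncation=True, max_length=block_size)

tokenized = dataset.map(tokenize, batched=True, num_proc=4, remove_columns=["text"])

# Dynamic masking with the default 15% masking probability (an assumption).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="xlm-roberta-base-mhg-charter-mlm",
    num_train_epochs=15,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=data_collator,
)

trainer.train()
```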
Training results
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.423800 | 2.025645 |
| 2 | 1.876500 | 1.700380 |
| 3 | 1.702100 | 1.565900 |
| 4 | 1.582400 | 1.461868 |
| 5 | 1.506000 | 1.393849 |
| 6 | 1.407300 | 1.359359 |
| 7 | 1.385400 | 1.317869 |
| 8 | 1.336700 | 1.285630 |
| 9 | 1.301300 | 1.246812 |
| 10 | 1.273500 | 1.219290 |
| 11 | 1.245600 | 1.198312 |
| 12 | 1.225800 | 1.198695 |
| 13 | 1.214100 | 1.194895 |
| 14 | 1.209500 | 1.177452 |
| 15 | 1.200300 | 1.177396 |
Perplexity: 3.25
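If, as is common for masked language model evaluation, perplexity is computed as the exponential of the final validation loss, the reported value can be reproduced from the last table row:

```python
import math

# exp(final validation loss at epoch 15) ≈ reported perplexity
print(math.exp(1.177396))  # ≈ 3.25
```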
Updates
- 2023-03-30: Upload
Citation
Please cite the following when using this model:
@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}