

first

This model is a fine-tuned version of longformer-gottbert-base-8192-aw512- on a 500-million-token subset of the German part of the OSCAR dataset. It achieves the following results on the custom evaluation set:

Model description

The model's weights are initialized from gottbert-base, the German version of RoBERTa. The local attention windows have a fixed size of 512 tokens across all layers. The maximum sequence length is 8192 tokens.
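To give a feel for what the 512-token window buys at a sequence length of 8192, the following sketch counts query/key pairs under a sliding window and under full self-attention. It is an illustration only, not the model's implementation: it assumes a symmetric window in which each token sees itself plus `window // 2` neighbours on each side, and it ignores global attention. The function names are hypothetical.

```python
def local_attention_pairs(seq_len: int, window: int) -> int:
    """Count (query, key) pairs under a symmetric sliding window.

    Assumption: each token attends to itself and to up to window // 2
    neighbours on each side, clipped at the sequence boundaries.
    """
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)
        hi = min(seq_len - 1, i + half)
        total += hi - lo + 1
    return total


def full_attention_pairs(seq_len: int) -> int:
    """Count (query, key) pairs under full self-attention."""
    return seq_len * seq_len


SEQ_LEN = 8192  # maximum sequence length stated above
WINDOW = 512    # local attention window stated above

local_pairs = local_attention_pairs(SEQ_LEN, WINDOW)
full_pairs = full_attention_pairs(SEQ_LEN)
print(f"local: {local_pairs}, full: {full_pairs}, ratio: {full_pairs / local_pairs:.1f}x")
```

The local count grows linearly with the sequence length, while the full count grows quadratically, which is why the gap widens for longer inputs.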

Intended uses & limitations

Longformer models can process long texts by combining local attention on each subword token with task-specific global attention on a subset of the tokens.
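The mixture of local and global attention can be sketched in plain Python. The example below builds a mask that uses 1 for tokens with global attention and 0 for tokens with local attention only, which is the convention of the `global_attention_mask` input in Hugging Face Transformers. The helper names, the toy sequence length, and the choice of position 0 as the global token are assumptions made for illustration.

```python
def build_global_attention_mask(seq_len: int, global_positions: list[int]) -> list[int]:
    """Return a mask with 1 at global positions and 0 everywhere else."""
    mask = [0] * seq_len
    for pos in global_positions:
        if not 0 <= pos < seq_len:
            raise IndexError(f"global position {pos} is outside the sequence")
        mask[pos] = 1
    return mask


def attended_positions(query: int, mask: list[int], window: int) -> list[int]:
    """List the key positions a query token can attend to.

    Assumption: a global token attends to every position; a local token
    attends to its sliding window plus all global tokens.
    """
    seq_len = len(mask)
    if mask[query] == 1:
        return list(range(seq_len))
    half = window // 2
    local = set(range(max(0, query - half), min(seq_len, query + half + 1)))
    global_tokens = {i for i, flag in enumerate(mask) if flag == 1}
    return sorted(local | global_tokens)


# Toy setup: 16 tokens, window of 4, global attention on the first token only.
mask = build_global_attention_mask(16, [0])
print(attended_positions(10, mask, window=4))
print(len(attended_positions(0, mask, window=4)))
```

Which tokens receive global attention depends on the task. A common choice is the first token for classification and the question tokens for question answering.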

Training and evaluation data

The OSCAR dataset is a freely available corpus of filtered web texts from the Common Crawl in various languages. We used the 2017 version of the dataset.

Training procedure

The model was trained with masked language modeling for 3 epochs on a custom 500-million-token subset of the German portion of the OSCAR dataset. It was validated using 5% of the original subset.
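As a rough illustration of the masked language modeling objective, the sketch below hides a random share of tokens and keeps the originals as prediction targets. It is a simplification and not the training code used here: the 15% masking rate is an assumption (this card does not state the rate), every selected token is replaced by the mask token, and `mask_tokens` and `MASK_TOKEN` are hypothetical names.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask token string, for illustration only


def mask_tokens(tokens: list[str], mask_prob: float = 0.15, seed: int = 0):
    """Randomly mask tokens and return (inputs, labels).

    labels[i] holds the original token where inputs[i] was masked and
    None elsewhere, so only masked positions would contribute to the loss.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


tokens = "Das ist ein langer deutscher Text aus dem Web".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.15, seed=1)
print(inputs)
print(labels)
```

The model is then trained to predict the original token at each masked position from the surrounding context.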

Training hyperparameters

The following hyperparameters were used during training:

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.5636        | 0.1   | 500   | 2.2399          |
| 2.0426        | 0.2   | 1000  | 1.8841          |
| 1.9653        | 0.3   | 1500  | 1.7807          |
| 1.9422        | 0.4   | 2000  | 1.7206          |
| 1.9323        | 0.49  | 2500  | 1.6800          |
| 1.7587        | 0.59  | 3000  | 1.6507          |
| 1.7239        | 0.69  | 3500  | 1.6316          |
| 1.7452        | 0.79  | 4000  | 1.6137          |
| 1.7415        | 0.89  | 4500  | 1.5983          |
| 1.7733        | 0.99  | 5000  | 1.5830          |
| 1.7656        | 1.09  | 5500  | 1.5735          |
| 1.6543        | 1.19  | 6000  | 1.5643          |
| 1.7131        | 1.28  | 6500  | 1.5546          |
| 1.6456        | 1.38  | 7000  | 1.5503          |
| 1.716         | 1.48  | 7500  | 1.5422          |
| 1.806         | 1.58  | 8000  | 1.5377          |
| 1.8407        | 1.68  | 8500  | 1.5327          |
| 1.6371        | 1.78  | 9000  | 1.5278          |
| 1.6453        | 1.88  | 9500  | 1.5231          |
| 1.7754        | 1.98  | 10000 | 1.5214          |
| 1.7695        | 2.08  | 10500 | 1.5165          |
| 1.7109        | 2.17  | 11000 | 1.5138          |
| 1.6992        | 2.27  | 11500 | 1.5107          |
| 1.6707        | 2.37  | 12000 | 1.5097          |
| 1.6835        | 2.47  | 12500 | 1.5040          |
| 1.7171        | 2.57  | 13000 | 1.5041          |
| 1.7257        | 2.67  | 13500 | 1.4990          |
| 1.6287        | 2.77  | 14000 | 1.5017          |
| 1.7737        | 2.87  | 14500 | 1.4983          |
| 1.4002        | 2.96  | 15000 | 1.4992          |
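Validation losses like those above are often easier to interpret as perplexities. The sketch below converts the first and last reported values, assuming the loss is the mean cross-entropy in nats over the masked tokens (the usual convention for masked language modeling with the Trainer, but not confirmed by this card). The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


# Validation losses taken from the first and last rows of the results above.
first_val_loss = 2.2399  # step 500
last_val_loss = 1.4992   # step 15000

print(f"step 500:   perplexity {perplexity(first_val_loss):.2f}")
print(f"step 15000: perplexity {perplexity(last_val_loss):.2f}")
```

Because the loss is computed only on masked positions, these values are not directly comparable to perplexities of autoregressive language models.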

Framework versions