
# gbert-large-autopart

This model is a fine-tuned version of deepset/gbert-large on a corpus of sentences from German DAX-company websites (see below). It achieves the following results on the evaluation set:

- Loss: 0.3832

## Model description

This model is a domain adaptation of deepset/gbert-large, trained on a dataset of 54,000 sample sentences taken from the websites of the 30 German DAX companies.
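
As a quick check of the adapted masked-language model, it can be queried directly with the `fill-mask` pipeline. A minimal sketch, assuming the model is loaded from this repository; the example sentence is made up:

```python
from transformers import pipeline

# Load the domain-adapted MLM from this repository.
fill = pipeline("fill-mask", model="gbert-large-autopart")

# [MASK] is the BERT mask token used by gbert.
print(fill("Unser Unternehmen ist ein führender [MASK] in Deutschland."))
```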

## Intended uses & limitations

Intended for classification problems where the samples are drawn from German company websites. The model provides only the pretrained encoder; it must be fine-tuned with a task-specific head before use, as sketched below.
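
A minimal sketch of attaching a classification head to this encoder, assuming this repository's model id; the label set and example sentence are hypothetical, and the head must still be fine-tuned on labeled data before its predictions are meaningful:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical label set for a downstream classification task.
labels = ["product", "career", "press"]

tokenizer = AutoTokenizer.from_pretrained("gbert-large-autopart")
model = AutoModelForSequenceClassification.from_pretrained(
    "gbert-large-autopart",
    num_labels=len(labels),
)

# The classification head is randomly initialized at this point and
# needs fine-tuning; this only shows the expected output shape.
inputs = tokenizer("Unsere Produkte sind weltweit im Einsatz.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 3])
```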

## Training and evaluation data

80 percent of the available samples were used for training; evaluation was performed on the remaining 20 percent.

## Training procedure

Masked language modelling, using the Hugging Face `AutoModelForMaskedLM` class.
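
A minimal sketch of such a domain-adaptation run with the Trainer API, reproducing the 80/20 split described above; the corpus path is a placeholder, and the hyperparameters shown are illustrative rather than the exact values used:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load the base model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-large")
model = AutoModelForMaskedLM.from_pretrained("deepset/gbert-large")

# "corpus.txt" is a placeholder for the company-website sentence corpus.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})
dataset = dataset["train"].train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly masks 15% of tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="gbert-large-autopart",
    evaluation_strategy="epoch",
    num_train_epochs=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)
trainer.train()
```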

### Training hyperparameters

The following hyperparameters were used during training:

- num_epochs: 16 (445 steps per epoch)

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5949        | 1.0   | 445  | 0.5025          |
| 0.5211        | 2.0   | 890  | 0.4729          |
| 0.5036        | 3.0   | 1335 | 0.4893          |
| 0.4916        | 4.0   | 1780 | 0.4647          |
| 0.4464        | 5.0   | 2225 | 0.4401          |
| 0.425         | 6.0   | 2670 | 0.4246          |
| 0.4076        | 7.0   | 3115 | 0.4169          |
| 0.3962        | 8.0   | 3560 | 0.4140          |
| 0.3829        | 9.0   | 4005 | 0.4220          |
| 0.3702        | 10.0  | 4450 | 0.4119          |
| 0.3566        | 11.0  | 4895 | 0.3993          |
| 0.3442        | 12.0  | 5340 | 0.3924          |
| 0.3365        | 13.0  | 5785 | 0.3880          |
| 0.3316        | 14.0  | 6230 | 0.3900          |
| 0.3213        | 15.0  | 6675 | 0.3800          |
| 0.316         | 16.0  | 7120 | 0.3832          |

### Framework versions