
GPT-sl-base

GPT-sl-base is a Slovene GPT model based on the BigScience Workshop fork of Megatron. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
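
For orientation, here is a minimal generation sketch using the Hugging Face transformers library. The repository ID is an assumption made for illustration and may differ from the actual published ID.

```python
# Minimal usage sketch; the repository ID below is an assumption, substitute the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```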

Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
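
For concreteness, the hyperparameters above can be expressed as a GPT-2-style configuration in transformers. This is an illustrative sketch, not the released configuration file; the actual Megatron-based checkpoint may differ in detail.

```python
# Illustrative GPT-2-style configuration matching the figures above (assumed mapping).
from transformers import GPT2Config

config = GPT2Config(
    vocab_size=60_000,  # ~60k-token tokenizer vocabulary
    n_positions=1024,   # maximum sequence length
    n_embd=768,         # hidden dimension
    n_layer=12,         # transformer layers
    n_head=16,          # attention heads
)
print(config)
```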

Training

The model was trained for about 20 epochs, totaling 390k steps, with 102B tokens seen during training.
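
A quick back-of-the-envelope check of these figures (the effective batch size is not stated in the card and is only inferred here):

```python
# Rough consistency check of the training figures above (assumed to be exact).
total_tokens = 102e9    # tokens seen during training
steps = 390_000         # optimizer steps
seq_len = 1024          # maximum sequence length

tokens_per_step = total_tokens / steps           # ~262k tokens per step
sequences_per_step = tokens_per_step / seq_len   # ~255 sequences, i.e. an effective batch of ~256
print(f"~{tokens_per_step/1e3:.0f}k tokens/step, ~{sequences_per_step:.0f} sequences/step")
```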

| Step   | Validation perplexity |
|--------|-----------------------|
| 50000  | 26.801 |
| 100000 | 25.574 |
| 150000 | 24.773 |
| 200000 | 24.099 |
| 250000 | 23.336 |
| 300000 | 22.607 |
| 350000 | 22.329 |
| 390000 | 22.293 |
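
Validation perplexity here is the exponential of the average token-level cross-entropy loss. The sketch below shows how such a value can be computed for a single text with transformers; the repository ID is the same assumption as in the usage sketch above.

```python
# Minimal perplexity sketch: exp of the mean cross-entropy over one text.
# The repository ID is an assumption; substitute the actual one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Slovenija je država v srednji Evropi."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean cross-entropy per token
print(f"perplexity = {torch.exp(loss).item():.2f}")
```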