This model is based on GPT2-Medium finetuned on chat logs from


The data consists of ~3.8GB of plaintext across 632 days of logs, ranging from 2021-01-01 to 2022-09-26. They were sourced from The logs were cleaned by dropping

The data was batched into groups of up to 512 tokens, preferring to end on a newline (\n) rather than start another line and truncate it. The batches were then padded to 512 tokens using a pad token added to the model and tokenizer.
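The batching described above can be sketched roughly as follows. This is a minimal illustration and not the actual preprocessing script: the function name `pack_lines`, the choice to truncate any single line longer than the limit, and the toy token ids are all assumptions. Each input is the token ids for one log line, including its trailing newline.

```python
from typing import List


def pack_lines(line_tokens: List[List[int]], pad_id: int, max_len: int = 512) -> List[List[int]]:
    """Greedily pack whole lines into batches of at most max_len tokens, then pad."""
    batches: List[List[int]] = []
    current: List[int] = []
    for tokens in line_tokens:
        # Assumption: a single line longer than max_len is truncated to fit.
        tokens = tokens[:max_len]
        # If the next line would overflow, close the batch on the previous
        # newline instead of starting the line and cutting it off.
        if current and len(current) + len(tokens) > max_len:
            batches.append(current)
            current = []
        current = current + tokens
    if current:
        batches.append(current)
    # Pad every batch on the right up to max_len.
    return [b + [pad_id] * (max_len - len(b)) for b in batches]


# Toy example with max_len=8 and pad id 0 (hypothetical token ids).
demo = pack_lines([[1, 1, 1], [2, 2, 2, 2], [3, 3]], pad_id=0, max_len=8)
print(demo)
```

With Hugging Face `transformers`, a pad token is typically added with `tokenizer.add_special_tokens({"pad_token": ...})` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether this model was prepared exactly that way is not stated here.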

10% of the data was set aside for validation.
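One way to hold out 10% of the batches is sketched below. How the split was actually made (random or chronological, per batch or per day) is not stated, so the random shuffle, the seed, and the name `train_val_split` are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    items: Sequence[T], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and hold out val_fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical usage: 100 batches give 90 for training and 10 for validation.
train_items, val_items = train_val_split(list(range(100)))
print(len(train_items), len(val_items))
```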


Training was done on a system with an AMD Radeon RX 6800 XT (16GB of VRAM) and 32GB of RAM. The following hyperparameters were used:


Evaluation was run 10 times over the course of training, once every 0.1 epochs. Accuracy and perplexity were calculated at each evaluation. <details> <summary>View Metrics</summary>

[Figures: Accuracy, Loss]

Training Metrics

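| Epochs | Validation Loss | Accuracy |
|--------|-----------------|----------|
| 0.1    | 1.778           | 0.6789   |
| 0.2    | 1.721           | 0.6858   |
| 0.3    | 1.687           | 0.6899   |
| 0.4    | 1.664           | 0.6925   |
| 0.5    | 1.645           | 0.695    |
| 0.6    | 1.63            | 0.6969   |
| 0.7    | 1.616           | 0.6987   |
| 0.8    | 1.604           | 0.7003   |
| 0.9    | 1.594           | 0.7017   |
| 1.0    | 1.588           | 0.7025   |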
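
The table reports validation loss rather than perplexity. Assuming the loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but an assumption here), perplexity can be recovered as the exponential of the loss. A minimal sketch:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


# Final validation loss from the table above.
final_ppl = perplexity(1.588)
print(f"{final_ppl:.2f}")  # prints 4.89
```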