
GPT Neo 1.3B pre-trained on cleaned Dutch mC4 🇳🇱

A GPT-Neo model trained from scratch on Dutch, with perplexity 16.0 on cleaned Dutch mC4.

How To Use

You can use this GPT-Neo model directly with a pipeline for text generation.

```python
from transformers import pipeline, GPT2Tokenizer, GPTNeoForCausalLM

MODEL_DIR = 'yhavinga/gpt-neo-1.3B-dutch'
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_DIR)
model = GPTNeoForCausalLM.from_pretrained(MODEL_DIR)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Beam search with an n-gram blocklist and a repetition penalty
# to reduce degenerate repetition in the continuation.
result = generator('1 - geel. 2 - groen. 3 -', max_length=60, num_beams=4,
                   no_repeat_ngram_size=3, repetition_penalty=2.0)
generated_text = result[0]['generated_text']
```

Example output:

"1 - geel. 2 - groen. 3 - rood. 4 - blauw. 5 - bruin. 6 - zwart. 7 - oranje. 8 - roze. 9 - paars. 10 - wit. 11 - grijs. 12 - magenta. 13 - lila. 14 - lichtgroen. 15"

Tokenizer

Dataset

This model was trained on the full configuration (33B tokens) of cleaned Dutch mC4, which is the original mC4 with additional cleaning and filtering applied to the Dutch documents.

Models

TL;DR: yhavinga/gpt2-medium-dutch is the best model.

| model | type | params | train seq len | ppl | loss | batch size | epochs | steps | optim | lr | duration | config |
|-------|------|--------|---------------|-----|------|------------|--------|-------|-------|----|----------|--------|
| yhavinga/gpt-neo-125M-dutch | gpt neo | 125M | 512 | 20.9 | 3.04 | 128 | 1 | 190000/558608 | adam | 2.4e-3 | 1d 12h | full |
| yhavinga/gpt2-medium-dutch | gpt2 | 345M | 512 | 15.1 | 2.71 | 128 | 1 | 320000/520502 | adam | 8e-4 | 7d 2h | full |
| yhavinga/gpt2-large-dutch | gpt2 | 762M | 512 | 15.1 | 2.72 | 32 | 1 | 1100000/2082009 | adafactor | 3.3e-5 | 8d 15h | large |
| yhavinga/gpt-neo-1.3B-dutch | gpt neo | 1.3B | 512 | 16.0 | 2.77 | 16 | 1 | 960000/3049896 | adafactor | 5e-4 | 7d 11h | full |
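The ppl and loss columns are related: perplexity is the exponential of the (natural-log) cross-entropy loss, so the two columns can be cross-checked against each other. A minimal sketch (small discrepancies are rounding in the reported figures):

```python
import math

# Perplexity = exp(cross-entropy loss). Check the table rows above.
rows = [
    ("gpt-neo-125M-dutch", 3.04, 20.9),
    ("gpt2-medium-dutch", 2.71, 15.1),
    ("gpt2-large-dutch", 2.72, 15.1),
    ("gpt-neo-1.3B-dutch", 2.77, 16.0),
]
for name, loss, ppl in rows:
    print(f"{name}: exp({loss}) = {math.exp(loss):.1f} (table: {ppl})")
```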

Acknowledgements

This project would not have been possible without compute generously provided by Google through the TPU Research Cloud. The HuggingFace 🤗 ecosystem was also instrumental in most, if not all, parts of the training. Several open-source repositories were helpful in setting up the TPU-VM and training the models.

Created by Yeb Havinga