t5-base-dutch

Created by Yeb Havinga & Dat Nguyen during the Hugging Face community week, organized by Hugging Face with TPU usage sponsored by Google, for the project Pre-train T5 from scratch in Dutch.

See also the fine-tuned t5-base-dutch-demo model and the demo application Netherformer 📰, both of which are based on this model.

5 jan 2022: Model updated. Evaluation accuracy increased from 0.64 to 0.70.

11 jan 2022: See also yhavinga/t5-v1.1-base-dutch-cased, which reaches an evaluation accuracy of 0.78.

Model

Dataset

This model was trained on the full configuration of cleaned Dutch mC4, which is the original mC4, except

Tokenization

A SentencePiece tokenizer was trained from scratch on this dataset. The full configuration contains 34B tokens in total.

Training

The model was trained for 1 epoch on the full mc4_nl_cleaned dataset configuration, which consists of 34B tokens. Training ran for 528,482 steps with a batch size of 128 and took 57 hours. A triangular learning rate schedule was used, with a peak learning rate of 0.005.
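The figures above can be sanity-checked with a little arithmetic, and the shape of a triangular schedule is easy to sketch. The Python below is only an illustration, not the training code used for this model. It takes the step count, batch size, token count and peak learning rate from the text. The warmup length (`HYPOTHETICAL_WARMUP_STEPS`) is not stated anywhere in this card and is an assumed value chosen purely for the example, as is the assumption that the rate starts and ends at zero.

```python
# Values taken from the training description above.
TOTAL_STEPS = 528_482
BATCH_SIZE = 128
TOTAL_TOKENS = 34e9  # "34B" is a rounded figure
PEAK_LR = 0.005

# ASSUMPTION: the warmup length is not given in the model card.
HYPOTHETICAL_WARMUP_STEPS = 10_000


def tokens_per_sequence(total_tokens, steps, batch_size):
    """Average tokens per training sequence, assuming every step uses a full batch."""
    return total_tokens / (steps * batch_size)


def triangle_lr(step, total_steps, peak_lr, warmup_steps):
    """Triangular schedule: linear ramp from 0 to peak_lr, then linear decay to 0.

    ASSUMPTION: start and end rates are 0; the real run may have differed.
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("warmup_steps must lie strictly between 0 and total_steps")
    if step <= warmup_steps:
        # Rising edge of the triangle.
        return peak_lr * (step / warmup_steps)
    # Falling edge; clamp at 0 for steps past the end of training.
    remaining = max(total_steps - step, 0)
    return peak_lr * (remaining / (total_steps - warmup_steps))


if __name__ == "__main__":
    avg = tokens_per_sequence(TOTAL_TOKENS, TOTAL_STEPS, BATCH_SIZE)
    print(f"average tokens per sequence: {avg:.1f}")
    for s in (0, HYPOTHETICAL_WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        lr = triangle_lr(s, TOTAL_STEPS, PEAK_LR, HYPOTHETICAL_WARMUP_STEPS)
        print(f"step {s:>7}: lr = {lr:.6f}")
```

With the stated numbers, the average comes out at roughly 500 tokens per sequence, so the step count, batch size and token count are mutually consistent for a single pass over the data.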

Evaluation