Spanish GPT-2 trained on large_spanish_corpus
This is a Spanish GPT-2 model trained from scratch with Flax on the large_spanish_corpus (also known as BETO's corpus). It is part of the Flax/JAX Community Week, organised by Hugging Face, with TPU usage sponsored by Google.
Dataset
The dataset is about 20 GB. 95% of the data was used for training and the remaining 5% for validation.
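A 95/5 split like the one described above could be sketched as follows. This is only an illustration, not the script used for this model: the function name `train_validation_split`, the shuffling step, the seed, and the toy `docs` list are all assumptions.

```python
import random


def train_validation_split(examples, validation_fraction=0.05, seed=0):
    """Shuffle the examples and hold out a fraction for validation.

    Hypothetical helper: the original preprocessing may have split
    the corpus differently (for example, without shuffling).
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a fixed seed
    n_validation = round(len(shuffled) * validation_fraction)
    # The first n_validation items are held out; the rest are for training.
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy stand-in for the real corpus (assumption, for illustration only).
docs = [f"documento {i}" for i in range(1000)]
train_docs, validation_docs = train_validation_split(docs)
print(len(train_docs), len(validation_docs))
```

Fixing the seed keeps the split reproducible, so the validation metrics are always computed on the same held-out examples.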
Metrics (on evaluation dataset)
- Loss: 2.413
- Perplexity: 11.36
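For context, perplexity is conventionally the exponential of the mean cross-entropy loss. The sketch below assumes the loss is measured in nats (natural logarithm). Under that assumption exp(2.413) is roughly 11.17, slightly below the reported 11.36, so the reported value was presumably aggregated differently. One possibility, which this card does not confirm, is averaging per-batch perplexities, which can never give a smaller number than exponentiating the mean loss. The `batch_losses` values are invented for illustration.

```python
import math


def perplexity_from_loss(mean_loss):
    """Perplexity as exp(mean cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_loss)


print(round(perplexity_from_loss(2.413), 2))

# Hypothetical per-batch losses (not taken from the actual evaluation run).
batch_losses = [2.1, 2.4, 2.7]

# Option A: exponentiate the mean loss.
exp_of_mean = math.exp(sum(batch_losses) / len(batch_losses))

# Option B: average the per-batch perplexities.
mean_of_exp = sum(math.exp(loss) for loss in batch_losses) / len(batch_losses)

# By Jensen's inequality, option B is never smaller than option A.
print(exp_of_mean <= mean_of_exp)
```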
Team members
- Manuel Romero (mrm8488)
- María Grandury (mariagrandury)
- Pablo González de Prado (Pablogps)
- Daniel Vera (daveni)
- Sri Lakshmi (srisweet)
- José Posada (jdposa)
- Santiago Hincapie (shpotes)
- Jorge (jorgealro)