
T5 v1.1 Base finetuned for CNN news summarization in Dutch 🇳🇱

This model is t5-v1.1-base-dutch-cased fine-tuned on CNN Dailymail NL.

For a demo of the Dutch CNN summarization models, head over to the Hugging Face Spaces for the Netherformer 📰 example application!

Rouge scores for this model are listed below.
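
Below is a minimal usage sketch with the standard 🤗 Transformers pipeline API; the checkpoint id comes from the results table further down, and max_length=96 mirrors the target length used for fine-tuning. The exact generation settings used for the reported scores are not documented here.

```python
from transformers import pipeline

# Summarization pipeline for the Dutch CNN model; the repo id is taken
# from the results table in this card.
summarizer = pipeline(
    "summarization",
    model="yhavinga/t5-v1.1-base-dutch-cnn-test",
)

article = "..."  # a Dutch news article (placeholder)

# max_length=96 matches the fine-tuning target length reported below;
# truncation keeps the input within the model's 1024-token input length.
summary = summarizer(article, max_length=96, truncation=True)
print(summary[0]["summary_text"])
```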

Tokenizer

Dataset

All models listed below are trained on the full configuration (39B tokens) of cleaned Dutch mC4, which is the original mC4 with several cleaning filters applied.
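
For reference, a minimal sketch of streaming Dutch mC4 with the 🤗 Datasets library; note this loads the unfiltered "nl" configuration of allenai/c4, not the cleaned 39B-token variant described above.

```python
from datasets import load_dataset

# Streams the *original* Dutch mC4 (allenai/c4, config "nl"); the cleaned
# variant used for training applies extra filters not reproduced here.
mc4_nl = load_dataset("allenai/c4", "nl", split="train", streaming=True)
for doc in mc4_nl.take(1):
    print(doc["text"][:200])
```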

Models

TL;DR: yhavinga/t5-v1.1-base-dutch-cased is the best-performing model (highest accuracy, lowest loss).

| model | type | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
|-------|------|---------------|-----|------|------------|--------|-------|---------|-------|----|----------|
| yhavinga/t5-base-dutch | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
| yhavinga/t5-v1.1-base-dutch-uncased | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
| yhavinga/t5-v1.1-base-dutch-cased | t5-v1.1 | 1024 | 0.78 | 0.96 | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
| yhavinga/t5-v1.1-large-dutch-cased | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 8d 13h |
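
For orientation, a hedged sketch of how the optimizer settings in the table (Adafactor with lr 5e-3) map onto the 🤗 Trainer API; the models themselves were trained with their own scripts, and the output directory, per-device batch size, and epoch count below are illustrative assumptions.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
)

# Sketch only: mirrors the optimizer settings from the table above.
model_name = "yhavinga/t5-v1.1-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="./t5-dutch-finetune",  # assumption
    optim="adafactor",                 # optimizer from the table
    learning_rate=5e-3,                # lr from the table
    num_train_epochs=2,                # epochs from the base-cased row
    per_device_train_batch_size=8,     # assumption; the table lists a global batch size of 64
    predict_with_generate=True,
)
```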

The cased t5-v1.1 Dutch models were fine-tuned on summarizing the CNN Daily Mail dataset.

| model | type | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
|-------|------|-----------|------------|--------|--------|--------|-----------|--------------|--------|------------|-------|----------|
| yhavinga/t5-v1.1-base-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
| yhavinga/t5-v1.1-large-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
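
The Rouge scores above can be reproduced along these lines with the 🤗 Evaluate library; this is a sketch, not the exact evaluation script. Input length 1024 and target length 96 come from the table, while num_beams=4 is an assumption.

```python
import evaluate
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "yhavinga/t5-v1.1-base-dutch-cnn-test"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def summarize(text: str) -> str:
    # Input length 1024 and target length 96 follow the table above;
    # num_beams=4 is an assumption, not a documented setting.
    inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")
    ids = model.generate(**inputs, max_length=96, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

articles = ["..."]    # Dutch test articles (placeholders)
references = ["..."]  # reference summaries (placeholders)

rouge = evaluate.load("rouge")
predictions = [summarize(a) for a in articles]
scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum
```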

Acknowledgements

This project would not have been possible without compute generously provided by Google through the TPU Research Cloud. The HuggingFace 🤗 ecosystem was also instrumental in many, if not all, parts of the training. The following repositories were helpful in setting up the TPU-VM and training the models:

Created by Yeb Havinga