# t5-small-wikitext
t5-small trained on wikitext/wikitext-103-raw-v1 for 50k steps (around 2 hours of training), following the training procedure from the T5 paper.
- batch_size: 32
- max_seq_length: 128
- optim: Adafactor
- scheduler: inverse square root (10k warm-up steps)
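The inverse-square-root schedule from the T5 paper sets the learning rate to 1/√max(n, k), where n is the current step and k is the number of warm-up steps (here 10k), so the rate is held constant during warm-up and then decays. A minimal sketch of that schedule, assuming the exact form used in the paper:

```python
def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """T5-style inverse square root schedule.

    lr = 1 / sqrt(max(step, warmup_steps)): constant at 1/sqrt(warmup_steps)
    for the first `warmup_steps` steps, then decaying as 1/sqrt(step).
    """
    return 1.0 / max(step, warmup_steps) ** 0.5


# Constant 0.01 during warm-up, then decaying:
print(inverse_sqrt_lr(1))       # 0.01
print(inverse_sqrt_lr(10_000))  # 0.01
print(inverse_sqrt_lr(40_000))  # 0.005
```

In practice this can be passed to a `LambdaLR`-style scheduler wrapped around the Adafactor optimizer listed above.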