Clinical notes Discharge summaries RoBERTa

Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. (https://arxiv.org/abs/2204.06683)

Max sequence length 128
Batch size 128
Learning rate 5e-5
Training epochs 15
Training time 40 GPU-hours
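
The settings listed above can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `TrainingConfig` and its field names are hypothetical, and it assumes the sequence limit is counted in tokens. The derived property is only an upper bound on tokens per optimizer step, since shorter inputs are padded or leave capacity unused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed above."""

    max_seq_length: int = 128    # assumed to be measured in tokens
    batch_size: int = 128
    learning_rate: float = 5e-5
    num_epochs: int = 15

    @property
    def max_tokens_per_batch(self) -> int:
        # Upper bound: every sequence in the batch at full length.
        return self.max_seq_length * self.batch_size


config = TrainingConfig()
print(config.max_tokens_per_batch)  # 128 * 128 = 16384
```

The reported training time (40 GPU-hours) is left out of the object because it is a measured outcome, not a setting.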