mega for MLM - 65M A `small' checkpoint, about 65M params. 2048 context length, uses the tokenizer from roberta-base chunking is disabled on this checkpoint 512 hidden size, 12 encoder layers