Tamil-Tokenizer Tamil-language-model

tokenizer - BPE, vocab size 30_522
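
For readers unfamiliar with BPE, here is a toy sketch of how merges are learned until a target vocab size is reached. This is not the code used to build this tokenizer: `train_bpe` and `merge_pair` are hypothetical helper names, the sketch works on whitespace-split words and Unicode code points, and the real tokenizer may well be byte-level and built with a library.

```python
from collections import Counter


def merge_pair(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with the joined symbol."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, vocab_size):
    """Learn BPE merges from an iterable of text lines until the vocab reaches vocab_size."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    words = Counter(tuple(w) for line in corpus for w in line.split())
    vocab = {ch for word in words for ch in word}
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs across all words.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Apply the new merge to every word.
        merged = Counter()
        for word, freq in words.items():
            merged[merge_pair(word, best)] += freq
        words = merged
    return vocab, merges


if __name__ == "__main__":
    vocab, merges = train_bpe(["தமிழ் மொழி தமிழ் நாடு"], vocab_size=20)
    print(len(vocab), merges)
```

With a real corpus the target would be 30_522 instead of the small value used in the demo.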

model - RoBERTa

    trained using MLM (masked language modeling)
    OSCAR dataset
    train data size: 5000 lines only
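
A rough sketch of what MLM masking does to a token sequence, in plain Python. The README does not state the masking settings, so the 15% rate and the 80/10/10 split below are assumptions taken from the usual BERT-style recipe; `mask_tokens` and `mask_id` are hypothetical names, and special tokens are not excluded here for brevity.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for masked language modeling.

    labels hold the original id at selected positions and -100 elsewhere
    (-100 is the conventional "ignore this position" value for the loss).
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # assumed 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # assumed 10%: random token
        # assumed remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    ids = list(range(100, 120))
    inputs, labels = mask_tokens(ids, mask_id=4, vocab_size=30_522, seed=0)
    print(inputs)
    print(labels)
```

The model is trained to predict the original id wherever the label is not -100.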