Multilingual BERT RoBERTa XLM-R BM

Model type: Transformer-based masked language model

Training data: None; no additional pretraining is performed. The model is built by merging the weights of two existing pretrained models.

Languages: 100+ languages

Architecture:

A custom weight-merging technique combines the parameters of the two base models into a single unified model (see the sketch below).
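
The exact merging procedure is not documented here, so the following is only a minimal sketch of one common approach: linear interpolation (averaging) of matching parameter tensors. It assumes both base checkpoints share an identical architecture and parameter names, which plain weight averaging requires; the checkpoint names and the `alpha` mixing coefficient are hypothetical placeholders.

```python
# Minimal sketch: linear weight interpolation between two checkpoints.
# Assumes identical architectures and parameter names; this is an
# illustration, not the model's actual (undocumented) merging technique.
import torch
from transformers import AutoModelForMaskedLM

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return alpha * A + (1 - alpha) * B for every matching parameter."""
    merged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        if tensor_a.shape != tensor_b.shape:
            raise ValueError(f"Shape mismatch for {name}: "
                             f"{tensor_a.shape} vs {tensor_b.shape}")
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged

# Hypothetical checkpoint names, used only for illustration.
model_a = AutoModelForMaskedLM.from_pretrained("checkpoint-a")
model_b = AutoModelForMaskedLM.from_pretrained("checkpoint-b")

merged_sd = merge_state_dicts(model_a.state_dict(), model_b.state_dict())
model_a.load_state_dict(merged_sd)
model_a.save_pretrained("merged-model")
```

With `alpha=0.5` this is a simple average of the two sets of weights; other values bias the merged model toward one parent. More elaborate schemes (per-layer coefficients, task-vector arithmetic) follow the same pattern of combining matching tensors.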