- **Model type:** Transformer-based masked language model
- **Training data:** No additional pretraining; merges two existing models
- **Languages:** 100+ languages
- **Architecture:**
  - Base architectures:
    - XLM-RoBERTa base (multilingual)
    - BERT base cased (multilingual)
  - Custom merging technique that combines weights from both base models into one unified model (see the sketch after this list)
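
The card does not spell out how the merging is done, so the snippet below is only a minimal sketch of one plausible approach: element-wise averaging of parameters whose names (after stripping the `roberta.`/`bert.` prefixes) and shapes match across the `xlm-roberta-base` and `bert-base-multilingual-cased` checkpoints. The output path `merged-xlmr-mbert` and the averaging strategy itself are illustrative assumptions, not the actual technique used for this model.

```python
# Hypothetical sketch: average parameters shared by XLM-R base and mBERT.
# This is NOT the model's documented merging technique, just an illustration.
import torch
from transformers import AutoModelForMaskedLM

xlmr = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
mbert = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def strip_prefix(name: str) -> str:
    # Drop the model-specific prefix so encoder layer names line up.
    for prefix in ("roberta.", "bert."):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

# Index mBERT weights by their prefix-free parameter names.
mbert_weights = {strip_prefix(k): v for k, v in mbert.state_dict().items()}

merged_state = xlmr.state_dict()
for name, tensor in merged_state.items():
    other = mbert_weights.get(strip_prefix(name))
    if other is not None and other.shape == tensor.shape:
        # Average where both models have a matching parameter; vocabulary-
        # dependent tensors (embeddings, LM head) keep the XLM-R weights
        # because their shapes differ between the two models.
        merged_state[name] = (tensor + other) / 2.0

xlmr.load_state_dict(merged_state)
xlmr.save_pretrained("merged-xlmr-mbert")
```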