Monarch Mixer-BERT

This is the 110M checkpoint for M2-BERT-base, as described in the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.

Check out our GitHub for instructions on how to download and fine-tune it!

How to use

Using AutoModelForMaskedLM:

from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-110M', trust_remote_code=True)
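
If you want to run the loaded model directly, here is a minimal sketch of filling in a mask by hand. It assumes the standard bert-base-uncased tokenizer and that the model returns standard masked-LM logits; neither is specified on this card.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumption: M2-BERT uses the standard bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-110M', trust_remote_code=True)

text = "Every morning, I enjoy a cup of [MASK] to start my day."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = mlm(**inputs)

# Assumption: the custom model returns standard masked-LM logits.
logits = outputs.logits

# Find the [MASK] position and decode the highest-scoring token.
mask_positions = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))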

You can use this model with a pipeline for masked language modeling:

from transformers import pipeline
unmasker = pipeline('fill-mask', model='alycialee/m2-bert-110M', trust_remote_code=True)
unmasker("Every morning, I enjoy a cup of [MASK] to start my day.")

Remote Code

This model requires trust_remote_code=True to be passed to the from_pretrained method because it uses custom PyTorch code (see our GitHub). To protect against upstream changes to that code, consider passing a revision argument that pins the exact git commit, for example:

mlm = AutoModelForMaskedLM.from_pretrained(
   'alycialee/m2-bert-110M',
   trust_remote_code=True,
   revision='eee02a4',
)
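
The pipeline API accepts the same arguments, so you can pin the commit there as well:

unmasker = pipeline(
   'fill-mask',
   model='alycialee/m2-bert-110M',
   trust_remote_code=True,
   revision='eee02a4',
)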

Configuration

Note that use_flash_mm is false by default; using FlashMM is currently not supported. hyena_training_additions is also turned off by default. A sketch of inspecting these flags is shown below.
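
Here is a minimal sketch of checking these settings via AutoConfig, assuming the custom configuration exposes them as plain attributes under the names above:

from transformers import AutoConfig

config = AutoConfig.from_pretrained('alycialee/m2-bert-110M', trust_remote_code=True)

# Assumption: the custom config exposes these flags as attributes;
# getattr is used so missing names return None instead of raising.
print(getattr(config, 'use_flash_mm', None))              # expected: False
print(getattr(config, 'hyena_training_additions', None))  # expected: False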