Monarch Mixer-BERT
This is the 110M-parameter checkpoint for M2-BERT-base from the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.
Check out our GitHub for instructions on how to download and fine-tune it!
How to use
Using AutoModel:
from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-110M', trust_remote_code=True)
You can use this model with a pipeline for masked language modeling:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='alycialee/m2-bert-110M', trust_remote_code=True)
unmasker("Every morning, I enjoy a cup of [MASK] to start my day.")
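For a single [MASK] token, the fill-mask pipeline returns a list of candidate dictionaries with the keys score, token, token_str, and sequence. The sketch below shows one way to reduce that list to the top few (token, score) pairs. The helper name summarize_predictions is hypothetical, and the sample list is invented placeholder data in the expected shape, not real output from this checkpoint.

```python
from typing import Dict, List, Tuple


def summarize_predictions(results: List[Dict], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return (token_str, score) pairs for the top_k highest-scoring candidates."""
    # Sort explicitly by score rather than relying on the order of the input list.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in ranked[:top_k]]


# Placeholder data shaped like fill-mask pipeline output for a single mask.
# Scores and token ids are invented for illustration; they are NOT model output.
sample_results = [
    {"score": 0.3, "token": 1, "token_str": "tea",
     "sequence": "every morning, i enjoy a cup of tea to start my day."},
    {"score": 0.6, "token": 0, "token_str": "coffee",
     "sequence": "every morning, i enjoy a cup of coffee to start my day."},
    {"score": 0.1, "token": 2, "token_str": "milk",
     "sequence": "every morning, i enjoy a cup of milk to start my day."},
]

for token_str, score in summarize_predictions(sample_results, top_k=2):
    print(f"{token_str}: {score}")
```

With the real model, you would pass the return value of the unmasker call above in place of sample_results.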
Remote Code
This model requires trust_remote_code=True to be passed to the from_pretrained method. This is because we use custom PyTorch code (see our GitHub). You should consider passing a revision argument that specifies the exact git commit of the code, for example:
mlm = AutoModelForMaskedLM.from_pretrained(
'alycialee/m2-bert-110M',
trust_remote_code=True,
revision='eee02a4',
)
Configuration
Note that use_flash_mm is false by default; FlashMM is currently not supported.
hyena_training_additions is also turned off.
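To confirm these settings before training or inference, you could inspect the two flags on the loaded config. The sketch below is an illustration only: check_m2_config is a hypothetical helper, it assumes the flags are exposed as plain attributes on the config object, and it uses a stand-in object rather than downloading the checkpoint.

```python
from types import SimpleNamespace


def check_m2_config(config) -> dict:
    """Read the two flags named above, treating a missing attribute as False."""
    flags = {
        "use_flash_mm": bool(getattr(config, "use_flash_mm", False)),
        "hyena_training_additions": bool(
            getattr(config, "hyena_training_additions", False)
        ),
    }
    # FlashMM is documented as unsupported, so fail early if it is enabled.
    if flags["use_flash_mm"]:
        raise ValueError("use_flash_mm=True is not supported; leave it False.")
    return flags


# Stand-in config for illustration. With the real checkpoint, the config would
# presumably be loaded (assumption) with:
#   AutoConfig.from_pretrained('alycialee/m2-bert-110M', trust_remote_code=True)
stand_in = SimpleNamespace(use_flash_mm=False, hyena_training_additions=False)
print(check_m2_config(stand_in))
```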