
miniALBERT is a recursive transformer model that uses cross-layer parameter sharing, embedding factorisation, and bottleneck adapters to achieve high parameter efficiency. Because miniALBERT is a compact model, it is trained with a layer-to-layer distillation technique, using the BioClinicalBERT model as the teacher. This model is trained for 3 epochs on the MIMIC-III notes dataset. In terms of architecture, it uses an embedding dimension of 312, a hidden size of 768, an MLP expansion rate of 4, and a reduction factor of 16 for the bottleneck adapters. Overall, the model uses 6 recursions and has 18 million unique parameters.
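To make the effect of these hyperparameters concrete, the following is a rough back-of-the-envelope sketch in plain Python. It is not taken from the miniALBERT code base. The vocabulary size is an assumption (a BERT-cased style vocabulary; it is not stated in this card), and the adapter layout (one down-projection and one up-projection, each with a bias) is the common bottleneck-adapter design, assumed here for illustration only.

```python
# Hyperparameters stated in the model description.
EMBEDDING_DIM = 312
HIDDEN_SIZE = 768
MLP_EXPANSION = 4
ADAPTER_REDUCTION = 16

# ASSUMPTION: vocabulary size of a BERT-cased style tokenizer (not stated in this card).
ASSUMED_VOCAB_SIZE = 28996


def full_embedding_params(vocab: int, hidden: int) -> int:
    """Token embedding table that maps directly to the hidden size."""
    return vocab * hidden


def factorised_embedding_params(vocab: int, emb: int, hidden: int) -> int:
    """Small embedding table plus a linear projection up to the hidden size (weights only)."""
    return vocab * emb + emb * hidden


def bottleneck_adapter_params(hidden: int, reduction: int) -> int:
    """ASSUMED layout: down-projection and up-projection, each with a bias vector."""
    bottleneck = hidden // reduction
    down = hidden * bottleneck + bottleneck
    up = bottleneck * hidden + hidden
    return down + up


mlp_inner_size = HIDDEN_SIZE * MLP_EXPANSION          # 768 * 4 = 3072
adapter_bottleneck = HIDDEN_SIZE // ADAPTER_REDUCTION  # 768 // 16 = 48

full = full_embedding_params(ASSUMED_VOCAB_SIZE, HIDDEN_SIZE)
factorised = factorised_embedding_params(ASSUMED_VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_SIZE)
adapter = bottleneck_adapter_params(HIDDEN_SIZE, ADAPTER_REDUCTION)

print(f"MLP inner size:            {mlp_inner_size}")
print(f"Adapter bottleneck size:   {adapter_bottleneck}")
print(f"Full embedding params:     {full:,}")
print(f"Factorised embedding:      {factorised:,}")
print(f"Params per adapter (est.): {adapter:,}")
```

Under these assumptions, the factorised embedding needs well under half the parameters of a full-width embedding table, which is where a large share of the savings comes from in a model of this size.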


Because miniALBERT uses a custom architecture, it cannot currently be loaded with transformers.AutoModel. To load the model, first clone the miniALBERT GitHub project using the command below:

git clone

Then use sys.path.append to add the miniALBERT files to your project, and import the miniALBERT modeling file using the below code:

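import sys
# Placeholder path: replace with the location of your cloned miniALBERT project
sys.path.append("PATH_TO_CLONED_PROJECT/miniALBERT/")

from minialbert_modeling import MiniAlbertForSequenceClassification, MiniAlbertForTokenClassification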
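If the import above fails with a ModuleNotFoundError, the path passed to sys.path.append is usually wrong. As a small optional sketch, the helper below checks that the directory exists before adding it and avoids adding it twice. The function name `add_repo_to_path` is hypothetical and not part of the miniALBERT project.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> Path:
    """Append a cloned project directory to sys.path, failing early if it is missing.

    `repo_dir` is whatever location you cloned the project into (an assumption of
    this sketch; the card does not prescribe a location).
    """
    path = Path(repo_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Cloned project directory not found: {path}")
    # Only append once so repeated calls (e.g. in a notebook) do not grow sys.path.
    if str(path) not in sys.path:
        sys.path.append(str(path))
    return path
```

For example, calling `add_repo_to_path("PATH_TO_CLONED_PROJECT/miniALBERT/")` with your own path in place of the placeholder, and then running the `from minialbert_modeling import ...` line, gives a clearer error message when the clone location is mistyped.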

Finally, load the model like a regular model in the transformers library using the below code:

# For NER use the below code
model = MiniAlbertForTokenClassification.from_pretrained("nlpie/clinical-miniALBERT-312")
# For Sequence Classification use the below code
model = MiniAlbertForSequenceClassification.from_pretrained("nlpie/clinical-miniALBERT-312")

In addition, for efficient fine-tuning using the pre-trained bottleneck adapters, use the below code:



If you use the model, please cite our paper:

  doi = {10.48550/ARXIV.2302.04725},
  url = {},
  author = {Rohanian, Omid and Nouriborji, Mohammadmahdi and Jauncey, Hannah and Kouchaki, Samaneh and Group, ISARIC Clinical Characterisation and Clifton, Lei and Merson, Laura and Clifton, David A.},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences, I.2.7, 68T50},
  title = {Lightweight Transformers for Clinical Natural Language Processing},
  publisher = {arXiv},
  year = {2023},
  copyright = { perpetual, non-exclusive license}