Model Description
TinyClinicalBERT is a distilled version of BioClinicalBERT, trained for 3 epochs with a total batch size of 192 on the MIMIC-III notes dataset.
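As a minimal usage sketch, the model can be loaded with the Hugging Face transformers library. The repository id `nlpie/tiny-clinicalbert` used below is an assumption and should be replaced with the actual checkpoint name or path if it differs.

```python
from transformers import AutoTokenizer, AutoModel

# The model id below is an assumption; replace it with the actual checkpoint name or path.
model_name = "nlpie/tiny-clinicalbert"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short clinical note and obtain token-level representations.
inputs = tokenizer("Patient admitted with acute shortness of breath.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```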
Distillation Procedure
This model uses a distillation method called ‘transformer-layer distillation’, which is applied to each layer of the student to align its attention maps and hidden states with those of the teacher.
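The per-layer objective can be illustrated roughly as follows. This is a minimal sketch, assuming a mean-squared-error loss on attention maps and hidden states, a learned linear projection to bridge the hidden-size mismatch between student and teacher, and that each student layer has already been paired with a teacher layer; it is not the exact training code used for this model.

```python
import torch.nn as nn
import torch.nn.functional as F

def transformer_layer_loss(student_attn, teacher_attn,
                           student_hidden, teacher_hidden,
                           projection: nn.Linear):
    """Align one student layer with its mapped teacher layer.

    student_attn / teacher_attn: attention maps of shape (batch, heads, seq, seq),
        assuming student and teacher use the same number of attention heads.
    student_hidden / teacher_hidden: hidden states of shape (batch, seq, d_student)
        and (batch, seq, d_teacher).
    projection: assumed learned linear map from d_student to d_teacher, needed
        because the student's hidden size is smaller than the teacher's.
    """
    attention_loss = F.mse_loss(student_attn, teacher_attn)
    hidden_loss = F.mse_loss(projection(student_hidden), teacher_hidden)
    return attention_loss + hidden_loss
```

The total distillation objective would then sum a term of this form over all student layers.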
Architecture and Initialisation
This model uses 4 hidden layers with a hidden dimension and embedding size of 312, resulting in a total of roughly 15M parameters. Due to its small hidden dimension, the student cannot be initialised from the teacher's weights and is instead randomly initialised.
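For illustration, a configuration along these lines can be built with transformers and randomly initialised; the intermediate size, number of attention heads, and vocabulary size below are assumed values, since only the layer count, hidden size, and parameter count are stated above.

```python
from transformers import BertConfig, BertModel

# 4 hidden layers with a hidden/embedding size of 312; the intermediate size,
# head count, and vocabulary size are assumptions, not values from this model card.
config = BertConfig(
    num_hidden_layers=4,
    hidden_size=312,
    num_attention_heads=12,
    intermediate_size=1200,
)
model = BertModel(config)  # randomly initialised, as described above
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")  # roughly 15M
```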
Citation
If you use this model, please consider citing the following paper:
@misc{https://doi.org/10.48550/arxiv.2302.04725,
  doi = {10.48550/ARXIV.2302.04725},
  url = {https://arxiv.org/abs/2302.04725},
  author = {Rohanian, Omid and Nouriborji, Mohammadmahdi and Jauncey, Hannah and Kouchaki, Samaneh and Group, ISARIC Clinical Characterisation and Clifton, Lei and Merson, Laura and Clifton, David A.},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences, I.2.7, 68T50},
  title = {Lightweight Transformers for Clinical Natural Language Processing},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}