This model repository presents "TinyPubMedBERT", a distilled version of PubMedBERT (Gu et al., 2021).

The model has 4 transformer layers and was distilled following the methods introduced in the TinyBERT paper (Jiao et al., 2020).

TinyPubMedBERT provides the initial weights for training dmis-lab/KAZU-NER-module-distil-v1.0, the NER module of the KAZU (Korea University and AstraZeneca) framework.
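Since the model is a standard BERT-style encoder, it can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch; the repository id used here is an assumption and should be replaced with this repository's actual id.

```python
# Minimal sketch: load TinyPubMedBERT as a standard BERT encoder.
# NOTE: the repository id below is an assumption; replace it with this repository's actual id.
from transformers import AutoTokenizer, AutoModel

model_id = "dmis-lab/TinyPubMedBERT-v1.0"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# The distilled encoder is expected to have 4 transformer layers.
print(model.config.num_hidden_layers)  # expected: 4

# Encode a biomedical sentence and obtain contextual embeddings.
inputs = tokenizer("Aspirin inhibits cyclooxygenase.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```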

Citation info

Joint first authorship of Richard Jackson (AstraZeneca) and WonJin Yoon (Korea University). Please cite the paper using the citation format provided in the following section, or find the full citation information here.

@inproceedings{YoonAndJackson2022BiomedicalNER,
  title     = "Biomedical {NER} for the Enterprise with Distillated {BERN}2 and the Kazu Framework",
  author    = "Yoon, Wonjin and Jackson, Richard and Ford, Elliot and Poroshin, Vladimir and Kang, Jaewoo",
  booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track",
  month     = dec,
  year      = "2022",
  address   = "Abu Dhabi, UAE",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2022.emnlp-industry.63",
  pages     = "619--626",
}

This model used resources from the PubMedBERT paper and the TinyBERT paper.

Gu, Yu, et al. "Domain-specific language model pretraining for biomedical natural language processing." ACM Transactions on Computing for Healthcare (HEALTH) 3.1 (2021): 1-23.

Jiao, Xiaoqi, et al. "TinyBERT: Distilling BERT for Natural Language Understanding." Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.

Contact Information

For help or issues using the code or model (the NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.