ClinicBERT has the same architecture of RoBERTa model. It has been trained on clinical text and can be used for feature extraction from textual data.
How to use
Feature Extraction
from transformers import RobertaModel, RobertaTokenizer
model = RobertaModel.from_pretrained("tdobrxl/ClinicBERT")
tokenizer = RobertaTokenizer.from_pretrained("tdobrxl/ClinicBERT")
text = "Randomized Study of Shark Cartilage in Patients With Breast Cancer."
last_hidden_state, pooler_output = model(tokenizer.encode(text, return_tensors="pt")).last_hidden_state, model(tokenizer.encode(text, return_tensors="pt")).pooler_output
Masked Word Prediction
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="tdobrxl/ClinicBERT", tokenizer="tdobrxl/ClinicBERT")
text = "this is the start of a beautiful <mask>."
fill_mask(text)
[{'score': 0.26558592915534973, 'token': 363, 'token_str': ' study', 'sequence': 'this is the start of a beautiful study.'}, {'score': 0.06330082565546036, 'token': 2010, 'token_str': ' procedure', 'sequence': 'this is the start of a beautiful procedure.'}, {'score': 0.04393036663532257, 'token': 661, 'token_str': ' trial', 'sequence': 'this is the start of a beautiful trial.'}, {'score': 0.0363750196993351, 'token': 839, 'token_str': ' period', 'sequence': 'this is the start of a beautiful period.'}, {'score': 0.027248281985521317, 'token': 436, 'token_str': ' treatment', 'sequence': 'this is the start of a beautiful treatment.'}