
BERT for Chemical Industry

A BERT-based language model further pre-trained from the SciBERT checkpoint. The training corpus combines more than 40,000 technical documents from the chemical industry, ranging from Safety Data Sheets to Product Information Documents, with 13,000 Wikipedia chemistry articles. It includes 250,000+ tokens from the chemical domain, and the model was pre-trained with masked language modeling (MLM) on over 9.2 million paragraphs.
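To make the MLM objective concrete, here is a minimal sketch of how training inputs are typically corrupted. It assumes the standard BERT recipe (select about 15% of tokens; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged). This card does not state the masking settings actually used, and `mask_tokens`, the toy sentence, and the toy vocabulary are hypothetical names introduced only for illustration.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Corrupt a token list for MLM and return (masked_tokens, labels).

    labels[i] holds the original token at positions the model must predict
    and None everywhere else. The 15% / 80-10-10 split is the standard BERT
    recipe and is an assumption here, not a setting taken from this card.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return masked, labels


# Toy example; whitespace splitting stands in for the real WordPiece tokenizer.
tokens = "store the solvent away from heat and open flame".split()
vocab = sorted(set(tokens))
masked, labels = mask_tokens(tokens, vocab, seed=0)
print(masked)
print(labels)
```

During pre-training the loss is computed only at positions where `labels` is not `None`, so the model learns to recover the original token from its surrounding context.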

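# Load a fill-mask pipeline backed by the chemical-domain BERT model.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="recobo/chemical-bert-uncased",
    tokenizer="recobo/chemical-bert-uncased",
)

# Predict the most likely tokens for the [MASK] position.
fill_mask("we create [MASK]")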
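The fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to reduce that list to the top candidates. `top_predictions` is a hypothetical helper, and the `sample` values are placeholders shaped like the pipeline's return value, not real output from this model.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token_str, score) pairs.

    `results` is expected to look like the return value of a transformers
    fill-mask pipeline call for an input with a single [MASK].
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 4)) for r in ranked[:k]]


# Placeholder data only: it mimics the structure of the pipeline output.
sample = [
    {"score": 0.1, "token": 1, "token_str": "token_a", "sequence": "we create token_a"},
    {"score": 0.5, "token": 2, "token_str": "token_b", "sequence": "we create token_b"},
    {"score": 0.05, "token": 3, "token_str": "token_c", "sequence": "we create token_c"},
]

for token_str, score in top_predictions(sample, k=2):
    print(f"{token_str}\t{score}")
```

With the real model, the same helper could be applied as `top_predictions(fill_mask("we create [MASK]"))`.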