agriculture-domain agriculture fill-mask

BERT for Agriculture Domain

A BERT-based language model further pre-trained from the SciBERT checkpoint. The training dataset balances scientific and general works in the agriculture domain, covering knowledge from different areas of agricultural research as well as practical knowledge.

The corpus contains 1.2 million paragraphs from the National Agricultural Library (NAL) of the US government and 5.3 million paragraphs from books and common literature in the agriculture domain.

The model was trained with masked language modeling (MLM), a self-supervised objective in which the model predicts tokens that have been hidden from the input.
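
To make the MLM objective concrete, here is a minimal sketch of how training inputs can be corrupted. It assumes the masking recipe from the original BERT paper (select about 15% of tokens; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged). This model card does not state the exact settings used for this model, so the rate, the helper name mask_tokens, and the toy vocabulary are illustrative assumptions only.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list BERT-style and return (corrupted, labels).

    labels[i] holds the original token where a prediction is required,
    and None where the position is not part of the training loss.
    The 15% / 80-10-10 recipe is an assumption taken from the BERT paper,
    not a setting confirmed for this specific model.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)          # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))   # 10%: random token
            else:
                corrupted.append(tok)                 # 10%: keep unchanged
        else:
            labels.append(None)  # position ignored by the loss
            corrupted.append(tok)
    return corrupted, labels


# Toy whitespace tokenization; a real setup would use the model's tokenizer.
tokens = "crop rotation improves soil fertility and reduces pest pressure".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.15, seed=1)
print(corrupted)
print(labels)
```

In practice the transformers library provides DataCollatorForLanguageModeling for this step; the hand-written version above is only meant to show the idea.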

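from transformers import pipeline

# Load the model and its tokenizer into a fill-mask pipeline.
fill_mask = pipeline(
    "fill-mask",
    model="recobo/agriculture-bert-uncased",
    tokenizer="recobo/agriculture-bert-uncased",
)

# Predict the most likely tokens for the [MASK] position.
fill_mask("[MASK] is the practice of cultivating plants and livestock.")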
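
The fill-mask pipeline returns a list of dictionaries, one per candidate, with the keys score, token, token_str, and sequence. The following sketch shows one way to turn that list into a compact ranking. The helper name summarize_predictions is hypothetical, and the sample list contains placeholder values that only mimic the output shape; they are not real predictions from this model.

```python
def summarize_predictions(predictions, min_score=0.0):
    """Return (token, score) pairs with score >= min_score, best first."""
    kept = [
        (p["token_str"].strip(), round(p["score"], 4))
        for p in predictions
        if p["score"] >= min_score
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder data shaped like fill-mask output; NOT real model predictions.
sample = [
    {"score": 0.10, "token": 2, "token_str": "farming",
     "sequence": "farming is the practice of cultivating plants and livestock."},
    {"score": 0.70, "token": 1, "token_str": "agriculture",
     "sequence": "agriculture is the practice of cultivating plants and livestock."},
    {"score": 0.05, "token": 3, "token_str": "horticulture",
     "sequence": "horticulture is the practice of cultivating plants and livestock."},
]

for token, score in summarize_predictions(sample, min_score=0.08):
    print(f"{token}\t{score}")
```

With a real pipeline, the same helper could be called on the return value of fill_mask(...) directly.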