This is a masked language model that was trained on IMDB dataset using a finetuned DistilBERT model.
Note:
I published a tutorial explaining how transformers work and how to train a masked language model using transformer. https://olafenwaayoola.medium.com/the-concept-of-transformers-and-training-a-transformers-model-45a09ae7fb50
Rest API Code for Testing the Masked LanguageĀ Model
Inference API python code for testing the masked language model.
import requests
API_URL = "https://api-inference.huggingface.co/models/ayoolaolafenwa/Masked-Language-Model"
headers = {"Authorization": "Bearer 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"}
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
output = query({
"inputs": "Washington DC is the [MASK] of USA.",
})
print(output[0]["sequence"])
Output
washington dc is the capital of usa.
It produces the correct output, washington dc is the capital of usa.
Load the Masked Language Model with Transformers
You can easily load the Language model with transformers using this code.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch
tokenizer = AutoTokenizer.from_pretrained("ayoolaolafenwa/Masked-Language-Model")
model = AutoModelForMaskedLM.from_pretrained("ayoolaolafenwa/Masked-Language-Model")
inputs = tokenizer("The internet [MASK] amazing.", return_tensors="pt")
with torch.no_grad():
logits = model(**inputs).logits
# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
output = tokenizer.decode(predicted_token_id)
print(output)
Output
is
It prints out the predicted masked word is.