Multilabel classification model trained on the Toxic Comments dataset from the Kaggle Jigsaw Toxic Comment Classification Challenge (https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data).

Fine-tuned using DistilBERT.
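
A minimal sketch of how such a fine-tune can be set up with Hugging Face Transformers, assuming the competition's train.csv is available locally. The label column names come from the Kaggle data; the Dataset class, training arguments, and the output directories "pretrained_model" and "model_tokenizer" are illustrative, not the exact training script.

import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# train.csv from the Kaggle competition: one comment per row, one 0/1 column per label
df = pd.read_csv("train.csv")
train_texts = df["comment_text"].tolist()
train_labels = df[labels].values.tolist()

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss
)

class ToxicCommentsDataset(Dataset):
    def __init__(self, texts, label_matrix):
        self.encodings = tokenizer(texts, truncation=True, padding="max_length")
        self.label_matrix = label_matrix

    def __len__(self):
        return len(self.label_matrix)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        # BCEWithLogitsLoss expects float targets
        item["labels"] = torch.tensor(self.label_matrix[idx], dtype=torch.float)
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pretrained_model", num_train_epochs=1),
    train_dataset=ToxicCommentsDataset(train_texts, train_labels),
)
trainer.train()
trainer.save_model("pretrained_model")
tokenizer.save_pretrained("model_tokenizer")

Inference with the saved model and tokenizer: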

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer from the local directories
model = AutoModelForSequenceClassification.from_pretrained("pretrained_model")
tokenizer = AutoTokenizer.from_pretrained("model_tokenizer")

texts = ["Why is Owen's retirement from football not mentioned? He hasn't played a game since 2005."]
batch = tokenizer(texts, truncation=True, padding="max_length", return_tensors="pt")
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

with torch.no_grad():
    outputs = model(**batch)
    # Sigmoid rather than softmax: the six labels are scored independently in a
    # multilabel setup. Scale to percentages for readability.
    predictions = torch.sigmoid(outputs.logits) * 100
    probs = predictions[0].tolist()
    for label, prob in zip(labels, probs):
        print(f"{label}: {round(prob, 3)}%")

Expected output:

toxic: 0.676%
severe_toxic: 0.001%
obscene: 0.098%
threat: 0.007%
insult: 0.021%
identity_hate: 0.004%
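
The scores are per-label sigmoid outputs scaled to percentages; the example comment is non-toxic, so all six stay well below 50%. If hard label assignments are needed, one common approach (an assumption here, not something the card fixes) is to threshold each score independently, continuing from the snippet above:

# Illustrative thresholding at 50%; the card does not specify an operating threshold.
threshold = 50.0
assigned = [label for label, prob in zip(labels, probs) if prob >= threshold]
print(assigned or "no toxic labels")  # -> "no toxic labels" for this example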