The Office Speaker Classification Model

This model classifies which character from the TV show The Office spoke a given line. It is built on DistilBERT and is loaded through the DistilBertForSequenceClassification class from the transformers library.

How to Use

import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer

model_name = 'mo374z/theoffice_speaker_classification'
model = DistilBertForSequenceClassification.from_pretrained(model_name)
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

Once the model and tokenizer are loaded, you can classify the speaker of a given line as follows:

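test_sample_str = "Hi i'm Michael Scott"
# Tokenize the line and move the input IDs to the same device as the model
test_sample = tokenizer.encode(test_sample_str, truncation=True, padding=True, return_tensors='pt')
test_sample = test_sample.to(device)
output = model(test_sample)
# output.logits holds one score per speaker class; argmax picks the highest-scoring class index
predicted_class = output.logits.argmax(dim=-1).item()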
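
The model output contains raw logits, one per speaker class. The sketch below shows one possible way to turn those logits into a ranked list of speakers with probabilities. The function name rank_speakers, the example logits, and the example label mapping are all hypothetical and chosen for illustration only; they are not taken from the model.

```python
import math


def rank_speakers(logits, id2label, top_k=3):
    """Turn raw classifier logits into (speaker, probability) pairs.

    logits   -- list of floats, one score per class
    id2label -- dict mapping class index to speaker name
    top_k    -- number of highest-probability speakers to return
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:top_k]]


# Hypothetical values for illustration only, not real model output.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "Michael", 1: "Dwight", 2: "Jim"}

ranking = rank_speakers(example_logits, example_labels, top_k=2)
print(ranking)
```

With the real model, the inputs would presumably be output.logits[0].tolist() and model.config.id2label. Whether id2label contains character names or generic placeholders depends on how the checkpoint was saved, so check it before relying on it.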

To get an explanation for the model prediction, you can use the SHAP explainer. By using SHAP, you can gain insights into the speaking style of the characters, and understand which words and phrases are most important for the model's prediction.

SHAP

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It attributes the model's prediction to the individual input features, which for this model are the tokens of the line.
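
To make the idea of per-token contributions concrete, here is a minimal sketch that computes exact Shapley values by enumerating every subset of tokens. It is not how the shap library works internally (the library uses approximations, since exact enumeration grows exponentially with the number of tokens), and toy_score is a hypothetical stand-in for the model's score for one speaker, not the real classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(tokens, score):
    """Exact Shapley value of each token for a scoring function.

    score takes the list of tokens that are "present" and returns a float.
    """
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                # Marginal contribution of token i to this coalition.
                values[i] += weight * (score(with_i) - score(without_i))
    return values


def toy_score(present_tokens):
    """Hypothetical score for the speaker 'Michael'; illustration only."""
    s = 0.0
    if "Michael" in present_tokens:
        s += 2.0
    if "Scott" in present_tokens:
        s += 1.0
    if "Michael" in present_tokens and "Scott" in present_tokens:
        s += 1.0  # interaction bonus when the full name appears
    return s


tokens = ["Hi", "I'm", "Michael", "Scott"]
contributions = shapley_values(tokens, toy_score)
for token, value in zip(tokens, contributions):
    print(f"{token}: {value:.2f}")
```

The contributions sum to the difference between the score with all tokens and the score with none. This additivity is the property that lets SHAP values be read as a breakdown of a single prediction.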

License

This model is licensed under the MIT License.