Model: covid-19-vaccination-tweet-relevance

Overview

This model is a text classifier trained to determine whether a tweet is related to COVID-19 vaccination or not.

Usage

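from transformers import AutoTokenizer, AutoModelForSequenceClassification
# AutoModelForSequenceClassification keeps the classification head; AutoModel would load only the encoder.
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")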
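The snippet above only loads the model. Below is a minimal inference sketch. It assumes PyTorch is installed alongside transformers, and the helper names `softmax`, `decode`, and `classify` are hypothetical. This card does not list the label names, so the sketch reads them from the checkpoint's `config.id2label` instead of hard-coding them.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Map one row of logits to {label name: probability}."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


def classify(texts: List[str],
             model_id: str = "seantw/covid-19-vaccination-tweet-relevance") -> List[Dict[str, float]]:
    """Run the checkpoint on raw tweets (hypothetical helper; needs torch and transformers)."""
    # Imported lazily so the pure helpers above work without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label names come from the checkpoint's config; they are not documented in this card.
    return [decode(row.tolist(), model.config.id2label) for row in logits]
```

For example, `classify(["Just got my second dose today"])` returns one dictionary per tweet. Each dictionary maps every label name in the checkpoint's config to a probability.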

Training corpus

The training corpus comprises 9373 tweets, randomly sampled each day from December 2020 to June 2022. These tweets were labeled by domain experts.

We have separately trained another model to classify the stance of a tweet toward COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-stance for more information.

Output Label Index

Performance Metrics

The model's performance metrics on the test set are as follows:

These metrics are computed on a test set of 3699 tweets.

Confusion Matrix

The confusion matrix of predictions on the test set is as follows:

                     Predicted: irrelevance   Predicted: relevance
True: irrelevance    1239                     165
True: relevance      62                       2233
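The sketch below recomputes standard binary metrics from the confusion-matrix counts. It assumes "relevance" is the positive class. The function name `metrics_from_confusion` is hypothetical, and the card does not say which averaging scheme the original metrics used, so these values need not match the reported figures exactly. The counts do sum to 3699, which matches the stated test-set size.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision, recall, F1 (positive class) and macro F1 from 2x2 counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Same quantities for the "irrelevance" class, treating it as the positive class.
    precision_neg = tn / (tn + fn)
    recall_neg = tn / (tn + fp)
    f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": (f1 + f1_neg) / 2,
    }


# Counts read off the confusion matrix, with "relevance" as the positive class.
m = metrics_from_confusion(tn=1239, fp=165, fn=62, tp=2233)
for name, value in m.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

For example, accuracy is (1239 + 2233) / 3699 = 3472 / 3699, roughly 0.939.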

Model Architecture

The model is fine-tuned from COVID-Twitter-BERT v2.

Contact

Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)