# finetuned-twitter-roberta-base-sep2022-tweetcognition
This model is a fine-tuned version of cardiffnlp/twitter-roberta-base-sep2022 on a custom dataset of 2527 recent tweets describing major life events experienced by their authors. It achieves the following results on the evaluation set:
- Loss: 0.2433
- Accuracy: 0.9545
## Model description
A RoBERTa-base model trained on 168.86M tweets up to the end of September 2022 (a 15M-tweet increment over the previous checkpoint), fine-tuned on a custom dataset of 2527 recent tweets describing major life events experienced by the users, with the goal of performing a specific text classification task: classifying posts from the Twitter social media platform into a set of 30 distinct classes, each representing a major life event that the author of the post recently experienced. RoBERTa (Robustly Optimized BERT approach) is a state-of-the-art natural language processing (NLP) model developed by Facebook AI.
## Intended uses & limitations
This fine-tuned language model is intended for a specific text classification task: classifying posts from the Twitter social media platform into a set of 30 distinct classes, each representing a major life event that the author of the post recently experienced.
The model could be further improved by training on a larger dataset with an extended and more diverse set of life-event classes.
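As a usage sketch, the checkpoint can be loaded with a `transformers` text-classification pipeline. Note the card does not state the Hub namespace, so `MODEL_ID` below uses the bare model name as an assumption; prepend the owner's namespace or point it at a local checkpoint directory before running.

```python
# Inference sketch, assuming the `transformers` library is installed.
from transformers import pipeline

# Bare model name from this card; the Hub namespace is not given here,
# so adjust MODEL_ID to "<namespace>/<name>" or a local path as needed.
MODEL_ID = "finetuned-twitter-roberta-base-sep2022-tweetcognition"

def build_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline over the 30 life-event labels."""
    return pipeline("text-classification", model=model_id)

# classifier = build_classifier()
# classifier("We finally bought our first house last weekend!")
# returns a list with the highest-scoring label and its score
```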
## Training procedure
A fine-tuning process was applied to the original model cardiffnlp/twitter-roberta-base-sep2022 by:
- training the original model on a custom dataset of 2527 recent tweets related to major life events experienced by the users
- setting the model's hyperparameters with the values mentioned in the table below
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
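The `linear` scheduler decays the learning rate from its initial value to zero over the course of training. A minimal sketch of that schedule with the values above, assuming zero warmup steps (the card lists none) and the 635 total steps shown in the results table (127 steps/epoch × 5 epochs):

```python
# Linear learning-rate decay with no warmup: lr falls from base_lr to 0
# over total_steps optimizer updates.
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 635) -> float:
    """Learning rate after `step` optimizer updates under linear decay."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

print(linear_lr(0))    # start of training: 2e-05
print(linear_lr(635))  # end of training: 0.0
```

In the `transformers` Trainer this behavior comes from `lr_scheduler_type="linear"` in `TrainingArguments`; the function above only illustrates the shape of the schedule.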
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.0283        | 1.0   | 127  | 1.4553          | 0.8162   |
| 0.9216        | 2.0   | 254  | 0.5951          | 0.8992   |
| 0.4343        | 3.0   | 381  | 0.3544          | 0.9348   |
| 0.2629        | 4.0   | 508  | 0.2613          | 0.9486   |
| 0.1861        | 5.0   | 635  | 0.2433          | 0.9545   |
### Framework versions
- Transformers 4.29.0
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3