# finetuned-twitter-roberta-base-sep2022-tweetcognition
This model is a fine-tuned version of cardiffnlp/twitter-roberta-base-sep2022 on a custom dataset of 2527 recent tweets describing major life events experienced by their authors. It achieves the following results on the evaluation set:
- Loss: 0.2433
- Accuracy: 0.9545
## Model description
A RoBERTa-base model trained on 168.86M tweets up to the end of September 2022 (a 15M-tweet increment over the previous checkpoint), fine-tuned on a custom dataset of 2527 recent tweets describing major life events experienced by the users, with the goal of performing a specific text classification task: classifying posts from the Twitter social media platform into a set of 30 distinct classes, each representing a major life event that the author of the post recently experienced. RoBERTa (Robustly Optimized BERT approach) is a state-of-the-art natural language processing (NLP) model developed by Facebook AI.
## Intended uses & limitations
This fine-tuned language model is intended for a specific text classification task: classifying posts from the Twitter social media platform into a set of 30 distinct classes, each representing a major life event that the author of the post recently experienced.
The model could be further improved by training on a larger dataset with an extended and more diverse set of life-event classes.
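As a usage sketch, the checkpoint can be loaded with a `transformers` text-classification pipeline. Note the card does not state the Hub namespace, so `MODEL_ID` below uses the bare model name as an assumption; prepend the owner's namespace or point it at a local checkpoint directory before running.

```python
# Inference sketch, assuming the `transformers` library is installed.
from transformers import pipeline

# Bare model name from this card; the Hub namespace is not given here,
# so adjust MODEL_ID to "<namespace>/<name>" or a local path as needed.
MODEL_ID = "finetuned-twitter-roberta-base-sep2022-tweetcognition"

def build_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline over the 30 life-event labels."""
    return pipeline("text-classification", model=model_id)

# classifier = build_classifier()
# classifier("We finally bought our first house last weekend!")
# returns a list with the highest-scoring label and its score
```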
## Training procedure
A fine-tuning process was applied to the original model cardiffnlp/twitter-roberta-base-sep2022 by:
- training the original model on a custom dataset of 2527 recent tweets related to major life events experienced by the users
- setting the model's hyperparameters with the values mentioned in the table below
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
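The `linear` scheduler decays the learning rate from its initial value to zero over the course of training. A minimal sketch of that schedule with the values above, assuming zero warmup steps (the card lists none) and the 635 total steps shown in the results table (127 steps/epoch × 5 epochs):

```python
# Linear learning-rate decay with no warmup: lr falls from base_lr to 0
# over total_steps optimizer updates.
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 635) -> float:
    """Learning rate after `step` optimizer updates under linear decay."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

print(linear_lr(0))    # start of training: 2e-05
print(linear_lr(635))  # end of training: 0.0
```

In the `transformers` Trainer this behavior comes from `lr_scheduler_type="linear"` in `TrainingArguments`; the function above only illustrates the shape of the schedule.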
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.0283        | 1.0   | 127  | 1.4553          | 0.8162   |
| 0.9216        | 2.0   | 254  | 0.5951          | 0.8992   |
| 0.4343        | 3.0   | 381  | 0.3544          | 0.9348   |
| 0.2629        | 4.0   | 508  | 0.2613          | 0.9486   |
| 0.1861        | 5.0   | 635  | 0.2433          | 0.9545   |
### Framework versions
- Transformers 4.29.0
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3