
This is a RuBERT-tiny2 model fine-tuned for emotion classification of short Russian texts. The task is multi-label classification, so a single text can carry more than one of the following labels:

0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger

Labels and their Russian equivalents:

no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
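The two mappings above can be kept as plain dictionaries. The following is a minimal sketch, not part of the original project: the names `ID2LABEL`, `LABEL2RU`, and `to_russian` are hypothetical helpers introduced here for illustration, and the input is assumed to have the same shape as the pipeline output shown in the Usage section.

```python
# Label ids and names as listed in this card.
ID2LABEL = {
    0: "no_emotion",
    1: "joy",
    2: "sadness",
    3: "surprise",
    4: "fear",
    5: "anger",
}

# English label -> Russian label, as listed in this card.
LABEL2RU = {
    "no_emotion": "нет эмоции",
    "joy": "радость",
    "sadness": "грусть",
    "surprise": "удивление",
    "fear": "страх",
    "anger": "злость",
}


def to_russian(predictions):
    """Return copies of pipeline-style dicts with an added 'label_ru' field."""
    return [{**p, "label_ru": LABEL2RU[p["label"]]} for p in predictions]


# Hand-written example in the pipeline's output format (not real model output).
example = [{"label": "joy", "score": 0.96}]
print(to_russian(example))
```

Keeping the lookup separate from the model call means the Russian names can be changed without touching inference code.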

Usage

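from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub; the task is inferred from the model.
model = pipeline(model="seara/rubert-tiny2-cedr-russian-emotion")

# Input text: "Hi, I like you!". By default only the highest-scoring label is returned.
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9605025053024292}]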
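Because the task is multi-label, it can be useful to look at the score of every label rather than only the top one. The sketch below is one possible way to do that, under stated assumptions: it passes `top_k=None` to get all label scores and `function_to_apply="sigmoid"` to score each label independently, and it applies a 0.5 threshold that is an arbitrary choice, not a value from the original project. `load_classifier` and `select_emotions` are hypothetical helper names.

```python
def load_classifier():
    """Build a pipeline that returns a score for every label (downloads the model)."""
    # Imported lazily so select_emotions can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="seara/rubert-tiny2-cedr-russian-emotion",
        top_k=None,                   # return all labels, not just the best one
        function_to_apply="sigmoid",  # assumption: independent per-label scores
    )


def select_emotions(scores, threshold=0.5):
    """Return labels whose score reaches the threshold, highest score first.

    `scores` is a list of {'label': str, 'score': float} dicts for one text.
    The default threshold of 0.5 is an assumption and may need tuning.
    """
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in ranked if s["score"] >= threshold]


# Hand-written scores for illustration only (not real model output).
fake_scores = [
    {"label": "surprise", "score": 0.6},
    {"label": "joy", "score": 0.9},
    {"label": "anger", "score": 0.1},
]
print(select_emotions(fake_scores))
```

With a real model, calling `load_classifier()(["some text"])[0]` should give a list of per-label dicts for the first text that can be passed to `select_emotions`.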

Dataset

This model was trained on the CEDR dataset.

An overview of the training data can be found in its Hugging Face card or in the source article.

Training

Training was done in this project with these parameters:

tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 30
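For reference, the parameters above can be captured in a small typed config object. This is only a sketch: the `TrainConfig` class and its methods are hypothetical and not taken from the training project, `null` is read here as "no explicit maximum length", and the step count assumes the last partial batch of each epoch is kept.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in this card (field names are illustrative)."""

    max_length: Optional[int] = None  # tokenizer.max_length: null
    batch_size: int = 64
    optimizer: str = "adam"
    lr: float = 1e-5
    weight_decay: float = 0.0
    num_epochs: int = 30

    def optimizer_kwargs(self):
        """Keyword arguments one might pass to an Adam-style optimizer."""
        return {"lr": self.lr, "weight_decay": self.weight_decay}

    def total_steps(self, num_train_examples):
        """Optimizer steps over the whole run, keeping the last partial batch."""
        steps_per_epoch = math.ceil(num_train_examples / self.batch_size)
        return steps_per_epoch * self.num_epochs


cfg = TrainConfig()
print(cfg.optimizer_kwargs())
```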

Eval results (on test split)

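| | no_emotion | joy | sadness | surprise | fear | anger | micro avg | macro avg | weighted avg |
|---|---|---|---|---|---|---|---|---|---|
| precision | 0.82 | 0.84 | 0.84 | 0.79 | 0.78 | 0.55 | 0.81 | 0.77 | 0.80 |
| recall | 0.84 | 0.83 | 0.85 | 0.66 | 0.67 | 0.33 | 0.78 | 0.70 | 0.78 |
| f1-score | 0.83 | 0.83 | 0.84 | 0.72 | 0.72 | 0.41 | 0.79 | 0.73 | 0.79 |
| auc-roc | 0.92 | 0.96 | 0.96 | 0.91 | 0.91 | 0.77 | 0.94 | 0.91 | 0.93 |
| support | 734 | 353 | 379 | 170 | 141 | 125 | 1902 | 1902 | 1902 |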
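To make the averaged columns easier to interpret, the sketch below recomputes the macro and weighted averages of precision and recall from the per-class values and supports reported above. It assumes the usual definitions (macro is the unweighted mean over classes, weighted is the support-weighted mean). Because the inputs are already rounded to two decimals, the results can only match the reported averages approximately. The micro average is left out, since it needs the underlying prediction counts.

```python
# Per-class values copied from the results above, in label order:
# no_emotion, joy, sadness, surprise, fear, anger.
PRECISION = [0.82, 0.84, 0.84, 0.79, 0.78, 0.55]
RECALL = [0.84, 0.83, 0.85, 0.66, 0.67, 0.33]
SUPPORT = [734, 353, 379, 170, 141, 125]


def macro_avg(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over classes weighted by support (number of test examples)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print("macro precision:", round(macro_avg(PRECISION), 3))
print("macro recall:", round(macro_avg(RECALL), 3))
print("weighted precision:", round(weighted_avg(PRECISION, SUPPORT), 3))
print("weighted recall:", round(weighted_avg(RECALL, SUPPORT), 3))
print("anger f1:", round(f1(PRECISION[-1], RECALL[-1]), 3))
```

The macro averages sit below the weighted ones because the weakest class, anger, also has the smallest support, so it pulls the unweighted mean down more than the support-weighted one.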