[README UNDER CONSTRUCTION]

emBERT is a Hungarian text-classification model that assigns one of seven emotions or a neutral state to a sentence. It uses the huBERT tokenizer and was fine-tuned from the huBERT base model on a proprietary dataset of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labelled manually by experts in a double-blind setup, and inconsistencies were resolved manually. The fine-tuning validation results were:

| Emotion | Precision | Recall | F1-score |
|---|---|---|---|
| 0 - Anger | 0.70 | 0.74 | 0.72 |
| 1 - Disgust | 0.72 | 0.73 | 0.73 |
| 2 - Fear | 0.61 | 0.47 | 0.53 |
| 3 - Happiness | 0.38 | 0.37 | 0.38 |
| 4 - Neutral | 0.65 | 0.62 | 0.63 |
| 5 - Sad | 0.74 | 0.72 | 0.73 |
| 6 - Successful | 0.79 | 0.81 | 0.80 |
| 7 - Trustful | 0.76 | 0.78 | 0.77 |
| weighted avg | 0.73 | 0.74 | 0.73 |

Accuracy reached 74%.
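Each F1-score above is the harmonic mean of the corresponding precision and recall, which can be verified directly:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Anger row from the table above: precision 0.70, recall 0.74
round(f1(0.70, 0.74), 2)  # 0.72
```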

The emotion set is based on Plutchik (1980), with anticipation replaced by a neutral category.

Proper use of the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
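Once loaded, the model's raw logits can be turned into an emotion label. Below is a minimal post-processing sketch that assumes the label ids follow the table above; the `ID2LABEL` mapping and the example logits are illustrative, not taken from the model's config:

```python
import math

# Label ids assumed to match the table above (an assumption, not read
# from the model config).
ID2LABEL = {0: "Anger", 1: "Disgust", 2: "Fear", 3: "Happiness",
            4: "Neutral", 5: "Sad", 6: "Successful", 7: "Trustful"}

def softmax(logits):
    """Convert raw classifier logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return the most probable emotion name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# With the real model, logits come from:
#   outputs = model(**tokenizer(sentence, return_tensors="pt"))
#   logits = outputs.logits[0].tolist()
# Here we use made-up logits favouring class 4 (Neutral):
label, prob = predict_label([0.1, -0.2, 0.0, 0.3, 2.5, 0.2, -0.1, 0.4])
```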

The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.