colombian-spanish-cyberbullying-classifier

This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne for detecting cyberbullying in Colombian Spanish. It was trained on a dataset of posts gathered manually from the social network Twitter.

Training and evaluation data

The dataset is small: 3570 tweets, each manually labeled as cyberbullying or not cyberbullying. Its distinguishing feature is that, for a given word, it contains an annotated tweet labeled as cyberbullying that uses that word and another tweet labeled as not cyberbullying that uses the same word. This is possible because the same word can appear in different contexts, and the context determines the label.

For instance, tweets in the not cyberbullying category predominantly contain obscene words that, in their particular context, do not constitute cyberbullying. An example is “Marica, se me olvidó ver el partido”. To a lesser extent, the not cyberbullying category also includes tweets sourced from Twitter trends in the Colombian region. Trends reflect the most popular topics and conversations in a given area at a specific time, so they capture what people in that region are discussing and sharing.

Trend-based tweets were used where it was not feasible to find not cyberbullying tweets containing a specific offensive word or phrase, such as “ojala te violen”. Conversely, tweets labeled as cyberbullying do not always contain strong or obscene words or phrases, as in the example “te voy a buscar”.

The dataset is balanced: it contains the same number of cyberbullying and not cyberbullying tweets. The keywords and phrases used to build the dataset were selected from the categories described in the article “Guidelines for the Fine-Grained Analysis of Cyberbullying” by Cynthia Van Hee, Ben Verhoeven, Els Lefever, Guy De Pauw, Walter Daelemans, and Véronique Hoste. Four categories were included: insult, threat, curse, and defamation. Insult involves offensive words intended to verbally hurt another person, while threat aims to harm the victim's integrity. Curse covers words that wish harm or misfortune upon a person, and defamation seeks to damage the victim's reputation. These categories were chosen to capture a broad range of the forms cyberbullying can take. The tweets were labeled by an occupational therapist associated with the project.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
weight_decay: 0.01
warmup_steps: 500
num_epochs: 2
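
As a rough sketch, the values listed above could be passed to the Hugging Face Trainer as follows. This card does not show the original training script, so the use of TrainingArguments, the mapping of the batch sizes to per-device values (which assumes a single device), and the output_dir path are all assumptions made for illustration.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments parameter names.
# Assumption: batch sizes are per device (single-device training).
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "num_train_epochs": 2,
}


def build_training_args(output_dir="./cyberbullying-classifier"):
    """Build TrainingArguments from HYPERPARAMS.

    output_dir is a hypothetical path chosen for this example.
    """
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Calling build_training_args() requires transformers and PyTorch to be installed; the returned object can then be handed to a Trainer together with the model and the tokenized dataset.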

Training results

Epoch	ROC-AUC	Validation Loss	Training Loss
1.0	0.8756	0.4375	---
2.0	0.9022	0.5060	0.4945
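
For readers unfamiliar with the ROC-AUC column, the sketch below shows what the metric measures: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. It is a plain-Python illustration of the definition, not the evaluation code used for this model, and the labels and scores are toy values unrelated to the results above.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


# Toy data for illustration only: 1 = cyberbullying, 0 = not cyberbullying.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In practice, sklearn.metrics.roc_auc_score(y_true, y_score) computes the same quantity far more efficiently on real evaluation sets.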


Model in action 🚀

Fast usage with pipelines:

# Install the dependency first (in a notebook: !pip install -q transformers)
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_path = "FelipeGuerra/colombian-spanish-cyberbullying-classifier"
bullying_analysis = pipeline("text-classification", model=model_path, tokenizer=model_path)

# A colloquial, non-offensive sentence
bullying_analysis("Como dice mi mamá: va caer palo de agua")
# Output:
# [{'label': 'Not_bullying', 'score': 0.977687656879425}]

# A sentence containing an insult and a threat
bullying_analysis("Esta perrita me las va pagar")
# Output:
# [{'label': 'Bullying', 'score': 0.9404164552688599}]
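
The pipeline returns the predicted label together with that label's score. When a stricter or looser decision rule is needed, the output can be turned into a yes/no flag with a custom threshold. The helper below is a minimal sketch: is_bullying and its threshold parameter are hypothetical names introduced here, and the conversion to a bullying probability assumes the model has exactly two labels whose scores sum to 1.

```python
def is_bullying(prediction, threshold=0.5):
    """Return True if the estimated probability of 'Bullying' meets the threshold.

    prediction is one dictionary from the pipeline output, for example
    {'label': 'Bullying', 'score': 0.94}.
    """
    score = prediction["score"]
    # The score belongs to the predicted label; convert it to P(Bullying).
    # Assumption: two labels with scores that sum to 1.
    bullying_prob = score if prediction["label"] == "Bullying" else 1.0 - score
    return bullying_prob >= threshold


# The two outputs shown above, reused as plain dictionaries.
examples = [
    {"label": "Not_bullying", "score": 0.977687656879425},
    {"label": "Bullying", "score": 0.9404164552688599},
]
print([is_bullying(p) for p in examples])                   # [False, True]
print([is_bullying(p, threshold=0.95) for p in examples])   # [False, False]
```

Raising the threshold trades recall for precision, which may suit moderation workflows where false accusations are costly.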

Framework versions

Transformers 4.34.0
PyTorch 2.0.1+cu118
Pandas 1.5.3
scikit-learn 1.2.2

Created by Felipe Guerra Sáenz | LinkedIn