RuBERTConv Toxic Classifier
Model description
Based on rubert-base-cased-conversational model
Intended uses & limitations
How to use
Colab: link
from transformers import pipeline
model_name = "IlyaGusev/rubertconv_toxic_clf"
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name, framework="pt")
text = "Ты придурок из интернета"
pipe([text])
Training data
Datasets:
Augmentations:
- ё -> е
- Remove or add "?" or "!"
- Fix CAPS
- Concatenate toxic and non-toxic texts
- Concatenate two non-toxic texts
- Add toxic words from vocabulary
- Add typos
- Mask toxic words with "*", "@", "$"
Training procedure
TBA