Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection

This is Polbert, a Polish version of the BERT language model, fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
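A minimal usage sketch follows, assuming the model is published on the Hugging Face Hub and can be loaded with the `transformers` text-classification pipeline. The identifier `ptaszynski/bert-base-polish-cyberbullying` is an assumption taken from the repository name in the citation URL, and `load_classifier` and `classify` are hypothetical helper names. The label names the model returns are not documented here, so the sketch passes them through unchanged.

```python
# Assumed Hub identifier, inferred from the repository name in the citation URL.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    """Run the classifier and return (text, label, score) tuples.

    Labels are returned exactly as the model reports them; their meaning
    is not specified in this document.
    """
    texts = list(texts)
    results = classifier(texts)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]
```

Calling `classify(["przykładowy tekst"], load_classifier())` would then return one tuple per input text, provided the assumed identifier is correct.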

Fine-tuning dataset

The dataset used for fine-tuning this model is based on the original Dataset for Automatic Cyberbullying Detection in Polish Language, which was recently cleaned further and re-annotated by experts from Samurai Labs. The improved dataset will be released separately at a later date.

Acknowledgements

Author

Michal Ptaszynski - contact me on:

Licences

The fine-tuned model, with all attached files, is licensed under CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International License).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>

Citations

Please cite this model using the following citation.

Model:

@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}

Original dataset:

@article{ptaszynski2019results,
  title={Results of the poleval 2019 shared task 6: First dataset and open shared task for automatic cyberbullying detection in polish twitter},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dyba{\l}a, Pawe{\l}},
  year={2019},
  publisher={Warszawa: Institute of Computer Sciences. Polish Academy of Sciences}
}

Improved dataset:

TBA

References