This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles. For further details, refer to our paper in Journalism: "News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments".
- This model is a BERT-based classifier that assigns Korean user-generated comments one of two labels: liberal or conservative.
- This model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. Because the dataset was collected in 2019, comments on more recent political topics may not be classified correctly.
- This model was fine-tuned from ETRI's KorBERT.
## How to use
- The model requires an edited version of the `transformers` class `BertTokenizer`, which can be found in the file `KorBertTokenizer.py`.
- Usage example:
```python
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

input_strings = ['좌파가 나라 경제 안보 말아먹는다',
                 '수꼴들은 나라 일본한테 팔아먹었냐']
for input_string in input_strings:
    print('===\n입력 텍스트: {}\n분류 결과: {}\n==='.format(input_string, classify(input_string)))
```
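If class probabilities are needed rather than a hard label, the two logits returned by the classification head can be passed through a softmax. The sketch below uses only the standard library; the logit values are illustrative placeholders, not real model output.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder logits for the two classes (not actual model output).
probs = softmax([1.2, -0.3])
```

In practice the input would be `model(**inputs).logits` for a single comment, and the two entries of `probs` line up with `model.config.id2label`.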
## Model performance
- Accuracy: 0.8322
- F1-Score: 0.8322
- For further technical details on the model, refer to our paper presented at the W-NUT workshop (EMNLP 2019), "The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media".
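For reference, the reported metrics for this binary task can be reproduced from predictions with standard definitions. The sketch below uses only plain Python; the `gold` and `pred` label lists are illustrative placeholders (0/1 class ids), not the actual held-out test set.

```python
def accuracy(gold, pred):
    # Fraction of comments whose predicted label matches the gold label.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, positive=1):
    # Binary F1: harmonic mean of precision and recall for the positive class.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Placeholder labels, for illustration only.
gold = [0, 1, 1, 0, 1]
pred = [0, 1, 0, 0, 1]
print('Accuracy: {:.4f}'.format(accuracy(gold, pred)))
print('F1-Score: {:.4f}'.format(f1(gold, pred)))
```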