PolicyBERTa-7d

This model is a fine-tuned version of roberta-base on data from the Manifesto Project. It was inspired by the model from Laurer (2020).

It achieves the following results on the evaluation set:

Loss: 0.8549
Accuracy: 0.7059
F1-micro: 0.7059
F1-macro: 0.6683
F1-weighted: 0.7033
Precision: 0.7059
Recall: 0.7059

Model description

This model was trained on 115,943 manually annotated sentences to classify text into one of seven political categories: "external relations", "freedom and democracy", "political system", "economy", "welfare and quality of life", "fabric of society" and "social groups".

Intended uses & limitations

The model output reproduces the limitations of the dataset in terms of country coverage, time span, domain definitions and potential biases of the annotators, as any supervised machine learning model would. Applying the model to other types of data (other text types, other countries etc.) is likely to reduce performance.

from transformers import pipeline
import pandas as pd

classifier = pipeline(
    task="text-classification",
    model="niksmer/PolicyBERTa-7d")

# Load text data you want to classify
text = pd.read_csv("example.csv")["text_you_want_to_classify"].to_list()

# Inference
output = classifier(text)

# Print output
pd.DataFrame(output).head()

Training and evaluation data

PolicyBERTa-7d was trained on the English-speaking subset of the Manifesto Project Dataset (MPDS2021a). The model was trained on 115,943 sentences from 163 political manifestos in 7 English-speaking countries (Australia, Canada, Ireland, New Zealand, South Africa, United Kingdom, United States). The manifestos were published between 1992 and 2020.

Country	Count manifestos	Count sentences	Time span
Australia	18	14,887	2010-2016
Ireland	23	24,966	2007-2016
Canada	14	12,344	2004-2008 & 2015
New Zealand	46	35,079	1993-2017
South Africa	29	13,334	1994-2019
USA	9	13,188	1992 & 2004-2020
United Kingdom	34	30,936	1997-2019

Canadian manifestos between 2004 and 2008 are used as test data.

The Manifesto Project manually annotates individual sentences from political party manifestos into 7 main political domains: 'Economy', 'External Relations', 'Fabric of Society', 'Freedom and Democracy', 'Political System', 'Welfare and Quality of Life' and 'Social Groups'. See the codebook for the exact definitions of each domain.

Train data

The train data was highly imbalanced.

Label	Description	Count
0	external relations	7,640
1	freedom and democracy	5,880
2	political system	11,234
3	economy	29,218
4	welfare and quality of life	37,200
5	fabric of society	13,594
6	social groups	11,177

Overall count: 115,943
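To make the imbalance concrete, the following minimal sketch computes each domain's share of the training data from the counts in the table above. The inverse-frequency class weights at the end are purely illustrative: the model card does not state that any class weighting was used during training.

```python
# Training-set label counts, copied from the table above.
TRAIN_COUNTS = {
    "external relations": 7640,
    "freedom and democracy": 5880,
    "political system": 11234,
    "economy": 29218,
    "welfare and quality of life": 37200,
    "fabric of society": 13594,
    "social groups": 11177,
}

total = sum(TRAIN_COUNTS.values())

# Share of each domain in the training data.
train_shares = {label: count / total for label, count in TRAIN_COUNTS.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(TRAIN_COUNTS.values()) / min(TRAIN_COUNTS.values())

# Hypothetical inverse-frequency weights (an assumption, NOT from the card):
# weight_c = total / (n_classes * count_c), so rare classes get a weight > 1.
class_weights = {
    label: total / (len(TRAIN_COUNTS) * count)
    for label, count in TRAIN_COUNTS.items()
}

for label in TRAIN_COUNTS:
    print(f"{label:<30} share={train_shares[label]:.3f} "
          f"weight={class_weights[label]:.2f}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

Weights like these could be passed to a weighted loss function if the imbalance turns out to hurt the smaller domains.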

Validation data

The validation set was created by random sampling.

Label	Description	Count
0	external relations	1,345
1	freedom and democracy	1,043
2	political system	2,038
3	economy	5,140
4	welfare and quality of life	6,554
5	fabric of society	2,384
6	social groups	1,957

Overall count: 20,461
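The card does not say how the random split was drawn. The reported sizes are, however, consistent with holding out 15% of the 136,404 non-test sentences. The sketch below illustrates such a split with the standard library only; the 0.15 fraction, the seed, and the function name `random_holdout` are assumptions made for illustration.

```python
import math
import random


def random_holdout(n_items, holdout_fraction, seed):
    """Return (train_indices, validation_indices) from a seeded shuffle."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Round the holdout size up, as many split utilities do.
    n_holdout = math.ceil(holdout_fraction * n_items)
    return indices[n_holdout:], indices[:n_holdout]


# Sizes reported in the train and validation tables of this card.
N_TRAIN, N_VALIDATION = 115_943, 20_461

train_idx, val_idx = random_holdout(
    N_TRAIN + N_VALIDATION,
    holdout_fraction=0.15,  # assumed: inferred from the reported sizes
    seed=0,                 # assumed: the original seed is not documented
)
print(len(train_idx), len(val_idx))
```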

Test data

The test dataset contains ten Canadian manifestos published between 2004 and 2008.

Label	Description	Count
0	external relations	824
1	freedom and democracy	296
2	political system	1,041
3	economy	2,188
4	welfare and quality of life	2,654
5	fabric of society	940
6	social groups	387

Overall count: 8,330
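Because the test manifestos come from a single country and period, their label distribution need not match the training data. As a rough check, the sketch below compares the label shares of the three splits using the total variation distance (half the sum of absolute share differences). The choice of this distance is an illustrative assumption, not part of the original evaluation.

```python
# Label counts per split, copied from the three tables above
# (order: labels 0 to 6).
LABELS = [
    "external relations", "freedom and democracy", "political system",
    "economy", "welfare and quality of life", "fabric of society",
    "social groups",
]
SPLIT_COUNTS = {
    "train": [7640, 5880, 11234, 29218, 37200, 13594, 11177],
    "validation": [1345, 1043, 2038, 5140, 6554, 2384, 1957],
    "test": [824, 296, 1041, 2188, 2654, 940, 387],
}


def to_shares(counts):
    """Normalize raw counts so that they sum to 1."""
    total = sum(counts)
    return [count / total for count in counts]


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


split_shares = {name: to_shares(c) for name, c in SPLIT_COUNTS.items()}
tvd_validation = total_variation_distance(
    split_shares["train"], split_shares["validation"])
tvd_test = total_variation_distance(
    split_shares["train"], split_shares["test"])

print(f"train vs validation: {tvd_validation:.4f}")
print(f"train vs test:       {tvd_test:.4f}")
```

The randomly drawn validation set closely mirrors the training distribution, while the Canadian test set deviates more, most visibly for "social groups" and "external relations".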

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

training_args = TrainingArguments(
    warmup_steps=0,
    weight_decay=0.1,
    learning_rate=1e-05,
    fp16=True,
    evaluation_strategy="epoch",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    overwrite_output_dir=True,
    per_device_eval_batch_size=16,
    save_strategy="no",
    logging_dir="logs",
    logging_strategy="steps",
    logging_steps=10,
    push_to_hub=True,
    hub_strategy="end")

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1-micro	F1-macro	F1-weighted	Precision	Recall
0.9154	1.0	1812	0.8984	0.6785	0.6785	0.6383	0.6772	0.6785	0.6785
0.8374	2.0	3624	0.8569	0.6957	0.6957	0.6529	0.6914	0.6957	0.6957
0.7053	3.0	5436	0.8582	0.7019	0.7019	0.6594	0.6967	0.7019	0.7019
0.7178	4.0	7248	0.8488	0.7030	0.7030	0.6662	0.7011	0.7030	0.7030
0.6688	5.0	9060	0.8549	0.7059	0.7059	0.6683	0.7033	0.7059	0.7059
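The step counts in the table can be related to the batch size with simple arithmetic. With 115,943 training sentences and a per-device batch size of 16, a single device would need far more than the reported 1,812 steps per epoch. The sketch below searches for the batch-size multiplier that reproduces the reported value. The card states neither the number of devices nor any gradient accumulation, so the resulting multiplier is an inference, not a documented setting.

```python
import math

# Values taken from the card.
N_TRAIN_SENTENCES = 115_943
PER_DEVICE_BATCH_SIZE = 16
NUM_EPOCHS = 5
REPORTED_STEPS_PER_EPOCH = 1812


def steps_per_epoch(n_examples, per_device_batch_size, multiplier=1):
    """Optimizer steps per epoch, keeping the last incomplete batch.

    `multiplier` stands for the number of devices times any gradient
    accumulation steps (both are assumptions; the card lists neither).
    """
    effective_batch_size = per_device_batch_size * multiplier
    return math.ceil(n_examples / effective_batch_size)


single_device_steps = steps_per_epoch(N_TRAIN_SENTENCES, PER_DEVICE_BATCH_SIZE)

# Which multipliers between 1 and 8 reproduce the reported step count?
matching_multipliers = [
    m for m in range(1, 9)
    if steps_per_epoch(N_TRAIN_SENTENCES, PER_DEVICE_BATCH_SIZE, m)
    == REPORTED_STEPS_PER_EPOCH
]

print("steps per epoch on one device:", single_device_steps)
print("multipliers matching the table:", matching_multipliers)
```

A matching multiplier corresponds to an effective batch size of 16 times that multiplier, which is worth keeping in mind when reusing the learning rate on different hardware.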

Validation evaluation

Model	Micro F1-Score	Macro F1-Score	Weighted F1-Score
PolicyBERTa-7d	0.71	0.67	0.70

Test evaluation

Model	Micro F1-Score	Macro F1-Score	Weighted F1-Score
PolicyBERTa-7d	0.65	0.60	0.65

Evaluation per category

Label	Validation F1-Score	Test F1-Score
external relations	0.76	0.70
freedom and democracy	0.61	0.55
political system	0.55	0.55
economy	0.74	0.67
welfare and quality of life	0.77	0.72
fabric of society	0.67	0.60
social groups	0.58	0.41
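The aggregate scores reported earlier follow from these per-category values: macro F1 is the unweighted mean of the per-category F1-scores, and weighted F1 is their mean weighted by the number of sentences per category. The sketch below recomputes both from the rounded table values and the label counts of the validation and test sets, so the results only match the reported scores up to rounding.

```python
# (validation F1, test F1, validation count, test count) per category,
# copied from the tables in this card.
PER_CATEGORY = {
    "external relations": (0.76, 0.70, 1345, 824),
    "freedom and democracy": (0.61, 0.55, 1043, 296),
    "political system": (0.55, 0.55, 2038, 1041),
    "economy": (0.74, 0.67, 5140, 2188),
    "welfare and quality of life": (0.77, 0.72, 6554, 2654),
    "fabric of society": (0.67, 0.60, 2384, 940),
    "social groups": (0.58, 0.41, 1957, 387),
}


def macro_f1(scores):
    """Unweighted mean of per-category F1-scores."""
    return sum(scores) / len(scores)


def weighted_f1(scores, supports):
    """Mean of per-category F1-scores weighted by category size."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


val_f1 = [v[0] for v in PER_CATEGORY.values()]
test_f1 = [v[1] for v in PER_CATEGORY.values()]
val_support = [v[2] for v in PER_CATEGORY.values()]
test_support = [v[3] for v in PER_CATEGORY.values()]

macro_val = macro_f1(val_f1)
macro_test = macro_f1(test_f1)
weighted_val = weighted_f1(val_f1, val_support)
weighted_test = weighted_f1(test_f1, test_support)

print(f"validation: macro={macro_val:.2f} weighted={weighted_val:.2f}")
print(f"test:       macro={macro_test:.2f} weighted={weighted_test:.2f}")
```

The gap between macro and weighted F1 reflects that the smaller categories, such as "social groups", are classified less reliably than the large ones.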

Evaluation based on saliency theory

Saliency theory is an approach to analyzing political text data. In short, parties tend to write about the policy areas in which they believe they are seen as competent. Voters tend to attribute policy competence in line with the assumed ideology of parties. You can therefore analyze the share of policy areas that parties address in their manifestos to infer party ideology.

For this kind of analysis, the Manifesto Project introduced the rile-index. For a quick overview, check this. PolicyBERTa is not fine-tuned to predict the rile-index; if you are interested in that, check ManiBERT or RoBERTa-RILE.

In the following table, the predicted and original shares of the individual policy domains are shown per manifesto in the test dataset. Overall, the Pearson correlation between the predicted and original shares is 0.965.

Party-ID	Year	Type	Share external relations	Share freedom and democracy	Share political system	Share economy	Share welfare and quality of life	Share fabric of society	Share social groups
62320	2004	Predicted	7.1%	4.8%	13.2%	20.3%	35.2%	9.6%	9.8%
		Original	10.2%	2.5%	13.7%	23.8%	31.7%	11.6%	6.4%
62320	2006	Predicted	2.9%	4.7%	16.4%	18.9%	38.3%	11.9%	6.9%
		Original	5.6%	5.0%	15.8%	20.7%	38.7%	9.3%	4.9%
62320	2008	Predicted	6.8%	4.7%	6.2%	24.7%	38.3%	10.3%	9.0%
		Original	5.6%	3.7%	8.2%	33.1%	29.5%	11.7%	4.3%
62420	2004	Predicted	9.7%	3.5%	14.5%	24.7%	34.8%	8.5%	4.3%
		Original	12.6%	1.3%	18.8%	23.0%	33.2%	9.0%	2.0%
62420	2006	Predicted	9.5%	2.2%	7.9%	27.8%	34.8%	9.2%	8.7%
		Original	10.6%	2.5%	9.6%	29.7%	33.1%	8.3%	6.2%
62420	2008	Predicted	0.7%	0.5%	3.5%	41.7%	46.4%	3.7%	3.5%
		Original	2.0%	0.2%	4.4%	33.3%	45.9%	7.7%	6.4%
62623	2004	Predicted	7.1%	11.4%	24.5%	17.6%	21.5%	13.6%	4.3%
		Original	8.4%	6.7%	28.8%	17.4%	18.7%	15.5%	4.5%
62623	2006	Predicted	5.6%	8.5%	23.6%	15.6%	14.8%	24.3%	7.6%
		Original	5.0%	8.9%	22.2%	17.4%	17.2%	25.7%	3.6%
62623	2008	Predicted	5.0%	4.4%	12.2%	33.1%	21.9%	17.5%	5.9%
		Original	5.6%	2.2%	11.6%	37.8%	17.8%	20.9%	4.1%
62110	2008	Predicted	10.0%	3.1%	6.8%	22.7%	41.3%	10.1%	6.0%
		Original	13.4%	3.3%	7.7%	26.9%	35.6%	8.9%	4.3%
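The comparison in the table can be sketched in two steps: turn the sentence-level labels of one manifesto into domain shares, then correlate the predicted shares with the original ones. The helper names `domain_shares` and `pearson` are hypothetical, and the example correlates only the first manifesto in the table (party 62320, 2004), so it does not reproduce the overall figure of 0.965, which covers all test manifestos.

```python
from collections import Counter
from math import sqrt

DOMAINS = [
    "external relations", "freedom and democracy", "political system",
    "economy", "welfare and quality of life", "fabric of society",
    "social groups",
]


def domain_shares(labels):
    """Share of each domain among the sentence labels of one manifesto."""
    counts = Counter(labels)
    return [counts[domain] / len(labels) for domain in DOMAINS]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy input: four sentence labels from a made-up manifesto.
toy_shares = domain_shares(
    ["economy", "economy", "social groups", "economy"])

# Shares in percent for party 62320 in 2004, copied from the table above.
predicted = [7.1, 4.8, 13.2, 20.3, 35.2, 9.6, 9.8]
original = [10.2, 2.5, 13.7, 23.8, 31.7, 11.6, 6.4]
r_single_manifesto = pearson(predicted, original)

print(f"correlation for one manifesto: {r_single_manifesto:.3f}")
```

To approximate the overall figure, the same function would be applied to the predicted and original shares of all ten test manifestos concatenated into two long sequences.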

Framework versions

Transformers 4.16.2
Pytorch 1.9.0+cu102
Datasets 1.8.0
Tokenizers 0.10.3