
climateattention-10k classifies whether a given sequence is related to climate topics. It is a classifier fine-tuned from climatebert/distilroberta-base-climate-f (Webersinke et al., 2021) on the ClimaText dataset (Varini et al., 2020).

Because the dataset is unbalanced, upscaling was applied before training.
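The exact upscaling procedure is not described here. One common reading is random oversampling of the minority class until all classes have the same size. The sketch below illustrates that idea under this assumption; the function name upsample_minority and the toy data are hypothetical and not taken from the original training code.

```python
import random

def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size.

    Assumption: "upscaling" means random oversampling; the original
    training code may have used a different procedure.
    """
    rng = random.Random(seed)

    # Group samples by their label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Toy, made-up data: one climate-related sentence and three unrelated ones.
texts = ["Emissions rose again.", "The match ended 2-1.", "Stocks fell.", "A new cafe opened."]
labels = [1, 0, 0, 0]
balanced_texts, balanced_labels = upsample_minority(texts, labels)
```

After the call, both classes contain three samples each, with the single positive sentence repeated.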

How to use:

from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

# The tokenizer comes from the base model; the classifier weights come from the fine-tuned checkpoint.
tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
climateattention = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k-upscaled", num_labels=2)

# Wrap the model and tokenizer in a text-classification pipeline.
ClimateAttention = pipeline("text-classification", model=climateattention, tokenizer=tokenizer)

ClimateAttention("Emissions have increased during the last several months")

>> [{'label': 'Yes', 'score': 0.9993829727172852}]
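The pipeline returns one dict per input, with a 'label' and a 'score' as in the output above. A minimal sketch of filtering such results for climate-related texts follows. The helper climate_related, the 0.5 threshold, and the 'No' label and its score in the second example prediction are assumptions for illustration; only the 'Yes' label and its score appear in the output above.

```python
def climate_related(texts, predictions, positive_label="Yes", min_score=0.5):
    """Return (text, score) pairs whose prediction is the positive label
    with a score of at least min_score."""
    kept = []
    for text, pred in zip(texts, predictions):
        if pred["label"] == positive_label and pred["score"] >= min_score:
            kept.append((text, pred["score"]))
    return kept

texts = [
    "Emissions have increased during the last several months",
    "The team won its third game in a row",  # made-up negative example
]

# Predictions in the pipeline's output format. The first entry is the
# output shown above; the second is a hypothetical negative prediction.
predictions = [
    {"label": "Yes", "score": 0.9993829727172852},
    {"label": "No", "score": 0.98},
]

selected = climate_related(texts, predictions)
```

In practice, predictions would come from calling the pipeline on the list of texts, for example ClimateAttention(texts).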


Performance was tested on the ClimaText 10K test set of 300 samples (67 positives, 233 negatives) (Varini et al., 2020):

Accuracy  Precision  Recall  F1
0.97      0.9531     0.9105  0.9313
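For reference, the four metrics can be computed from confusion-matrix counts as sketched below. No confusion matrix is reported for this model; the counts used here (61 true positives, 3 false positives, 6 false negatives, 230 true negatives) are hypothetical values chosen to match the stated 67 positives and 233 negatives and to roughly reproduce the figures above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts (not reported in the source): 61 + 6 = 67 positives,
# 3 + 230 = 233 negatives.
metrics = classification_metrics(tp=61, fp=3, fn=6, tn=230)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the results agree with the reported figures to about three decimal places.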


Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). Climatebert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.