Readability ES Sentences for three classes

A RoBERTa-based model, fine-tuned from BERTIN, for readability assessment of Spanish texts.

Description and performance

This version of the model was trained on a mix of datasets, using sentence-level granularity when possible. The model classifies each input into one of three complexity levels:

Basic.
Intermediate.
Advanced.

The relationship of these categories with the Common European Framework of Reference for Languages is described in our report.
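
As a minimal sketch of how the three-way classification could be consumed, the snippet below turns raw per-class scores (logits) into a complexity level. The label order in `ID2LABEL`, the helper names, and the example logits are assumptions for illustration only; the real mapping should be read from the model's configuration. `classify_sentences` shows one way the scores might be obtained with the Hugging Face `transformers` pipeline, where `model_id` is a placeholder for this model's hub identifier.

```python
import math

# Assumed label order, for illustration only; read the real mapping
# from the model's config (config.id2label) before relying on it.
ID2LABEL = {0: "basic", 1: "intermediate", 2: "advanced"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_level(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_sentences(sentences, model_id):
    """Score sentences with a fine-tuned checkpoint (requires `transformers`).

    `model_id` is a placeholder: pass the hub identifier of this model.
    """
    from transformers import pipeline  # imported lazily; optional dependency

    classifier = pipeline("text-classification", model=model_id)
    return classifier(sentences)


if __name__ == "__main__":
    # Hypothetical logits for one sentence, not real model output.
    label, prob = predict_level([0.2, 1.5, -0.3])
    print(label, round(prob, 3))
```

Because the model was trained mostly on sentence-level data, splitting longer texts into sentences before calling `classify_sentences` is likely to match the training conditions more closely than passing whole paragraphs.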

This model achieves a macro-averaged F1 score of 0.6951, measured on the validation set.
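
For readers unfamiliar with the metric, the sketch below shows how a macro-averaged F1 score is computed: F1 is calculated separately for each class and the results are averaged with equal weight, so a weak class lowers the score regardless of how many examples it has. The function name and the toy labels are illustrative assumptions, not the evaluation code or data used for this model.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only, not the model's validation data.
    labels = ["basic", "intermediate", "advanced"]
    y_true = ["basic", "basic", "intermediate", "advanced", "advanced", "intermediate"]
    y_pred = ["basic", "intermediate", "intermediate", "advanced", "basic", "intermediate"]
    print(macro_f1(y_true, y_pred, labels))
```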

Model variants

readability-es-sentences. Two classes, sentence-based dataset.
readability-es-paragraphs. Two classes, paragraph-based dataset.
readability-es-3class-sentences (this model). Three classes, sentence-based dataset.
readability-es-3class-paragraphs. Three classes, paragraph-based dataset.

Datasets

readability-es-hackathon-pln-public, composed of:
- coh-metrix-esp corpus.
- Various text resources scraped from websites.
Other non-public datasets: newsela-es, simplext.

Training details

Please refer to this training run for full details on the hyperparameters and training regime.

Biases and Limitations