Model Card for GalXLM-R-sp for Semantic Role Labeling

This model is fine-tuned from a version of XLM-RoBERTa Base that was first pre-trained on the SRL task for Spanish. It is one of 24 models introduced as part of this project. Prior to this work, there were no published Galician datasets or models for SRL.

Model Details

Model Description

GalXLM-R-sp for Semantic Role Labeling (SRL) is a Transformers model that leverages XLM-R's extensive pretraining on 100 languages to achieve better SRL predictions for low-resource Galician. Before fine-tuning, the model was additionally pre-trained on the SRL task for Spanish. It was then fine-tuned on Galician with the following objectives:

Identify up to 13 verbal roots within a sentence.
Identify available arguments for each verbal root. Due to scarcity of data, this model focuses solely on the identification of arguments 0, 1, and 2.

Labels are formatted as r#:tag, where r# links the token to a specific verbal root of index #, and tag identifies the token as the verbal root (root) or an individual argument (arg0/arg1/arg2).
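
To make the label format concrete, here is a minimal sketch of how r#:tag labels could be parsed and grouped by verbal root. The helper names (`parse_label`, `group_by_root`), the placeholder tokens, and the "O" label for untagged tokens are assumptions made for illustration and are not taken from the model or its repository. The sketch also assumes one label per token.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class SRLLabel(NamedTuple):
    root_index: int  # the "#" in "r#"
    tag: str         # "root", "arg0", "arg1" or "arg2"


# Matches labels such as "r0:root" or "r3:arg2".
LABEL_PATTERN = re.compile(r"r(\d+):(root|arg[0-2])")


def parse_label(label: str) -> Optional[SRLLabel]:
    """Split an r#:tag label into its root index and tag, or return None."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        return None  # anything else (e.g. the assumed "O") is treated as unlabeled
    return SRLLabel(int(match.group(1)), match.group(2))


def group_by_root(tokens: List[str], labels: List[str]) -> Dict[int, Dict[str, List[str]]]:
    """Collect the tokens of each role under the verbal root they belong to."""
    frames: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for token, label in zip(tokens, labels):
        parsed = parse_label(label)
        if parsed is not None:
            frames[parsed.root_index][parsed.tag].append(token)
    return {index: dict(roles) for index, roles in frames.items()}


# Placeholder tokens and labels, not real model output.
tokens = ["t0", "t1", "t2", "t3", "t4"]
labels = ["r0:arg0", "r0:root", "r0:arg1", "O", "r1:root"]
frames = group_by_root(tokens, labels)
```

Grouping by root index keeps the arguments of different verbs in the same sentence apart, which is the purpose of the r# prefix.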

Developed by: Micaella Bruton
Model type: Transformers
Language(s) (NLP): Galician (gl), Spanish (es)
License: Apache 2.0
Finetuned from model: Spanish pre-trained XLM RoBERTa Base

Model Sources

Repository: GalicianSRL
Paper: To be updated

Uses

This model is intended to be used to develop and improve natural language processing tools for Galician.

Bias, Risks, and Limitations

Galician is a low-resource language which, prior to this project, lacked a semantic role labeling dataset. As such, the dataset used to train this model is extremely limited and could benefit from the inclusion of additional sentences and manual validation by native speakers.

Training Details

Training Data

This model was first pre-trained on the SRL task using the SpanishSRL Dataset, then fine-tuned on the "train" portion of the GalicianSRL Dataset. Both datasets were produced as part of this same project.

Training Hyperparameters

Learning Rate: 2e-5
Batch Size: 16
Weight Decay: 0.01
Early Stopping: 10 epochs
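
The values above can be collected into a small configuration object, as in the sketch below. It is not the author's training script. In particular, it assumes that "Early Stopping: 10 epochs" means a patience of 10 epochs without improvement on a validation score; the card does not state this. The names `FineTuneConfig` and `should_stop` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 2e-5
    batch_size: int = 16
    weight_decay: float = 0.01
    # Assumption: "10 epochs" is read as early-stopping patience.
    early_stopping_patience: int = 10


def should_stop(val_scores: Sequence[float], patience: int) -> bool:
    """Return True once the best validation score is at least `patience` epochs old.

    `val_scores` holds one score per finished epoch, higher being better.
    Ties do not count as an improvement.
    """
    if not val_scores:
        return False
    best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i])
    epochs_since_best = len(val_scores) - 1 - best_epoch
    return epochs_since_best >= patience


config = FineTuneConfig()
```

In a training loop, `should_stop(scores_so_far, config.early_stopping_patience)` would be checked after each epoch's evaluation.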

Evaluation

Testing Data

This model was tested on the "test" portion of the GalicianSRL Dataset produced as part of this same project.

Metrics

seqeval is a Python framework for sequence labeling evaluation. It can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, and semantic role labeling. It supplies scoring both overall and per label type.

Overall:

accuracy: the average accuracy, on a scale between 0.0 and 1.0.
precision: the average precision, on a scale between 0.0 and 1.0.
recall: the average recall, on a scale between 0.0 and 1.0.
f1: the average F1 score, which is the harmonic mean of the precision and recall, on a scale between 0.0 and 1.0.

Per label type:

precision: the average precision, on a scale between 0.0 and 1.0.
recall: the average recall, on a scale between 0.0 and 1.0.
f1: the average F1 score, on a scale between 0.0 and 1.0.
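
For reference, the standard definitions behind these metrics can be written out in a few lines. The sketch below uses plain Python rather than seqeval itself, and the function names are illustrative. The counts in the first example are made up; the second example plugs in the precision and recall reported for 0:arg0 in the results table.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of predicted labels that are correct."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of gold labels that were found."""
    gold = true_pos + false_neg
    return true_pos / gold if gold else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts: 8 correct predictions, 2 spurious, 2 missed.
p = precision(8, 2)
r = recall(8, 2)
f1 = f1_score(p, r)

# Precision and recall reported for 0:arg0 in the results table.
f1_arg0 = round(f1_score(0.80, 0.76), 2)  # 0.78, matching the table
```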

Results

Labels in the table appear without the leading r, so 0:arg0 corresponds to r0:arg0. Support is the number of gold-standard instances of each label in the test set.

Label	Precision	Recall	f1-score	Support
0:arg0	0.80	0.76	0.78	485
0:arg1	0.74	0.75	0.75	483
0:arg2	0.72	0.69	0.70	264
0:root	0.93	0.93	0.93	948
1:arg0	0.72	0.70	0.71	348
1:arg1	0.73	0.68	0.71	443
1:arg2	0.64	0.65	0.64	211
1:root	0.86	0.86	0.86	802
2:arg0	0.62	0.58	0.60	240
2:arg1	0.65	0.62	0.64	331
2:arg2	0.63	0.54	0.59	156
2:root	0.79	0.79	0.79	579
3:arg0	0.50	0.47	0.49	137
3:arg1	0.62	0.54	0.58	216
3:arg2	0.55	0.55	0.55	110
3:root	0.71	0.74	0.72	374
4:arg0	0.50	0.43	0.46	70
4:arg1	0.59	0.59	0.59	109
4:arg2	0.55	0.48	0.52	66
4:root	0.63	0.73	0.68	206
5:arg0	0.57	0.40	0.47	20
5:arg1	0.65	0.49	0.56	57
5:arg2	0.53	0.32	0.40	28
5:root	0.70	0.47	0.56	102
6:arg0	0.60	0.46	0.52	13
6:arg1	0.43	0.36	0.39	25
6:arg2	0.00	0.00	0.00	8
6:root	0.56	0.52	0.54	42
7:arg0	0.00	0.00	0.00	3
7:arg1	0.20	0.12	0.15	8
7:arg2	0.00	0.00	0.00	5
7:root	0.45	0.31	0.37	16
8:arg0	0.50	1.00	0.67	1
8:arg1	0.00	0.00	0.00	2
8:arg2	0.00	0.00	0.00	1
8:root	0.20	0.29	0.24	7
9:arg0	0.00	0.00	0.00	1
9:arg1	0.00	0.00	0.00	2
9:arg2	0.00	0.00	0.00	1
9:root	0.00	0.00	0.00	3
10:arg1	0.00	0.00	0.00	1
10:root	0.00	0.00	0.00	2
micro avg	0.75	0.72	0.73	6926
macro avg	0.45	0.42	0.43	6926
weighted avg	0.74	0.72	0.73	6926
tot root avg	0.53	0.51	0.52	3081
tot A0 avg	0.48	0.48	0.47	1318
tot A1 avg	0.42	0.38	0.40	1677
tot A2 avg	0.36	0.32	0.34	850
tot r0 avg	0.80	0.78	0.79	2180
tot r1 avg	0.74	0.72	0.73	1804
tot r2 avg	0.67	0.63	0.66	1306
tot r3 avg	0.60	0.58	0.59	837
tot r4 avg	0.57	0.56	0.56	451
tot r5 avg	0.61	0.42	0.50	207
tot r6 avg	0.40	0.34	0.36	88
tot r7 avg	0.16	0.11	0.13	32
tot r8 avg	0.18	0.32	0.23	11
tot r9 avg	0.00	0.00	0.00	7
tot r10 avg	0.00	0.00	0.00	3
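
The "tot" rows are not defined in this card. Judging from the numbers, they appear to be unweighted means of the per-label scores in each group, with the supports summed. The sketch below checks this reading for the r0 group using the four rows copied from the table; the interpretation is an inference from the data, not a statement by the author.

```python
# (precision, recall, f1, support) for the r0 group, copied from the table.
r0_rows = {
    "0:arg0": (0.80, 0.76, 0.78, 485),
    "0:arg1": (0.74, 0.75, 0.75, 483),
    "0:arg2": (0.72, 0.69, 0.70, 264),
    "0:root": (0.93, 0.93, 0.93, 948),
}


def unweighted_mean(rows, column):
    """Plain average of one score column: every label counts equally."""
    return sum(row[column] for row in rows.values()) / len(rows)


def weighted_mean(rows, column):
    """Average of one score column weighted by each label's support."""
    total_support = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total_support


macro_precision = unweighted_mean(r0_rows, 0)
macro_recall = unweighted_mean(r0_rows, 1)
macro_f1 = unweighted_mean(r0_rows, 2)
total_support = sum(row[3] for row in r0_rows.values())

# For comparison: weighting by support gives a higher precision,
# because the well-predicted root label has the largest support.
weighted_precision = weighted_mean(r0_rows, 0)
```

The unweighted means round to 0.80, 0.78, and 0.79 with a summed support of 2180, which agrees with the "tot r0 avg" row. The support-weighted precision comes out at roughly 0.83, so it does not match. This also explains why the "tot" averages over argument types and roots are much lower than the micro and weighted averages: rarely occurring high-index labels with scores of 0.00 count as much as frequent ones.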

Citation

BibTeX:

@mastersthesis{bruton-galician-srl-23,
    author = {Bruton, Micaella},
    title = {BERTie Bott's Every Flavor Labels: A Tasty Guide to Developing a Semantic Role Labeling Model for Galician},
    school = {Uppsala University},
    year = {2023},
    type = {Master's thesis},
}