bert-base-uncased fine-tuned on the CoLA dataset through knowledge distillation,
with a fine-tuned bert-large-uncased as the teacher model.
Training used torchdistill and ran on Google Colab.
The training configuration (including hyperparameters) is available here.
I submitted prediction files to the GLUE leaderboard, and the overall GLUE score was 78.9.
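As a rough illustration of what knowledge distillation from a teacher to a student involves, the following is a minimal sketch of a common distillation objective: hard-label cross-entropy blended with a temperature-scaled KL divergence between teacher and student outputs. It is not taken from the training configuration for this model. The function names, the `temperature` and `alpha` values, and the example logits are all assumptions chosen for illustration, and the actual loss and hyperparameters used here may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=1.0, alpha=0.5):
    """Blend of hard-label cross-entropy and soft-label KL divergence.

    alpha weights the hard-label term; (1 - alpha) weights the soft term.
    Both alpha and temperature are illustrative assumptions, not the
    values used to train this model.
    """
    # Hard-label term: cross-entropy of the student against the gold label.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )

    # The T^2 factor keeps the soft term's gradient scale comparable across T.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for one binary (acceptable / unacceptable) CoLA example.
example_loss = kd_loss([0.2, 1.1], [-1.0, 2.5], label=1, temperature=2.0, alpha=0.5)
print(f"distillation loss: {example_loss:.4f}")
```

With `alpha=1.0` the objective reduces to ordinary cross-entropy fine-tuning, and with `alpha=0.0` the student learns only from the teacher's softened predictions.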
Yoshitomo Matsubara: "torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP" at EMNLP 2023 Workshop for Natural Language Processing Open Source Software (NLP-OSS)
[OpenReview] [Preprint]
@article{matsubara2023torchdistill,
title={{torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP}},
author={Matsubara, Yoshitomo},
journal={arXiv preprint arXiv:2310.17644},
year={2023}
}