This is a PhoBERT model fine-tuned to classify Vietnamese essay sentences by category.
Overview
- At primary levels of education in Vietnam, students are introduced to 5 categories of essays:
  - Argumentative - Nghị luận
  - Expressive - Biểu cảm
  - Descriptive - Miêu tả
  - Narrative - Tự sự
  - Expository - Thuyết minh
- This model classifies individual sentences into these 5 categories.
Pretrained model used in this pipeline:
- This pipeline combines the pre-trained phobert-base model with a multi-label classification head trained on 8,000 manually labeled sentences from sample essays.
- The dataset can be found on Kaggle
- Usage instructions for PhoBERT can be found on Hugging Face.
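The architecture described above (PhoBERT plus a 5-label multi-label head) could be assembled with the `transformers` library roughly as follows. This is a sketch under assumptions: the identifier of the fine-tuned checkpoint is not given here, so the code loads the public `vinai/phobert-base` weights, assumed to be the "phobert-base" referred to above, and attaches a freshly initialized head that would still need fine-tuning. The function names `build_classifier` and `score_sentence` are hypothetical.

```python
BASE_MODEL_ID = "vinai/phobert-base"  # assumed Hub ID for "phobert-base"
NUM_LABELS = 5  # the five essay categories


def build_classifier(model_id: str = BASE_MODEL_ID):
    """Return (tokenizer, model): PhoBERT plus a new, untrained 5-label head."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=NUM_LABELS,
        # Scores each label independently (sigmoid + binary cross-entropy loss).
        problem_type="multi_label_classification",
    )
    return tokenizer, model


def score_sentence(tokenizer, model, segmented_sentence: str):
    """Return one raw logit per category for a word-segmented sentence."""
    import torch

    inputs = tokenizer(
        segmented_sentence, return_tensors="pt", truncation=True, max_length=256
    )
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0].tolist()
```

PhoBERT expects word-segmented input, with the syllables of a multi-syllable word joined by underscores (for example `Chúng_tôi là những nghiên_cứu_viên .`), so raw text should pass through a Vietnamese word segmenter before `score_sentence` is called.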
Citation:
The general architecture and experimental results of PhoBERT can be found in the Findings of EMNLP 2020 paper:
@article{phobert,
title = {{PhoBERT: Pre-trained language models for Vietnamese}},
author = {Dat Quoc Nguyen and Anh Tuan Nguyen},
journal = {Findings of EMNLP},
year = {2020}
}