Tags: deberta, deberta-v3, mdeberta, korean, pretraining

mDeBERTa-v3-base-kor-further

💡 This project was carried out by KPMG Lighthouse Korea.
At KPMG Lighthouse Korea, we build edge-technology NLP/Vision AI models to solve a variety of problems in the financial domain. https://kpmgkr.notion.site/

What is DeBERTa?

How to Use
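A minimal usage sketch with 🤗 Transformers is shown below. The Hub model id (`lighthouse/mdeberta-v3-base-kor-further`) is an assumption for illustration; substitute the actual published checkpoint name.

```python
# Requires: pip install transformers sentencepiece torch
from transformers import AutoModel, AutoTokenizer

model_id = "lighthouse/mdeberta-v3-base-kor-further"  # assumed Hub id; replace with the published one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("안녕하세요, KPMG Lighthouse Korea입니다.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768]) for the base model
```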

Pre-trained Models

Further Pretraining Details (MLM Task)
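For illustration only, here is a hedged sketch of continuing MLM pretraining from the public microsoft/mdeberta-v3-base checkpoint with 🤗 Transformers. The corpus path and hyperparameters are placeholders, not the settings used to train this model.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the public multilingual checkpoint and continue MLM training on Korean text.
base_id = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForMaskedLM.from_pretrained(base_id)

# Hypothetical corpus location: plain-text files, one document per line.
raw = load_dataset("text", data_files={"train": "korean_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard dynamic masking for the MLM objective (15% masking as a common default).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mdeberta-v3-base-kor-further",
    per_device_train_batch_size=32,  # placeholder hyperparameters
    learning_rate=5e-5,
    num_train_epochs=1,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```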

Datasets

Fine-tuning on NLU Tasks - Base Model

| Model | Size | NSMC (acc) | Naver NER (F1) | PAWS (acc) | KorNLI (acc) | KorSTS (spearman) | Question Pair (acc) | KorQuAD (Dev) (EM/F1) | Korean-Hate-Speech (Dev) (F1) |
|---|---|---|---|---|---|---|---|---|---|
| XLM-Roberta-Base | 1.03G | 89.03 | 86.65 | 82.80 | 80.23 | 78.45 | 93.80 | 64.70 / 88.94 | 64.06 |
| mdeberta-base | 534M | 90.01 | 87.43 | 85.55 | 80.41 | 82.65 | 94.06 | 65.48 / 89.74 | 62.91 |
| mdeberta-base-kor-further (Ours) | 534M | 90.52 | 87.87 | 85.85 | 80.65 | 81.90 | 94.98 | 66.07 / 90.35 | 68.16 |
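Below is a hedged sketch of fine-tuning the model on one of the benchmarks above (NSMC, binary sentiment classification). The Hub model id and hyperparameters are assumptions, not the exact configuration behind the reported numbers.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "lighthouse/mdeberta-v3-base-kor-further"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# NSMC: Naver Sentiment Movie Corpus (binary sentiment labels).
nsmc = load_dataset("nsmc")

def tokenize(batch):
    return tokenizer(batch["document"], truncation=True, max_length=128)

encoded = nsmc.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="nsmc-mdeberta-kor-further",
    per_device_train_batch_size=32,  # placeholder hyperparameters
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; add a compute_metrics fn for accuracy
```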

KPMG Lighthouse KR

https://kpmgkr.notion.site/

Citation

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{he2021deberta,
      title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      booktitle={International Conference on Learning Representations},
      year={2021},
      url={https://openreview.net/forum?id=XPZIaotutsD}
}

Reference