DeBERTa 330M CP on Traditional Chinese

Dataset

Training Methods

This project builds on the https://huggingface.co/microsoft/deberta-v3-small model. To adapt DeBERTa to Traditional Chinese text, we employed the following training methods:

Token Expansion:

  1. Token Addition: To enrich the model's token representation, we added previously unseen tokens to the tokenizer and extended the model's embedding matrix to cover them (see the sketch after this list).
  2. Full Model Refinement: A full Masked Language Modeling (MLM) pass then refined the entire model, including both the embeddings and the transformer layers.
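
A minimal sketch of the token-addition step using the `transformers` API is shown below. The `new_tokens` list is a hypothetical placeholder, not the actual Traditional Chinese vocabulary added in this project:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Base checkpoint that this project continues pre-training from.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v3-small")

# Hypothetical examples of Traditional Chinese tokens missing from the
# original vocabulary; the real token list used here is not shown.
new_tokens = ["臺灣", "繁體", "國語"]

# Register only the tokens the tokenizer does not already know.
vocab = tokenizer.get_vocab()
tokenizer.add_tokens([t for t in new_tokens if t not in vocab])

# Grow the embedding matrix so each new token gets a trainable embedding row.
model.resize_token_embeddings(len(tokenizer))
```

After the vocabulary is expanded, the full MLM pass in step 2 updates both the newly added embedding rows and the existing transformer layers.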

SpanBERT for Token Masking:

We adopted the SpanBERT approach of masking contiguous spans of tokens during training. Some research suggests that span masking outperforms Whole Word Masking (WWM) in certain scenarios, improving downstream performance.
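
A minimal sketch of SpanBERT-style span masking is given below. It only illustrates the core idea (span lengths drawn from a clipped geometric distribution, whole spans replaced by the mask token); it omits SpanBERT's 80/10/10 replacement rule and span-boundary objective, and it is not the masking code actually used in this project:

```python
import random
import numpy as np

def span_mask(token_ids, mask_id, mask_ratio=0.15, p=0.2, max_span_len=10):
    """Mask contiguous spans of tokens, SpanBERT-style (illustrative sketch)."""
    ids = list(token_ids)
    labels = [-100] * len(ids)            # -100 is ignored by the MLM loss
    budget = max(1, int(round(mask_ratio * len(ids))))

    while budget > 0:
        # Span length ~ Geometric(p), clipped as in SpanBERT.
        span_len = int(min(np.random.geometric(p), max_span_len, budget))
        start = random.randrange(0, len(ids) - span_len + 1)
        for i in range(start, start + span_len):
            labels[i] = token_ids[i]      # target: the original token id
            ids[i] = mask_id              # input: the mask token id
        budget -= span_len

    return ids, labels

# Usage with hypothetical token ids (mask_id=4 is a placeholder):
masked_ids, labels = span_mask([101, 872, 1962, 2769, 3221, 102], mask_id=4)
```

During the full-model refinement, the resulting masked inputs and labels feed a standard masked-language-modeling forward pass, e.g. `model(input_ids=..., labels=...)`.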

Implementation Details

For our training, we used the following configurations:

Performance

DRCD