Vietnamese DebertaV3 Large (vi-deberta-v3-large)
Todo
[x] Corpora collection
[x] Tokenizer training
[x] Model pretraining
[ ] Model finetuning
[ ] Experimental results, comparison, and conclusion
Model Info
LAYER NAME | #PARAMS | RATIO | MEM(MB) |
---|---|---|---|
model | 851,542,017 | 100.00% | 3248.38 |
&emsp;generator | 284,459,008 | 33.41% | 1085.12 |
&emsp;&emsp;deberta | 283,279,360 | 33.27% | 1080.62 |
&emsp;&emsp;lm_predictions | 1,179,648 | 0.14% | 4.50 |
&emsp;discriminator | 567,083,009 | 66.59% | 2163.25 |
&emsp;&emsp;deberta | 566,030,336 | 66.47% | 2159.23 |
&emsp;&emsp;mask_predictions | 1,052,673 | 0.12% | 4.02 |
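The MEM(MB) column is consistent with the FP32 parameter footprint, i.e. roughly four bytes per parameter:

    # FP32 footprint in MiB: #params * 4 bytes / 2**20
    851_542_017 * 4 / 2**20   # ~= 3248.38, matching the "model" row above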
Model Performance
Metric | Value |
---|---|
accuracy | 0.7113977778702509 |
eval_loss | 1.3216993808746338 |
eval_metric | 0.7113977778702509 |
eval_samples | 240310 |
perplexity | 3.749788284301758 |
best_metric | 0.7113977778702509@2200000 |
train_steps | 2200000 |
train_loss | 1.1960319969044906 |
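The reported perplexity is consistent with the exponential of the eval loss:

    import math

    math.exp(1.3216993808746338)   # ~= 3.7498, matching the perplexity above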
TL;DR
| Aspect | Sub-Aspect | Description |
|---|---|---|
| Corpus | Language | Vietnamese |
| | Source | Wiki 2023 (1GB), News 2023 (17.2GB), News (64GB) |
| | Size | 1GB, 18GB, 64GB |
| | Preprocessing | None |
| Tokenizer | Lib | SentencePiece (see the sketch after this table) |
| | Algorithm | BPE |
| | Type | spm |
| | Vocab | 128000 |
| | Ref | https://github.com/google/sentencepiece |
| Model | Type | DeBERTaV3 |
| | Ref | https://openreview.net/forum?id=sE7-XhLxHA |
| | Code | https://github.com/microsoft/DeBERTa |
| Pretraining | Task | RTD (replaced token detection) |
| | Config | model_config.json |
| | Args | default |
| | Hardware | 5x Nvidia A100-SXM4-80G, 2x Nvidia 4090-PCI-24GB |
| | Phases | Init, Refining, Enlarging |
| | Status | Training on hold at step 2200000 |
| Finetuning | Status | Not started (need help) |
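A minimal sketch of how a tokenizer with the settings above could be trained with the SentencePiece Python API; the input path and any argument not listed in the table are assumptions, not taken from this repo:

    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input="vi_corpus.txt",   # placeholder path; the actual corpus is described in the table above
        model_prefix="spm",      # writes spm.model / spm.vocab
        model_type="bpe",        # algorithm from the table
        vocab_size=128000,       # vocab size from the table
    )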
Repo
📁vi-deberta-v3-large
|---🗎config.json
|---🗎pytorch_model.bin
|---🗎spm.model
|---🗎tl;dr.pdf
|---📁discriminator
|---📁generator
|---📁tokenizer
|---📁metrics
|---📁logs
Pretraining
<!-- START-OF PHASE 0 --> <details> <summary>Phase 0: Init</summary>
Info
- Goal: Init
- Progress: 30.00% ▓▓▓▓░░░░░░
- Status: training interrupted, step 1000000
- Loss: <a href="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/raw/main/logs/loss.html" download>init loss</a>
Metrics
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses.png">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities.png" width="648">
</details> <!-- END-OF PHASE 0 -->
<!-- START-OF PHASE 1 --> <details> <summary>Phase 1: Refining</summary>
Info
- Goal: refining
- Changes: smaller learning rate (100µ -> 2µ)
- Progress: 100.00% ▓▓▓▓▓▓▓▓▓▓
- Status: training finished, step 1500000
- Loss: <a href="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/raw/main/logs/loss_refining.html" download>refining loss</a>
Metrics
Metric | Value |
---|---|
accuracy | 0.7515653334245732 |
eval_loss | 1.0692176818847656 |
eval_metric | 0.7515653334245732 |
eval_samples | 29227 |
perplexity | 2.913099527359009 |
best_metric | 0.7522154172511719@1450000 |
train_steps | 1500000 |
train_loss | 1.1779516744723688 |
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_refining.png">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_refining.png" width="648">
</details> <!-- END-OF PHASE 1 -->
<!-- START-OF PHASE 2 --> <details> <summary>Phase 2: Enlarging</summary>
Info
- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: training in progress, step 2000000
- Loss: <a href="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/raw/main/logs/loss_enlarging.html" download>enlarging loss</a>
Metrics
Metric | Value |
---|---|
accuracy | 0.7084723898298032 |
eval_loss | 1.3221531009674072 |
eval_metric | 0.7084723898298032 |
eval_samples | 240310 |
perplexity | 3.7141621112823486 |
best_metric | 0.7084723898298032@2000000 |
train_steps | 2000000 |
train_loss | 1.2167873119241372 |
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_enlarging.png">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_enlarging.png" width="648">
</details> <!-- END-OF PHASE 2 -->
Phase 2: Enlarging (resume, on hold)
- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: training phase 2 interrupted intentionally, step 2200000
- Loss: <a href="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/raw/main/logs/loss_enlarging_2.html" download>enlarging loss</a>
- Metrics:
Metric | Value |
---|---|
accuracy | 0.7113977778702509 |
eval_loss | 1.3216993808746338 |
eval_metric | 0.7113977778702509 |
eval_samples | 240310 |
perplexity | 3.749788284301758 |
best_metric | 0.7113977778702509@2200000 |
train_steps | 2200000 |
train_loss | 1.1960319969044906 |
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_enlarging_2.png">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_enlarging_2.png" width="648">
Finetuning (NEED HELP)
- Token Classification: tasks, datasets
- Sequence Classification: tasks, datasets (see the sketch after this list)
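Finetuning has not started; the block below is only a hedged sketch of how a sequence-classification run might be set up with HF Transformers. The label count is a placeholder, and it assumes the clickai registration from the Usage section also covers the classification head:

    import clickai  # assumed to register DeBERTaV3 with HF Transformers, as in the Usage section
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "anhdungitvn/vi-deberta-v3-large"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # num_labels=2 is a placeholder; use the label count of your task.
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    inputs = tokenizer("Xử lý ngôn ngữ tiếng Việt", return_tensors="pt")
    logits = model(**inputs).logits  # shape: (1, num_labels)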
Experimental Results and Comparison
- Not started
Usage
Method 1: Load pretrained vi-deberta-v3-large with Transformers AutoClass
Install ClickAI

    # Why install clickai?
    # HF Transformers does not yet support DeBERTaV3.
    # ClickAI locally registers DeBERTaV3 with HF Transformers.
    pip install git+https://gitlab.com/anhdungvo/clickai.git
Tokenizer, Config, Model

    import clickai
    from transformers import AutoTokenizer, AutoConfig, AutoModel

    config = AutoConfig.from_pretrained("anhdungitvn/vi-deberta-v3-large")
    tokenizer = AutoTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
    model = AutoModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")

    tokenizer("Xử lý ngôn ngữ tiếng Việt", return_tensors='pt')
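A short usage example of running the loaded model on the encoded input; this assumes the model registered by clickai follows the standard Transformers encoder API and returns `last_hidden_state`:

    import torch

    inputs = tokenizer("Xử lý ngôn ngữ tiếng Việt", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)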
Transfer Pretrained Model Weights

    your_model.load_state_dict(model.GET_NEEDED_MODULE_WEIGHTS())
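`GET_NEEDED_MODULE_WEIGHTS()` above is a placeholder. A minimal sketch of one way to pick out a submodule's weights by key prefix, assuming you want the discriminator encoder (key names are listed in the collapsible section below; `your_model` is a hypothetical module whose parameter names match once the prefix is stripped):

    state_dict = model.state_dict()
    prefix = "discriminator.deberta."

    # Keep only the discriminator encoder weights and strip the prefix.
    encoder_weights = {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

    # strict=False reports (rather than fails on) any missing/unexpected keys.
    missing, unexpected = your_model.load_state_dict(encoder_weights, strict=False)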
<!-- START-OF MODEL STATE DICT KEYS --> <details> <summary>Model state dict keys</summary>
model.state_dict().keys()
[
generator.deberta.embeddings.word_embeddings.weight
generator.deberta.embeddings.position_embeddings.weight
generator.deberta.embeddings.LayerNorm.weight
generator.deberta.embeddings.LayerNorm.bias
generator.deberta.encoder.layer.0.attention.self.query_proj.weight
generator.deberta.encoder.layer.0.attention.self.query_proj.bias
generator.deberta.encoder.layer.0.attention.self.key_proj.weight
generator.deberta.encoder.layer.0.attention.self.key_proj.bias
generator.deberta.encoder.layer.0.attention.self.value_proj.weight
generator.deberta.encoder.layer.0.attention.self.value_proj.bias
generator.deberta.encoder.layer.0.attention.output.dense.weight
generator.deberta.encoder.layer.0.attention.output.dense.bias
generator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.0.intermediate.dense.weight
generator.deberta.encoder.layer.0.intermediate.dense.bias
generator.deberta.encoder.layer.0.output.dense.weight
generator.deberta.encoder.layer.0.output.dense.bias
generator.deberta.encoder.layer.0.output.LayerNorm.weight
generator.deberta.encoder.layer.0.output.LayerNorm.bias
generator.deberta.encoder.layer.1.attention.self.query_proj.weight
generator.deberta.encoder.layer.1.attention.self.query_proj.bias
generator.deberta.encoder.layer.1.attention.self.key_proj.weight
generator.deberta.encoder.layer.1.attention.self.key_proj.bias
generator.deberta.encoder.layer.1.attention.self.value_proj.weight
generator.deberta.encoder.layer.1.attention.self.value_proj.bias
generator.deberta.encoder.layer.1.attention.output.dense.weight
generator.deberta.encoder.layer.1.attention.output.dense.bias
generator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.1.intermediate.dense.weight
generator.deberta.encoder.layer.1.intermediate.dense.bias
generator.deberta.encoder.layer.1.output.dense.weight
generator.deberta.encoder.layer.1.output.dense.bias
generator.deberta.encoder.layer.1.output.LayerNorm.weight
generator.deberta.encoder.layer.1.output.LayerNorm.bias
generator.deberta.encoder.layer.2.attention.self.query_proj.weight
generator.deberta.encoder.layer.2.attention.self.query_proj.bias
generator.deberta.encoder.layer.2.attention.self.key_proj.weight
generator.deberta.encoder.layer.2.attention.self.key_proj.bias
generator.deberta.encoder.layer.2.attention.self.value_proj.weight
generator.deberta.encoder.layer.2.attention.self.value_proj.bias
generator.deberta.encoder.layer.2.attention.output.dense.weight
generator.deberta.encoder.layer.2.attention.output.dense.bias
generator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.2.intermediate.dense.weight
generator.deberta.encoder.layer.2.intermediate.dense.bias
generator.deberta.encoder.layer.2.output.dense.weight
generator.deberta.encoder.layer.2.output.dense.bias
generator.deberta.encoder.layer.2.output.LayerNorm.weight
generator.deberta.encoder.layer.2.output.LayerNorm.bias
generator.deberta.encoder.layer.3.attention.self.query_proj.weight
generator.deberta.encoder.layer.3.attention.self.query_proj.bias
generator.deberta.encoder.layer.3.attention.self.key_proj.weight
generator.deberta.encoder.layer.3.attention.self.key_proj.bias
generator.deberta.encoder.layer.3.attention.self.value_proj.weight
generator.deberta.encoder.layer.3.attention.self.value_proj.bias
generator.deberta.encoder.layer.3.attention.output.dense.weight
generator.deberta.encoder.layer.3.attention.output.dense.bias
generator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.3.intermediate.dense.weight
generator.deberta.encoder.layer.3.intermediate.dense.bias
generator.deberta.encoder.layer.3.output.dense.weight
generator.deberta.encoder.layer.3.output.dense.bias
generator.deberta.encoder.layer.3.output.LayerNorm.weight
generator.deberta.encoder.layer.3.output.LayerNorm.bias
generator.deberta.encoder.layer.4.attention.self.query_proj.weight
generator.deberta.encoder.layer.4.attention.self.query_proj.bias
generator.deberta.encoder.layer.4.attention.self.key_proj.weight
generator.deberta.encoder.layer.4.attention.self.key_proj.bias
generator.deberta.encoder.layer.4.attention.self.value_proj.weight
generator.deberta.encoder.layer.4.attention.self.value_proj.bias
generator.deberta.encoder.layer.4.attention.output.dense.weight
generator.deberta.encoder.layer.4.attention.output.dense.bias
generator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.4.intermediate.dense.weight
generator.deberta.encoder.layer.4.intermediate.dense.bias
generator.deberta.encoder.layer.4.output.dense.weight
generator.deberta.encoder.layer.4.output.dense.bias
generator.deberta.encoder.layer.4.output.LayerNorm.weight
generator.deberta.encoder.layer.4.output.LayerNorm.bias
generator.deberta.encoder.layer.5.attention.self.query_proj.weight
generator.deberta.encoder.layer.5.attention.self.query_proj.bias
generator.deberta.encoder.layer.5.attention.self.key_proj.weight
generator.deberta.encoder.layer.5.attention.self.key_proj.bias
generator.deberta.encoder.layer.5.attention.self.value_proj.weight
generator.deberta.encoder.layer.5.attention.self.value_proj.bias
generator.deberta.encoder.layer.5.attention.output.dense.weight
generator.deberta.encoder.layer.5.attention.output.dense.bias
generator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.5.intermediate.dense.weight
generator.deberta.encoder.layer.5.intermediate.dense.bias
generator.deberta.encoder.layer.5.output.dense.weight
generator.deberta.encoder.layer.5.output.dense.bias
generator.deberta.encoder.layer.5.output.LayerNorm.weight
generator.deberta.encoder.layer.5.output.LayerNorm.bias
generator.deberta.encoder.layer.6.attention.self.query_proj.weight
generator.deberta.encoder.layer.6.attention.self.query_proj.bias
generator.deberta.encoder.layer.6.attention.self.key_proj.weight
generator.deberta.encoder.layer.6.attention.self.key_proj.bias
generator.deberta.encoder.layer.6.attention.self.value_proj.weight
generator.deberta.encoder.layer.6.attention.self.value_proj.bias
generator.deberta.encoder.layer.6.attention.output.dense.weight
generator.deberta.encoder.layer.6.attention.output.dense.bias
generator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.6.intermediate.dense.weight
generator.deberta.encoder.layer.6.intermediate.dense.bias
generator.deberta.encoder.layer.6.output.dense.weight
generator.deberta.encoder.layer.6.output.dense.bias
generator.deberta.encoder.layer.6.output.LayerNorm.weight
generator.deberta.encoder.layer.6.output.LayerNorm.bias
generator.deberta.encoder.layer.7.attention.self.query_proj.weight
generator.deberta.encoder.layer.7.attention.self.query_proj.bias
generator.deberta.encoder.layer.7.attention.self.key_proj.weight
generator.deberta.encoder.layer.7.attention.self.key_proj.bias
generator.deberta.encoder.layer.7.attention.self.value_proj.weight
generator.deberta.encoder.layer.7.attention.self.value_proj.bias
generator.deberta.encoder.layer.7.attention.output.dense.weight
generator.deberta.encoder.layer.7.attention.output.dense.bias
generator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.7.intermediate.dense.weight
generator.deberta.encoder.layer.7.intermediate.dense.bias
generator.deberta.encoder.layer.7.output.dense.weight
generator.deberta.encoder.layer.7.output.dense.bias
generator.deberta.encoder.layer.7.output.LayerNorm.weight
generator.deberta.encoder.layer.7.output.LayerNorm.bias
generator.deberta.encoder.layer.8.attention.self.query_proj.weight
generator.deberta.encoder.layer.8.attention.self.query_proj.bias
generator.deberta.encoder.layer.8.attention.self.key_proj.weight
generator.deberta.encoder.layer.8.attention.self.key_proj.bias
generator.deberta.encoder.layer.8.attention.self.value_proj.weight
generator.deberta.encoder.layer.8.attention.self.value_proj.bias
generator.deberta.encoder.layer.8.attention.output.dense.weight
generator.deberta.encoder.layer.8.attention.output.dense.bias
generator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.8.intermediate.dense.weight
generator.deberta.encoder.layer.8.intermediate.dense.bias
generator.deberta.encoder.layer.8.output.dense.weight
generator.deberta.encoder.layer.8.output.dense.bias
generator.deberta.encoder.layer.8.output.LayerNorm.weight
generator.deberta.encoder.layer.8.output.LayerNorm.bias
generator.deberta.encoder.layer.9.attention.self.query_proj.weight
generator.deberta.encoder.layer.9.attention.self.query_proj.bias
generator.deberta.encoder.layer.9.attention.self.key_proj.weight
generator.deberta.encoder.layer.9.attention.self.key_proj.bias
generator.deberta.encoder.layer.9.attention.self.value_proj.weight
generator.deberta.encoder.layer.9.attention.self.value_proj.bias
generator.deberta.encoder.layer.9.attention.output.dense.weight
generator.deberta.encoder.layer.9.attention.output.dense.bias
generator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.9.intermediate.dense.weight
generator.deberta.encoder.layer.9.intermediate.dense.bias
generator.deberta.encoder.layer.9.output.dense.weight
generator.deberta.encoder.layer.9.output.dense.bias
generator.deberta.encoder.layer.9.output.LayerNorm.weight
generator.deberta.encoder.layer.9.output.LayerNorm.bias
generator.deberta.encoder.layer.10.attention.self.query_proj.weight
generator.deberta.encoder.layer.10.attention.self.query_proj.bias
generator.deberta.encoder.layer.10.attention.self.key_proj.weight
generator.deberta.encoder.layer.10.attention.self.key_proj.bias
generator.deberta.encoder.layer.10.attention.self.value_proj.weight
generator.deberta.encoder.layer.10.attention.self.value_proj.bias
generator.deberta.encoder.layer.10.attention.output.dense.weight
generator.deberta.encoder.layer.10.attention.output.dense.bias
generator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.10.intermediate.dense.weight
generator.deberta.encoder.layer.10.intermediate.dense.bias
generator.deberta.encoder.layer.10.output.dense.weight
generator.deberta.encoder.layer.10.output.dense.bias
generator.deberta.encoder.layer.10.output.LayerNorm.weight
generator.deberta.encoder.layer.10.output.LayerNorm.bias
generator.deberta.encoder.layer.11.attention.self.query_proj.weight
generator.deberta.encoder.layer.11.attention.self.query_proj.bias
generator.deberta.encoder.layer.11.attention.self.key_proj.weight
generator.deberta.encoder.layer.11.attention.self.key_proj.bias
generator.deberta.encoder.layer.11.attention.self.value_proj.weight
generator.deberta.encoder.layer.11.attention.self.value_proj.bias
generator.deberta.encoder.layer.11.attention.output.dense.weight
generator.deberta.encoder.layer.11.attention.output.dense.bias
generator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.11.intermediate.dense.weight
generator.deberta.encoder.layer.11.intermediate.dense.bias
generator.deberta.encoder.layer.11.output.dense.weight
generator.deberta.encoder.layer.11.output.dense.bias
generator.deberta.encoder.layer.11.output.LayerNorm.weight
generator.deberta.encoder.layer.11.output.LayerNorm.bias
generator.deberta.encoder.rel_embeddings.weight
generator.deberta.encoder.LayerNorm.weight
generator.deberta.encoder.LayerNorm.bias
generator.lm_predictions.lm_head.bias
generator.lm_predictions.lm_head.dense.weight
generator.lm_predictions.lm_head.dense.bias
generator.lm_predictions.lm_head.LayerNorm.weight
generator.lm_predictions.lm_head.LayerNorm.bias
discriminator.deberta.embeddings.word_embeddings.weight
discriminator.deberta.embeddings.word_embeddings._weight
discriminator.deberta.embeddings.position_embeddings.weight
discriminator.deberta.embeddings.position_embeddings._weight
discriminator.deberta.embeddings.LayerNorm.weight
discriminator.deberta.embeddings.LayerNorm.bias
discriminator.deberta.encoder.layer.0.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.0.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.0.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.0.attention.output.dense.weight
discriminator.deberta.encoder.layer.0.attention.output.dense.bias
discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.0.intermediate.dense.weight
discriminator.deberta.encoder.layer.0.intermediate.dense.bias
discriminator.deberta.encoder.layer.0.output.dense.weight
discriminator.deberta.encoder.layer.0.output.dense.bias
discriminator.deberta.encoder.layer.0.output.LayerNorm.weight
discriminator.deberta.encoder.layer.0.output.LayerNorm.bias
discriminator.deberta.encoder.layer.1.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.1.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.1.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.1.attention.output.dense.weight
discriminator.deberta.encoder.layer.1.attention.output.dense.bias
discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.1.intermediate.dense.weight
discriminator.deberta.encoder.layer.1.intermediate.dense.bias
discriminator.deberta.encoder.layer.1.output.dense.weight
discriminator.deberta.encoder.layer.1.output.dense.bias
discriminator.deberta.encoder.layer.1.output.LayerNorm.weight
discriminator.deberta.encoder.layer.1.output.LayerNorm.bias
discriminator.deberta.encoder.layer.2.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.2.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.2.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.2.attention.output.dense.weight
discriminator.deberta.encoder.layer.2.attention.output.dense.bias
discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.2.intermediate.dense.weight
discriminator.deberta.encoder.layer.2.intermediate.dense.bias
discriminator.deberta.encoder.layer.2.output.dense.weight
discriminator.deberta.encoder.layer.2.output.dense.bias
discriminator.deberta.encoder.layer.2.output.LayerNorm.weight
discriminator.deberta.encoder.layer.2.output.LayerNorm.bias
discriminator.deberta.encoder.layer.3.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.3.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.3.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.3.attention.output.dense.weight
discriminator.deberta.encoder.layer.3.attention.output.dense.bias
discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.3.intermediate.dense.weight
discriminator.deberta.encoder.layer.3.intermediate.dense.bias
discriminator.deberta.encoder.layer.3.output.dense.weight
discriminator.deberta.encoder.layer.3.output.dense.bias
discriminator.deberta.encoder.layer.3.output.LayerNorm.weight
discriminator.deberta.encoder.layer.3.output.LayerNorm.bias
discriminator.deberta.encoder.layer.4.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.4.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.4.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.4.attention.output.dense.weight
discriminator.deberta.encoder.layer.4.attention.output.dense.bias
discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.4.intermediate.dense.weight
discriminator.deberta.encoder.layer.4.intermediate.dense.bias
discriminator.deberta.encoder.layer.4.output.dense.weight
discriminator.deberta.encoder.layer.4.output.dense.bias
discriminator.deberta.encoder.layer.4.output.LayerNorm.weight
discriminator.deberta.encoder.layer.4.output.LayerNorm.bias
discriminator.deberta.encoder.layer.5.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.5.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.5.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.5.attention.output.dense.weight
discriminator.deberta.encoder.layer.5.attention.output.dense.bias
discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.5.intermediate.dense.weight
discriminator.deberta.encoder.layer.5.intermediate.dense.bias
discriminator.deberta.encoder.layer.5.output.dense.weight
discriminator.deberta.encoder.layer.5.output.dense.bias
discriminator.deberta.encoder.layer.5.output.LayerNorm.weight
discriminator.deberta.encoder.layer.5.output.LayerNorm.bias
discriminator.deberta.encoder.layer.6.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.6.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.6.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.6.attention.output.dense.weight
discriminator.deberta.encoder.layer.6.attention.output.dense.bias
discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.6.intermediate.dense.weight
discriminator.deberta.encoder.layer.6.intermediate.dense.bias
discriminator.deberta.encoder.layer.6.output.dense.weight
discriminator.deberta.encoder.layer.6.output.dense.bias
discriminator.deberta.encoder.layer.6.output.LayerNorm.weight
discriminator.deberta.encoder.layer.6.output.LayerNorm.bias
discriminator.deberta.encoder.layer.7.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.7.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.7.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.7.attention.output.dense.weight
discriminator.deberta.encoder.layer.7.attention.output.dense.bias
discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.7.intermediate.dense.weight
discriminator.deberta.encoder.layer.7.intermediate.dense.bias
discriminator.deberta.encoder.layer.7.output.dense.weight
discriminator.deberta.encoder.layer.7.output.dense.bias
discriminator.deberta.encoder.layer.7.output.LayerNorm.weight
discriminator.deberta.encoder.layer.7.output.LayerNorm.bias
discriminator.deberta.encoder.layer.8.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.8.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.8.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.8.attention.output.dense.weight
discriminator.deberta.encoder.layer.8.attention.output.dense.bias
discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.8.intermediate.dense.weight
discriminator.deberta.encoder.layer.8.intermediate.dense.bias
discriminator.deberta.encoder.layer.8.output.dense.weight
discriminator.deberta.encoder.layer.8.output.dense.bias
discriminator.deberta.encoder.layer.8.output.LayerNorm.weight
discriminator.deberta.encoder.layer.8.output.LayerNorm.bias
discriminator.deberta.encoder.layer.9.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.9.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.9.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.9.attention.output.dense.weight
discriminator.deberta.encoder.layer.9.attention.output.dense.bias
discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.9.intermediate.dense.weight
discriminator.deberta.encoder.layer.9.intermediate.dense.bias
discriminator.deberta.encoder.layer.9.output.dense.weight
discriminator.deberta.encoder.layer.9.output.dense.bias
discriminator.deberta.encoder.layer.9.output.LayerNorm.weight
discriminator.deberta.encoder.layer.9.output.LayerNorm.bias
discriminator.deberta.encoder.layer.10.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.10.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.10.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.10.attention.output.dense.weight
discriminator.deberta.encoder.layer.10.attention.output.dense.bias
discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.10.intermediate.dense.weight
discriminator.deberta.encoder.layer.10.intermediate.dense.bias
discriminator.deberta.encoder.layer.10.output.dense.weight
discriminator.deberta.encoder.layer.10.output.dense.bias
discriminator.deberta.encoder.layer.10.output.LayerNorm.weight
discriminator.deberta.encoder.layer.10.output.LayerNorm.bias
discriminator.deberta.encoder.layer.11.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.11.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.11.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.11.attention.output.dense.weight
discriminator.deberta.encoder.layer.11.attention.output.dense.bias
discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.11.intermediate.dense.weight
discriminator.deberta.encoder.layer.11.intermediate.dense.bias
discriminator.deberta.encoder.layer.11.output.dense.weight
discriminator.deberta.encoder.layer.11.output.dense.bias
discriminator.deberta.encoder.layer.11.output.LayerNorm.weight
discriminator.deberta.encoder.layer.11.output.LayerNorm.bias
discriminator.deberta.encoder.layer.12.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.12.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.12.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.12.attention.output.dense.weight
discriminator.deberta.encoder.layer.12.attention.output.dense.bias
discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.12.intermediate.dense.weight
discriminator.deberta.encoder.layer.12.intermediate.dense.bias
discriminator.deberta.encoder.layer.12.output.dense.weight
discriminator.deberta.encoder.layer.12.output.dense.bias
discriminator.deberta.encoder.layer.12.output.LayerNorm.weight
discriminator.deberta.encoder.layer.12.output.LayerNorm.bias
discriminator.deberta.encoder.layer.13.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.13.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.13.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.13.attention.output.dense.weight
discriminator.deberta.encoder.layer.13.attention.output.dense.bias
discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.13.intermediate.dense.weight
discriminator.deberta.encoder.layer.13.intermediate.dense.bias
discriminator.deberta.encoder.layer.13.output.dense.weight
discriminator.deberta.encoder.layer.13.output.dense.bias
discriminator.deberta.encoder.layer.13.output.LayerNorm.weight
discriminator.deberta.encoder.layer.13.output.LayerNorm.bias
discriminator.deberta.encoder.layer.14.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.14.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.14.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.14.attention.output.dense.weight
discriminator.deberta.encoder.layer.14.attention.output.dense.bias
discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.14.intermediate.dense.weight
discriminator.deberta.encoder.layer.14.intermediate.dense.bias
discriminator.deberta.encoder.layer.14.output.dense.weight
discriminator.deberta.encoder.layer.14.output.dense.bias
discriminator.deberta.encoder.layer.14.output.LayerNorm.weight
discriminator.deberta.encoder.layer.14.output.LayerNorm.bias
discriminator.deberta.encoder.layer.15.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.15.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.15.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.15.attention.output.dense.weight
discriminator.deberta.encoder.layer.15.attention.output.dense.bias
discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.15.intermediate.dense.weight
discriminator.deberta.encoder.layer.15.intermediate.dense.bias
discriminator.deberta.encoder.layer.15.output.dense.weight
discriminator.deberta.encoder.layer.15.output.dense.bias
discriminator.deberta.encoder.layer.15.output.LayerNorm.weight
discriminator.deberta.encoder.layer.15.output.LayerNorm.bias
discriminator.deberta.encoder.layer.16.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.16.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.16.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.16.attention.output.dense.weight
discriminator.deberta.encoder.layer.16.attention.output.dense.bias
discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.16.intermediate.dense.weight
discriminator.deberta.encoder.layer.16.intermediate.dense.bias
discriminator.deberta.encoder.layer.16.output.dense.weight
discriminator.deberta.encoder.layer.16.output.dense.bias
discriminator.deberta.encoder.layer.16.output.LayerNorm.weight
discriminator.deberta.encoder.layer.16.output.LayerNorm.bias
discriminator.deberta.encoder.layer.17.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.17.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.17.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.17.attention.output.dense.weight
discriminator.deberta.encoder.layer.17.attention.output.dense.bias
discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.17.intermediate.dense.weight
discriminator.deberta.encoder.layer.17.intermediate.dense.bias
discriminator.deberta.encoder.layer.17.output.dense.weight
discriminator.deberta.encoder.layer.17.output.dense.bias
discriminator.deberta.encoder.layer.17.output.LayerNorm.weight
discriminator.deberta.encoder.layer.17.output.LayerNorm.bias
discriminator.deberta.encoder.layer.18.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.18.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.18.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.18.attention.output.dense.weight
discriminator.deberta.encoder.layer.18.attention.output.dense.bias
discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.18.intermediate.dense.weight
discriminator.deberta.encoder.layer.18.intermediate.dense.bias
discriminator.deberta.encoder.layer.18.output.dense.weight
discriminator.deberta.encoder.layer.18.output.dense.bias
discriminator.deberta.encoder.layer.18.output.LayerNorm.weight
discriminator.deberta.encoder.layer.18.output.LayerNorm.bias
discriminator.deberta.encoder.layer.19.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.19.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.19.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.19.attention.output.dense.weight
discriminator.deberta.encoder.layer.19.attention.output.dense.bias
discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.19.intermediate.dense.weight
discriminator.deberta.encoder.layer.19.intermediate.dense.bias
discriminator.deberta.encoder.layer.19.output.dense.weight
discriminator.deberta.encoder.layer.19.output.dense.bias
discriminator.deberta.encoder.layer.19.output.LayerNorm.weight
discriminator.deberta.encoder.layer.19.output.LayerNorm.bias
discriminator.deberta.encoder.layer.20.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.20.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.20.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.20.attention.output.dense.weight
discriminator.deberta.encoder.layer.20.attention.output.dense.bias
discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.20.intermediate.dense.weight
discriminator.deberta.encoder.layer.20.intermediate.dense.bias
discriminator.deberta.encoder.layer.20.output.dense.weight
discriminator.deberta.encoder.layer.20.output.dense.bias
discriminator.deberta.encoder.layer.20.output.LayerNorm.weight
discriminator.deberta.encoder.layer.20.output.LayerNorm.bias
discriminator.deberta.encoder.layer.21.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.21.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.21.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.21.attention.output.dense.weight
discriminator.deberta.encoder.layer.21.attention.output.dense.bias
discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.21.intermediate.dense.weight
discriminator.deberta.encoder.layer.21.intermediate.dense.bias
discriminator.deberta.encoder.layer.21.output.dense.weight
discriminator.deberta.encoder.layer.21.output.dense.bias
discriminator.deberta.encoder.layer.21.output.LayerNorm.weight
discriminator.deberta.encoder.layer.21.output.LayerNorm.bias
discriminator.deberta.encoder.layer.22.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.22.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.22.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.22.attention.output.dense.weight
discriminator.deberta.encoder.layer.22.attention.output.dense.bias
discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.22.intermediate.dense.weight
discriminator.deberta.encoder.layer.22.intermediate.dense.bias
discriminator.deberta.encoder.layer.22.output.dense.weight
discriminator.deberta.encoder.layer.22.output.dense.bias
discriminator.deberta.encoder.layer.22.output.LayerNorm.weight
discriminator.deberta.encoder.layer.22.output.LayerNorm.bias
discriminator.deberta.encoder.layer.23.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.23.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.23.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.23.attention.output.dense.weight
discriminator.deberta.encoder.layer.23.attention.output.dense.bias
discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.23.intermediate.dense.weight
discriminator.deberta.encoder.layer.23.intermediate.dense.bias
discriminator.deberta.encoder.layer.23.output.dense.weight
discriminator.deberta.encoder.layer.23.output.dense.bias
discriminator.deberta.encoder.layer.23.output.LayerNorm.weight
discriminator.deberta.encoder.layer.23.output.LayerNorm.bias
discriminator.deberta.encoder.rel_embeddings.weight
discriminator.deberta.encoder.LayerNorm.weight
discriminator.deberta.encoder.LayerNorm.bias
discriminator.mask_predictions.dense.weight
discriminator.mask_predictions.dense.bias
discriminator.mask_predictions.LayerNorm.weight
discriminator.mask_predictions.LayerNorm.bias
discriminator.mask_predictions.classifier.weight
discriminator.mask_predictions.classifier.bias
]
</details> <!-- END-OF MODEL STATE DICT KEYS -->
Method 2: Manually load pretrained vi-deberta-v3-large
Develop YourTokenizer and YourModel

    class YourTokenizer:
        @classmethod
        def from_pretrained(cls, model_name_or_path, **kwargs):
            # https://huggingface.co/anhdungitvn/vi-deberta-v3-large/blob/main/tokenizer/spm.model
            pass

    class YourModel:
        @classmethod
        def from_pretrained(cls, model_name_or_path, **kwargs):
            # Discriminator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/discriminator
            # Generator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/generator
            pass

Use

    tokenizer = YourTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
    model = YourModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")
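A minimal sketch of what those `from_pretrained` bodies might do, assuming the repo files (spm.model, pytorch_model.bin, or the discriminator/generator subfolders) have been downloaded locally; the exact checkpoint layout is not documented here, so inspect the keys before mapping them onto your own modules:

    import sentencepiece as spm
    import torch

    # Tokenizer: load the SentencePiece model shipped with the repo.
    sp = spm.SentencePieceProcessor(model_file="spm.model")
    ids = sp.encode("Xử lý ngôn ngữ tiếng Việt", out_type=int)

    # Model: load the raw checkpoint and inspect its keys before
    # assigning them to your own module definitions.
    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    print(list(state_dict.keys())[:5])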
Log
<!-- START-OF LOG --> <details> <summary>Details</summary>
Logs:
- 2023-03-29: init, todolist
- 2023-03-30: data preparation
- vi_wiki_23: latest
- vi_news_17g: available
- 2023-03-30: tokenizer training
- algorithm: unigram, bpe
- size: 8k, 16k, 32k, 64k, 128k, 256k
- 2023-03-31: training trials
- config: base, large
- tokenizer: unigram_16k, bpe_16k, bpe_128k
- args: default, with batch_size and grad_acc changed
- optimizer: default, customized optimizer
- 2023-03-31: training phase 0 started
- config: large
- tokenizer: bpe_128k
- args: default
- GPU: 5x A100-SXM4-80G
- 2023-04-04: training interrupted unintentionally, no optimizer checkpoint, step 300000
- 2023-04-05: training resumed from last checkpoint (step 300000), learning_rate adjusted 100µ -> 50µ
- 2023-04-10: sweet spot detected, step 800000
- 2023-04-11: training in progress, step 900000; loss increasing, accuracy increasing, regularization working well, overfitting under monitoring
- 2023-04-12: training interrupted intentionally, step 1000000
- 2023-04-12: training phase 1 started, resumed intentionally, refining, learning_rate -> 2µ (diverging)
- 2023-04-24: training phase 1 finished, refining, step 1500000
- 2023-04-26: training phase 2 started, resumed intentionally, step 1500000
- 2023-05-02: training phase 2 interrupted unintentionally, step 1980000
- 2023-05-05: training phase 2 resumed intentionally, step 2000000
- 2023-05-09: training in progress, step 2200000
</details> <!-- END-OF LOG -->