Vietnamese DeBERTaV3 Large (vi-deberta-v3-large)

Todo

- [x] Corpora collection
- [x] Tokenizer training
- [x] Model pretraining
- [ ] Model finetuning
- [ ] Experimental results, comparison, and conclusion

Model Info

LAYER NAME                      #PARAMS      RATIO       MEM(MB)
--model:                    851,542,017    100.00%       3248.38
  --generator:              284,459,008     33.41%       1085.12
    --deberta:              283,279,360     33.27%       1080.62
    --lm_predictions:         1,179,648      0.14%          4.50
  --discriminator:          567,083,009     66.59%       2163.25
    --deberta:              566,030,336     66.47%       2159.23
    --mask_predictions:       1,052,673      0.12%          4.02
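
The generator rows of the table above can be reproduced by counting the parameters behind each state-dict key and assuming float32 storage. This is a sketch under assumptions that are not read from `config.json`: hidden size 1024, feed-forward size 4096, 512 rows in `position_embeddings`, 512 rows in `rel_embeddings` (2 × 256 position buckets), 4 bytes per parameter, and 1 MB = 2^20 bytes. The generator's 12 layers come from the key listing further down (layers 0 to 11). Only the generator rows are checked here.

```python
# Sketch: reproduce the generator rows of the Model Info table.
# ASSUMED hyperparameters (DeBERTaV3-large style, not read from config.json):
HIDDEN = 1024        # hidden size
INTERMEDIATE = 4096  # feed-forward size
VOCAB = 128_000      # SentencePiece vocab size (TL;DR table)
MAX_POS = 512        # rows in embeddings.position_embeddings
REL_ROWS = 512       # rows in encoder.rel_embeddings (2 * 256 buckets)


def linear(n_in: int, n_out: int) -> int:
    """Weight plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Weight plus bias of a LayerNorm."""
    return 2 * dim


def encoder_layer() -> int:
    """One encoder layer, following the key names in the state dict."""
    attention = 3 * linear(HIDDEN, HIDDEN)                    # query/key/value_proj
    attention += linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)  # attention.output
    ffn = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN)
    return attention + ffn + layer_norm(HIDDEN)               # + output.LayerNorm


def deberta_backbone(num_layers: int) -> int:
    """Embeddings + encoder layers + rel_embeddings + final encoder LayerNorm."""
    embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN + layer_norm(HIDDEN)
    encoder = num_layers * encoder_layer() + REL_ROWS * HIDDEN + layer_norm(HIDDEN)
    return embeddings + encoder


def lm_head() -> int:
    """lm_head.bias over the vocabulary + dense + LayerNorm."""
    return VOCAB + linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)


def fp32_megabytes(num_params: int) -> float:
    """4 bytes per float32 parameter, 1 MB = 2**20 bytes."""
    return num_params * 4 / 2**20


if __name__ == "__main__":
    print(f"{deberta_backbone(num_layers=12):,}")   # 283,279,360
    print(f"{lm_head():,}")                         # 1,179,648
    print(f"{fp32_megabytes(851_542_017):.2f}")     # 3248.38
```

Under these assumptions the counts match the `--generator` rows exactly, and the MEM(MB) column is consistent with plain float32 storage of the listed parameter counts.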

Model Performance

| Metric | Value |
|---|---|
| accuracy | 0.7113977778702509 |
| eval_loss | 1.3216993808746338 |
| eval_metric | 0.7113977778702509 |
| eval_samples | 240310 |
| perplexity | 3.749788284301758 |
| best_metric | 0.7113977778702509@2200000 |
| train_steps | 2200000 |
| train_loss | 1.1960319969044906 |
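
The reported perplexity appears to be exp(eval_loss): e^1.3217 ≈ 3.7498, and the Phase 1 numbers follow the same relation (e^1.0692 ≈ 2.9131), although the Phase 2 (Enlarging) table does not match it exactly. The `best_metric` field reads as `value@step`, i.e. the best evaluation metric and the training step at which it was reached. The sketch below checks both readings against the table; `parse_best_metric` is a hypothetical helper, not part of the training code.

```python
import math
from typing import Tuple


def parse_best_metric(field: str) -> Tuple[float, int]:
    """Split a 'value@step' field such as '0.7113977778702509@2200000'."""
    value, step = field.split("@")
    return float(value), int(step)


# Values copied from the metrics table above.
eval_loss = 1.3216993808746338
reported_perplexity = 3.749788284301758

# ASSUMPTION: perplexity = exp(eval_loss); checked against the reported value.
perplexity = math.exp(eval_loss)
print(f"{perplexity:.4f}")  # 3.7498

best_value, best_step = parse_best_metric("0.7113977778702509@2200000")
print(best_value, best_step)
```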

TL;DR

| Aspect | Sub-Aspect | Description |
|---|---|---|
| Corpus | Language | Vietnamese |
| | Source | Wiki 2023 (1GB), News 2023 (17.2GB), News (64GB) |
| | Size | 1GB, 18GB, 64GB |
| | Preprocessing | None |
| Tokenizer | Lib | SentencePiece |
| | Algorithm | BPE |
| | Type | spm |
| | Vocab | 128000 |
| | Ref | https://github.com/google/sentencepiece |
| Model | Type | DeBERTaV3 |
| | Ref | https://openreview.net/forum?id=sE7-XhLxHA |
| | Code | https://github.com/microsoft/DeBERTa |
| Pretraining | Task | RTD |
| | Config | model_config.json |
| | Args | default |
| | Hardware | 5x Nvidia A100-SXM4-80G, 2x Nvidia 4090-PCI-24GB |
| | Phases | Init, Refining, Enlarging |
| | Status | Training on hold at step 2200000 |
| Finetuning | Status | Not started (need help) |
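
RTD (replaced token detection) is the ELECTRA-style objective used by DeBERTaV3: a generator fills masked positions with sampled tokens, and the discriminator labels every position as original or replaced. The toy sketch below only shows how those labels are built. The token ids and the `sampled` list are made-up stand-ins for real generator output, and this is not the author's training code.

```python
from typing import List, Sequence


def corrupt(original: Sequence[int], masked_positions: Sequence[int],
            sampled: Sequence[int]) -> List[int]:
    """Write generator samples into the masked positions.

    `sampled` is a hypothetical stand-in for the generator's predictions.
    """
    corrupted = list(original)
    for pos, token in zip(masked_positions, sampled):
        corrupted[pos] = token
    return corrupted


def rtd_labels(original: Sequence[int], corrupted: Sequence[int]) -> List[int]:
    """Discriminator targets: 1 = replaced, 0 = original.

    A sampled token that happens to equal the original counts as original.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


original = [5, 6, 7, 8]
corrupted = corrupt(original, masked_positions=[1, 3], sampled=[6, 42])
print(corrupted)                        # [5, 6, 7, 42]
print(rtd_labels(original, corrupted))  # [0, 0, 0, 1]
```

Position 1 was masked but the generator reproduced the original token, so it is labeled 0; only position 3 is a true replacement.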

Repo

📁vi-deberta-v3-large
  |---🗎config.json
  |---🗎pytorch_model.bin
  |---🗎spm.model
  |---tl;dr.pdf
  |---📁discriminator
  |---📁generator
  |---📁tokenizer
  |---📁metrics
  |---📁logs

Pretraining

<!-- START-OF PHASE 0 --> <details> <summary>Phase 0: Init</summary>

Info

Metrics

<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses.png"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities.png" width="648">

</details> <!-- END-OF PHASE 0 -->

<!-- START-OF PHASE 1 --> <details> <summary>Phase 1: Refining</summary>

Info

Metrics

| Metric | Value |
|---|---|
| accuracy | 0.7515653334245732 |
| eval_loss | 1.0692176818847656 |
| eval_metric | 0.7515653334245732 |
| eval_samples | 29227 |
| perplexity | 2.913099527359009 |
| best_metric | 0.7522154172511719@1450000 |
| train_steps | 1500000 |
| train_loss | 1.1779516744723688 |

<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_refining.png"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_refining.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_refining.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_refining.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_refining.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_refining.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_refining.png" width="648">

</details> <!-- END-OF PHASE 1 -->

<!-- START-OF PHASE 2 --> <details> <summary>Phase 2: Enlarging</summary>

Info

Metrics

| Metric | Value |
|---|---|
| accuracy | 0.7084723898298032 |
| eval_loss | 1.3221531009674072 |
| eval_metric | 0.7084723898298032 |
| eval_samples | 240310 |
| perplexity | 3.7141621112823486 |
| best_metric | 0.7084723898298032@2000000 |
| train_steps | 2000000 |
| train_loss | 1.2167873119241372 |

<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_enlarging.png"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_enlarging.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_enlarging.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_enlarging.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_enlarging.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_enlarging.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_enlarging.png" width="648">

</details> <!-- END-OF PHASE 2 -->

Phase 2: Enlarging (resumed, on hold)

| Metric | Value |
|---|---|
| accuracy | 0.7113977778702509 |
| eval_loss | 1.3216993808746338 |
| eval_metric | 0.7113977778702509 |
| eval_samples | 240310 |
| perplexity | 3.749788284301758 |
| best_metric | 0.7113977778702509@2200000 |
| train_steps | 2200000 |
| train_loss | 1.1960319969044906 |

<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/train_eval_losses_enlarging_2.png"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/losses_enlarging_2.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/eval_losses_enlarging_2.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/accuracies_enlarging_2.png" width="648">
<img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grads_enlarging_2.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/grad_exp_xscale_log_enlarging_2.png" width="648"> <img src="https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/metrics/perplexities_enlarging_2.png" width="648">

Finetuning

Not started yet (help needed).

Experimental Results and Comparison

Usage


Method 1: Load pretrained vi-deberta-v3-large with Transformers AutoClass
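
The state-dict keys listed below hold two sub-models under the `generator.` and `discriminator.` prefixes, and the part after `discriminator.deberta.` appears to match the parameter names of Hugging Face's `DebertaV2Model`. In DeBERTaV3 the discriminator is the encoder normally used for finetuning. The sketch below is one possible way to pull it out, not the author's loading code. It assumes `config.json` can be read by `DebertaV2Config`, and it skips the `._weight` entries, assumed here to be embedding-sharing bookkeeping with no slot in a plain `DebertaV2Model`. Whether `embeddings.word_embeddings.weight` already holds the final shared embedding is not verified, so inspect the returned missing/unexpected keys before relying on the result.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def sub_state_dict(state: Mapping[str, T], prefix: str,
                   skip_suffix: str = "._weight") -> Dict[str, T]:
    """Keep keys under `prefix`, strip the prefix, drop keys ending in `skip_suffix`."""
    out: Dict[str, T] = {}
    for key, value in state.items():
        if key.startswith(prefix) and not key.endswith(skip_suffix):
            out[key[len(prefix):]] = value
    return out


def load_discriminator(repo_dir: str):
    """Hypothetical loader: build a DebertaV2Model from the discriminator weights.

    ASSUMES repo_dir contains pytorch_model.bin and an HF-compatible config.json.
    """
    # Local imports so sub_state_dict works without torch/transformers installed.
    import os
    import torch
    from transformers import DebertaV2Config, DebertaV2Model

    state = torch.load(os.path.join(repo_dir, "pytorch_model.bin"), map_location="cpu")
    config = DebertaV2Config.from_pretrained(repo_dir)
    model = DebertaV2Model(config)
    result = model.load_state_dict(
        sub_state_dict(state, "discriminator.deberta."), strict=False
    )
    return model, result.missing_keys, result.unexpected_keys
```

`strict=False` is used so that any naming mismatch is reported instead of raising; an empty `missing_keys` list is a good sign that the mapping is correct.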

<!-- START-OF MODEL STAGE DICT KEYS --> <details> <summary>Model state dict keys</summary>

model.state_dict().keys()
[
  generator.deberta.embeddings.word_embeddings.weight
  generator.deberta.embeddings.position_embeddings.weight
  generator.deberta.embeddings.LayerNorm.weight
  generator.deberta.embeddings.LayerNorm.bias
  generator.deberta.encoder.layer.0.attention.self.query_proj.weight
  generator.deberta.encoder.layer.0.attention.self.query_proj.bias
  generator.deberta.encoder.layer.0.attention.self.key_proj.weight
  generator.deberta.encoder.layer.0.attention.self.key_proj.bias
  generator.deberta.encoder.layer.0.attention.self.value_proj.weight
  generator.deberta.encoder.layer.0.attention.self.value_proj.bias
  generator.deberta.encoder.layer.0.attention.output.dense.weight
  generator.deberta.encoder.layer.0.attention.output.dense.bias
  generator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.0.intermediate.dense.weight
  generator.deberta.encoder.layer.0.intermediate.dense.bias
  generator.deberta.encoder.layer.0.output.dense.weight
  generator.deberta.encoder.layer.0.output.dense.bias
  generator.deberta.encoder.layer.0.output.LayerNorm.weight
  generator.deberta.encoder.layer.0.output.LayerNorm.bias
  generator.deberta.encoder.layer.1.attention.self.query_proj.weight
  generator.deberta.encoder.layer.1.attention.self.query_proj.bias
  generator.deberta.encoder.layer.1.attention.self.key_proj.weight
  generator.deberta.encoder.layer.1.attention.self.key_proj.bias
  generator.deberta.encoder.layer.1.attention.self.value_proj.weight
  generator.deberta.encoder.layer.1.attention.self.value_proj.bias
  generator.deberta.encoder.layer.1.attention.output.dense.weight
  generator.deberta.encoder.layer.1.attention.output.dense.bias
  generator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.1.intermediate.dense.weight
  generator.deberta.encoder.layer.1.intermediate.dense.bias
  generator.deberta.encoder.layer.1.output.dense.weight
  generator.deberta.encoder.layer.1.output.dense.bias
  generator.deberta.encoder.layer.1.output.LayerNorm.weight
  generator.deberta.encoder.layer.1.output.LayerNorm.bias
  generator.deberta.encoder.layer.2.attention.self.query_proj.weight
  generator.deberta.encoder.layer.2.attention.self.query_proj.bias
  generator.deberta.encoder.layer.2.attention.self.key_proj.weight
  generator.deberta.encoder.layer.2.attention.self.key_proj.bias
  generator.deberta.encoder.layer.2.attention.self.value_proj.weight
  generator.deberta.encoder.layer.2.attention.self.value_proj.bias
  generator.deberta.encoder.layer.2.attention.output.dense.weight
  generator.deberta.encoder.layer.2.attention.output.dense.bias
  generator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.2.intermediate.dense.weight
  generator.deberta.encoder.layer.2.intermediate.dense.bias
  generator.deberta.encoder.layer.2.output.dense.weight
  generator.deberta.encoder.layer.2.output.dense.bias
  generator.deberta.encoder.layer.2.output.LayerNorm.weight
  generator.deberta.encoder.layer.2.output.LayerNorm.bias
  generator.deberta.encoder.layer.3.attention.self.query_proj.weight
  generator.deberta.encoder.layer.3.attention.self.query_proj.bias
  generator.deberta.encoder.layer.3.attention.self.key_proj.weight
  generator.deberta.encoder.layer.3.attention.self.key_proj.bias
  generator.deberta.encoder.layer.3.attention.self.value_proj.weight
  generator.deberta.encoder.layer.3.attention.self.value_proj.bias
  generator.deberta.encoder.layer.3.attention.output.dense.weight
  generator.deberta.encoder.layer.3.attention.output.dense.bias
  generator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.3.intermediate.dense.weight
  generator.deberta.encoder.layer.3.intermediate.dense.bias
  generator.deberta.encoder.layer.3.output.dense.weight
  generator.deberta.encoder.layer.3.output.dense.bias
  generator.deberta.encoder.layer.3.output.LayerNorm.weight
  generator.deberta.encoder.layer.3.output.LayerNorm.bias
  generator.deberta.encoder.layer.4.attention.self.query_proj.weight
  generator.deberta.encoder.layer.4.attention.self.query_proj.bias
  generator.deberta.encoder.layer.4.attention.self.key_proj.weight
  generator.deberta.encoder.layer.4.attention.self.key_proj.bias
  generator.deberta.encoder.layer.4.attention.self.value_proj.weight
  generator.deberta.encoder.layer.4.attention.self.value_proj.bias
  generator.deberta.encoder.layer.4.attention.output.dense.weight
  generator.deberta.encoder.layer.4.attention.output.dense.bias
  generator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.4.intermediate.dense.weight
  generator.deberta.encoder.layer.4.intermediate.dense.bias
  generator.deberta.encoder.layer.4.output.dense.weight
  generator.deberta.encoder.layer.4.output.dense.bias
  generator.deberta.encoder.layer.4.output.LayerNorm.weight
  generator.deberta.encoder.layer.4.output.LayerNorm.bias
  generator.deberta.encoder.layer.5.attention.self.query_proj.weight
  generator.deberta.encoder.layer.5.attention.self.query_proj.bias
  generator.deberta.encoder.layer.5.attention.self.key_proj.weight
  generator.deberta.encoder.layer.5.attention.self.key_proj.bias
  generator.deberta.encoder.layer.5.attention.self.value_proj.weight
  generator.deberta.encoder.layer.5.attention.self.value_proj.bias
  generator.deberta.encoder.layer.5.attention.output.dense.weight
  generator.deberta.encoder.layer.5.attention.output.dense.bias
  generator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.5.intermediate.dense.weight
  generator.deberta.encoder.layer.5.intermediate.dense.bias
  generator.deberta.encoder.layer.5.output.dense.weight
  generator.deberta.encoder.layer.5.output.dense.bias
  generator.deberta.encoder.layer.5.output.LayerNorm.weight
  generator.deberta.encoder.layer.5.output.LayerNorm.bias
  generator.deberta.encoder.layer.6.attention.self.query_proj.weight
  generator.deberta.encoder.layer.6.attention.self.query_proj.bias
  generator.deberta.encoder.layer.6.attention.self.key_proj.weight
  generator.deberta.encoder.layer.6.attention.self.key_proj.bias
  generator.deberta.encoder.layer.6.attention.self.value_proj.weight
  generator.deberta.encoder.layer.6.attention.self.value_proj.bias
  generator.deberta.encoder.layer.6.attention.output.dense.weight
  generator.deberta.encoder.layer.6.attention.output.dense.bias
  generator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.6.intermediate.dense.weight
  generator.deberta.encoder.layer.6.intermediate.dense.bias
  generator.deberta.encoder.layer.6.output.dense.weight
  generator.deberta.encoder.layer.6.output.dense.bias
  generator.deberta.encoder.layer.6.output.LayerNorm.weight
  generator.deberta.encoder.layer.6.output.LayerNorm.bias
  generator.deberta.encoder.layer.7.attention.self.query_proj.weight
  generator.deberta.encoder.layer.7.attention.self.query_proj.bias
  generator.deberta.encoder.layer.7.attention.self.key_proj.weight
  generator.deberta.encoder.layer.7.attention.self.key_proj.bias
  generator.deberta.encoder.layer.7.attention.self.value_proj.weight
  generator.deberta.encoder.layer.7.attention.self.value_proj.bias
  generator.deberta.encoder.layer.7.attention.output.dense.weight
  generator.deberta.encoder.layer.7.attention.output.dense.bias
  generator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.7.intermediate.dense.weight
  generator.deberta.encoder.layer.7.intermediate.dense.bias
  generator.deberta.encoder.layer.7.output.dense.weight
  generator.deberta.encoder.layer.7.output.dense.bias
  generator.deberta.encoder.layer.7.output.LayerNorm.weight
  generator.deberta.encoder.layer.7.output.LayerNorm.bias
  generator.deberta.encoder.layer.8.attention.self.query_proj.weight
  generator.deberta.encoder.layer.8.attention.self.query_proj.bias
  generator.deberta.encoder.layer.8.attention.self.key_proj.weight
  generator.deberta.encoder.layer.8.attention.self.key_proj.bias
  generator.deberta.encoder.layer.8.attention.self.value_proj.weight
  generator.deberta.encoder.layer.8.attention.self.value_proj.bias
  generator.deberta.encoder.layer.8.attention.output.dense.weight
  generator.deberta.encoder.layer.8.attention.output.dense.bias
  generator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.8.intermediate.dense.weight
  generator.deberta.encoder.layer.8.intermediate.dense.bias
  generator.deberta.encoder.layer.8.output.dense.weight
  generator.deberta.encoder.layer.8.output.dense.bias
  generator.deberta.encoder.layer.8.output.LayerNorm.weight
  generator.deberta.encoder.layer.8.output.LayerNorm.bias
  generator.deberta.encoder.layer.9.attention.self.query_proj.weight
  generator.deberta.encoder.layer.9.attention.self.query_proj.bias
  generator.deberta.encoder.layer.9.attention.self.key_proj.weight
  generator.deberta.encoder.layer.9.attention.self.key_proj.bias
  generator.deberta.encoder.layer.9.attention.self.value_proj.weight
  generator.deberta.encoder.layer.9.attention.self.value_proj.bias
  generator.deberta.encoder.layer.9.attention.output.dense.weight
  generator.deberta.encoder.layer.9.attention.output.dense.bias
  generator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.9.intermediate.dense.weight
  generator.deberta.encoder.layer.9.intermediate.dense.bias
  generator.deberta.encoder.layer.9.output.dense.weight
  generator.deberta.encoder.layer.9.output.dense.bias
  generator.deberta.encoder.layer.9.output.LayerNorm.weight
  generator.deberta.encoder.layer.9.output.LayerNorm.bias
  generator.deberta.encoder.layer.10.attention.self.query_proj.weight
  generator.deberta.encoder.layer.10.attention.self.query_proj.bias
  generator.deberta.encoder.layer.10.attention.self.key_proj.weight
  generator.deberta.encoder.layer.10.attention.self.key_proj.bias
  generator.deberta.encoder.layer.10.attention.self.value_proj.weight
  generator.deberta.encoder.layer.10.attention.self.value_proj.bias
  generator.deberta.encoder.layer.10.attention.output.dense.weight
  generator.deberta.encoder.layer.10.attention.output.dense.bias
  generator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.10.intermediate.dense.weight
  generator.deberta.encoder.layer.10.intermediate.dense.bias
  generator.deberta.encoder.layer.10.output.dense.weight
  generator.deberta.encoder.layer.10.output.dense.bias
  generator.deberta.encoder.layer.10.output.LayerNorm.weight
  generator.deberta.encoder.layer.10.output.LayerNorm.bias
  generator.deberta.encoder.layer.11.attention.self.query_proj.weight
  generator.deberta.encoder.layer.11.attention.self.query_proj.bias
  generator.deberta.encoder.layer.11.attention.self.key_proj.weight
  generator.deberta.encoder.layer.11.attention.self.key_proj.bias
  generator.deberta.encoder.layer.11.attention.self.value_proj.weight
  generator.deberta.encoder.layer.11.attention.self.value_proj.bias
  generator.deberta.encoder.layer.11.attention.output.dense.weight
  generator.deberta.encoder.layer.11.attention.output.dense.bias
  generator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
  generator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
  generator.deberta.encoder.layer.11.intermediate.dense.weight
  generator.deberta.encoder.layer.11.intermediate.dense.bias
  generator.deberta.encoder.layer.11.output.dense.weight
  generator.deberta.encoder.layer.11.output.dense.bias
  generator.deberta.encoder.layer.11.output.LayerNorm.weight
  generator.deberta.encoder.layer.11.output.LayerNorm.bias
  generator.deberta.encoder.rel_embeddings.weight
  generator.deberta.encoder.LayerNorm.weight
  generator.deberta.encoder.LayerNorm.bias
  generator.lm_predictions.lm_head.bias
  generator.lm_predictions.lm_head.dense.weight
  generator.lm_predictions.lm_head.dense.bias
  generator.lm_predictions.lm_head.LayerNorm.weight
  generator.lm_predictions.lm_head.LayerNorm.bias
  discriminator.deberta.embeddings.word_embeddings.weight
  discriminator.deberta.embeddings.word_embeddings._weight
  discriminator.deberta.embeddings.position_embeddings.weight
  discriminator.deberta.embeddings.position_embeddings._weight
  discriminator.deberta.embeddings.LayerNorm.weight
  discriminator.deberta.embeddings.LayerNorm.bias
  discriminator.deberta.encoder.layer.0.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.0.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.0.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.0.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.0.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.0.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.0.attention.output.dense.weight
  discriminator.deberta.encoder.layer.0.attention.output.dense.bias
  discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.0.intermediate.dense.weight
  discriminator.deberta.encoder.layer.0.intermediate.dense.bias
  discriminator.deberta.encoder.layer.0.output.dense.weight
  discriminator.deberta.encoder.layer.0.output.dense.bias
  discriminator.deberta.encoder.layer.0.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.0.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.1.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.1.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.1.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.1.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.1.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.1.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.1.attention.output.dense.weight
  discriminator.deberta.encoder.layer.1.attention.output.dense.bias
  discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.1.intermediate.dense.weight
  discriminator.deberta.encoder.layer.1.intermediate.dense.bias
  discriminator.deberta.encoder.layer.1.output.dense.weight
  discriminator.deberta.encoder.layer.1.output.dense.bias
  discriminator.deberta.encoder.layer.1.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.1.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.2.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.2.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.2.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.2.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.2.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.2.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.2.attention.output.dense.weight
  discriminator.deberta.encoder.layer.2.attention.output.dense.bias
  discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.2.intermediate.dense.weight
  discriminator.deberta.encoder.layer.2.intermediate.dense.bias
  discriminator.deberta.encoder.layer.2.output.dense.weight
  discriminator.deberta.encoder.layer.2.output.dense.bias
  discriminator.deberta.encoder.layer.2.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.2.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.3.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.3.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.3.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.3.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.3.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.3.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.3.attention.output.dense.weight
  discriminator.deberta.encoder.layer.3.attention.output.dense.bias
  discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.3.intermediate.dense.weight
  discriminator.deberta.encoder.layer.3.intermediate.dense.bias
  discriminator.deberta.encoder.layer.3.output.dense.weight
  discriminator.deberta.encoder.layer.3.output.dense.bias
  discriminator.deberta.encoder.layer.3.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.3.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.4.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.4.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.4.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.4.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.4.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.4.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.4.attention.output.dense.weight
  discriminator.deberta.encoder.layer.4.attention.output.dense.bias
  discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.4.intermediate.dense.weight
  discriminator.deberta.encoder.layer.4.intermediate.dense.bias
  discriminator.deberta.encoder.layer.4.output.dense.weight
  discriminator.deberta.encoder.layer.4.output.dense.bias
  discriminator.deberta.encoder.layer.4.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.4.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.5.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.5.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.5.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.5.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.5.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.5.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.5.attention.output.dense.weight
  discriminator.deberta.encoder.layer.5.attention.output.dense.bias
  discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.5.intermediate.dense.weight
  discriminator.deberta.encoder.layer.5.intermediate.dense.bias
  discriminator.deberta.encoder.layer.5.output.dense.weight
  discriminator.deberta.encoder.layer.5.output.dense.bias
  discriminator.deberta.encoder.layer.5.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.5.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.6.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.6.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.6.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.6.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.6.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.6.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.6.attention.output.dense.weight
  discriminator.deberta.encoder.layer.6.attention.output.dense.bias
  discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.6.intermediate.dense.weight
  discriminator.deberta.encoder.layer.6.intermediate.dense.bias
  discriminator.deberta.encoder.layer.6.output.dense.weight
  discriminator.deberta.encoder.layer.6.output.dense.bias
  discriminator.deberta.encoder.layer.6.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.6.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.7.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.7.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.7.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.7.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.7.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.7.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.7.attention.output.dense.weight
  discriminator.deberta.encoder.layer.7.attention.output.dense.bias
  discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.7.intermediate.dense.weight
  discriminator.deberta.encoder.layer.7.intermediate.dense.bias
  discriminator.deberta.encoder.layer.7.output.dense.weight
  discriminator.deberta.encoder.layer.7.output.dense.bias
  discriminator.deberta.encoder.layer.7.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.7.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.8.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.8.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.8.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.8.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.8.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.8.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.8.attention.output.dense.weight
  discriminator.deberta.encoder.layer.8.attention.output.dense.bias
  discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.8.intermediate.dense.weight
  discriminator.deberta.encoder.layer.8.intermediate.dense.bias
  discriminator.deberta.encoder.layer.8.output.dense.weight
  discriminator.deberta.encoder.layer.8.output.dense.bias
  discriminator.deberta.encoder.layer.8.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.8.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.9.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.9.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.9.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.9.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.9.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.9.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.9.attention.output.dense.weight
  discriminator.deberta.encoder.layer.9.attention.output.dense.bias
  discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.9.intermediate.dense.weight
  discriminator.deberta.encoder.layer.9.intermediate.dense.bias
  discriminator.deberta.encoder.layer.9.output.dense.weight
  discriminator.deberta.encoder.layer.9.output.dense.bias
  discriminator.deberta.encoder.layer.9.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.9.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.10.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.10.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.10.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.10.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.10.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.10.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.10.attention.output.dense.weight
  discriminator.deberta.encoder.layer.10.attention.output.dense.bias
  discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.10.intermediate.dense.weight
  discriminator.deberta.encoder.layer.10.intermediate.dense.bias
  discriminator.deberta.encoder.layer.10.output.dense.weight
  discriminator.deberta.encoder.layer.10.output.dense.bias
  discriminator.deberta.encoder.layer.10.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.10.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.11.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.11.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.11.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.11.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.11.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.11.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.11.attention.output.dense.weight
  discriminator.deberta.encoder.layer.11.attention.output.dense.bias
  discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.11.intermediate.dense.weight
  discriminator.deberta.encoder.layer.11.intermediate.dense.bias
  discriminator.deberta.encoder.layer.11.output.dense.weight
  discriminator.deberta.encoder.layer.11.output.dense.bias
  discriminator.deberta.encoder.layer.11.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.11.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.12.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.12.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.12.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.12.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.12.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.12.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.12.attention.output.dense.weight
  discriminator.deberta.encoder.layer.12.attention.output.dense.bias
  discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.12.intermediate.dense.weight
  discriminator.deberta.encoder.layer.12.intermediate.dense.bias
  discriminator.deberta.encoder.layer.12.output.dense.weight
  discriminator.deberta.encoder.layer.12.output.dense.bias
  discriminator.deberta.encoder.layer.12.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.12.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.13.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.13.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.13.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.13.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.13.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.13.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.13.attention.output.dense.weight
  discriminator.deberta.encoder.layer.13.attention.output.dense.bias
  discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.13.intermediate.dense.weight
  discriminator.deberta.encoder.layer.13.intermediate.dense.bias
  discriminator.deberta.encoder.layer.13.output.dense.weight
  discriminator.deberta.encoder.layer.13.output.dense.bias
  discriminator.deberta.encoder.layer.13.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.13.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.14.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.14.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.14.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.14.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.14.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.14.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.14.attention.output.dense.weight
  discriminator.deberta.encoder.layer.14.attention.output.dense.bias
  discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.14.intermediate.dense.weight
  discriminator.deberta.encoder.layer.14.intermediate.dense.bias
  discriminator.deberta.encoder.layer.14.output.dense.weight
  discriminator.deberta.encoder.layer.14.output.dense.bias
  discriminator.deberta.encoder.layer.14.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.14.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.15.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.15.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.15.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.15.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.15.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.15.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.15.attention.output.dense.weight
  discriminator.deberta.encoder.layer.15.attention.output.dense.bias
  discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.15.intermediate.dense.weight
  discriminator.deberta.encoder.layer.15.intermediate.dense.bias
  discriminator.deberta.encoder.layer.15.output.dense.weight
  discriminator.deberta.encoder.layer.15.output.dense.bias
  discriminator.deberta.encoder.layer.15.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.15.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.16.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.16.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.16.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.16.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.16.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.16.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.16.attention.output.dense.weight
  discriminator.deberta.encoder.layer.16.attention.output.dense.bias
  discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.16.intermediate.dense.weight
  discriminator.deberta.encoder.layer.16.intermediate.dense.bias
  discriminator.deberta.encoder.layer.16.output.dense.weight
  discriminator.deberta.encoder.layer.16.output.dense.bias
  discriminator.deberta.encoder.layer.16.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.16.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.17.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.17.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.17.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.17.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.17.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.17.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.17.attention.output.dense.weight
  discriminator.deberta.encoder.layer.17.attention.output.dense.bias
  discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.17.intermediate.dense.weight
  discriminator.deberta.encoder.layer.17.intermediate.dense.bias
  discriminator.deberta.encoder.layer.17.output.dense.weight
  discriminator.deberta.encoder.layer.17.output.dense.bias
  discriminator.deberta.encoder.layer.17.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.17.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.18.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.18.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.18.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.18.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.18.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.18.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.18.attention.output.dense.weight
  discriminator.deberta.encoder.layer.18.attention.output.dense.bias
  discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.18.intermediate.dense.weight
  discriminator.deberta.encoder.layer.18.intermediate.dense.bias
  discriminator.deberta.encoder.layer.18.output.dense.weight
  discriminator.deberta.encoder.layer.18.output.dense.bias
  discriminator.deberta.encoder.layer.18.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.18.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.19.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.19.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.19.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.19.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.19.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.19.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.19.attention.output.dense.weight
  discriminator.deberta.encoder.layer.19.attention.output.dense.bias
  discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.19.intermediate.dense.weight
  discriminator.deberta.encoder.layer.19.intermediate.dense.bias
  discriminator.deberta.encoder.layer.19.output.dense.weight
  discriminator.deberta.encoder.layer.19.output.dense.bias
  discriminator.deberta.encoder.layer.19.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.19.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.20.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.20.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.20.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.20.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.20.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.20.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.20.attention.output.dense.weight
  discriminator.deberta.encoder.layer.20.attention.output.dense.bias
  discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.20.intermediate.dense.weight
  discriminator.deberta.encoder.layer.20.intermediate.dense.bias
  discriminator.deberta.encoder.layer.20.output.dense.weight
  discriminator.deberta.encoder.layer.20.output.dense.bias
  discriminator.deberta.encoder.layer.20.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.20.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.21.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.21.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.21.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.21.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.21.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.21.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.21.attention.output.dense.weight
  discriminator.deberta.encoder.layer.21.attention.output.dense.bias
  discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.21.intermediate.dense.weight
  discriminator.deberta.encoder.layer.21.intermediate.dense.bias
  discriminator.deberta.encoder.layer.21.output.dense.weight
  discriminator.deberta.encoder.layer.21.output.dense.bias
  discriminator.deberta.encoder.layer.21.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.21.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.22.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.22.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.22.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.22.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.22.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.22.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.22.attention.output.dense.weight
  discriminator.deberta.encoder.layer.22.attention.output.dense.bias
  discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.22.intermediate.dense.weight
  discriminator.deberta.encoder.layer.22.intermediate.dense.bias
  discriminator.deberta.encoder.layer.22.output.dense.weight
  discriminator.deberta.encoder.layer.22.output.dense.bias
  discriminator.deberta.encoder.layer.22.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.22.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.23.attention.self.query_proj.weight
  discriminator.deberta.encoder.layer.23.attention.self.query_proj.bias
  discriminator.deberta.encoder.layer.23.attention.self.key_proj.weight
  discriminator.deberta.encoder.layer.23.attention.self.key_proj.bias
  discriminator.deberta.encoder.layer.23.attention.self.value_proj.weight
  discriminator.deberta.encoder.layer.23.attention.self.value_proj.bias
  discriminator.deberta.encoder.layer.23.attention.output.dense.weight
  discriminator.deberta.encoder.layer.23.attention.output.dense.bias
  discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.bias
  discriminator.deberta.encoder.layer.23.intermediate.dense.weight
  discriminator.deberta.encoder.layer.23.intermediate.dense.bias
  discriminator.deberta.encoder.layer.23.output.dense.weight
  discriminator.deberta.encoder.layer.23.output.dense.bias
  discriminator.deberta.encoder.layer.23.output.LayerNorm.weight
  discriminator.deberta.encoder.layer.23.output.LayerNorm.bias
  discriminator.deberta.encoder.rel_embeddings.weight
  discriminator.deberta.encoder.LayerNorm.weight
  discriminator.deberta.encoder.LayerNorm.bias
  discriminator.mask_predictions.dense.weight
  discriminator.mask_predictions.dense.bias
  discriminator.mask_predictions.LayerNorm.weight
  discriminator.mask_predictions.LayerNorm.bias
  discriminator.mask_predictions.classifier.weight
  discriminator.mask_predictions.classifier.bias
]

</details> <!-- END-OF MODEL STAGE DICT KEYS -->
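Each encoder layer in the key listing above contributes 16 tensors, a weight and a bias for each of eight modules: the query, key and value projections, the attention output dense layer and its LayerNorm, the intermediate dense layer, and the output dense layer and its LayerNorm. To check a downloaded checkpoint against that layout, a small helper can group key names by top-level component and layer index. This is a minimal sketch in plain Python. `summarize_keys` and `LAYER_KEY` are hypothetical names, and the keys are assumed to follow the `<component>.deberta.encoder.layer.<i>.<module>` pattern shown above.

```python
import re
from collections import Counter, defaultdict

# Matches per-layer keys such as
# "discriminator.deberta.encoder.layer.9.output.dense.bias"
LAYER_KEY = re.compile(r"^(?P<component>\w+)\.deberta\.encoder\.layer\.(?P<idx>\d+)\.")


def summarize_keys(keys):
    """Group state-dict key names by top-level component and encoder layer.

    Returns a pair (component_counts, layers):
      component_counts maps the first dotted segment ("generator",
      "discriminator", ...) to its number of tensors;
      layers maps each component to {layer_index: tensors_in_that_layer}.
    Keys outside the encoder layers (rel_embeddings, final LayerNorm,
    prediction heads) are counted per component but not per layer.
    """
    keys = list(keys)  # accept any iterable, e.g. state_dict.keys()
    component_counts = Counter(key.split(".", 1)[0] for key in keys)
    layers = defaultdict(Counter)
    for key in keys:
        match = LAYER_KEY.match(key)
        if match:
            layers[match["component"]][int(match["idx"])] += 1
    return dict(component_counts), {name: dict(c) for name, c in layers.items()}
```

In practice the keys would come from something like `torch.load("pytorch_model.bin", map_location="cpu").keys()`. With the full key list, the `discriminator` entry would be expected to show layer indices 0 to 23 with 16 tensors each; any layer with a different count points to a layout mismatch.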

Method 2: Manually load pretrained vi-deberta-v3-large
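A minimal sketch of manual loading, under several assumptions not confirmed by this card: that `pytorch_model.bin` is a plain state dict whose keys follow the `discriminator.deberta.*` layout listed above, that `config.json` is a Hugging Face `DebertaV2Config`, and that `torch` and `transformers` are installed. The idea is to keep only the discriminator encoder tensors, strip the `discriminator.deberta.` prefix so the names match `transformers.DebertaV2Model`, and load them with `strict=False` so that any mismatch is reported instead of raised. The helpers `extract_encoder_state_dict` and `load_discriminator` are illustrative names, not part of the repository. DeBERTaV3 pretraining can use gradient-disentangled embedding sharing, in which case the discriminator's word embeddings may be stored relative to the generator's; this sketch does not handle that case, so check `missing_keys` for `embeddings.word_embeddings.weight`.

```python
# Assumed prefix under which the RTD checkpoint stores the discriminator encoder.
DISCRIMINATOR_PREFIX = "discriminator.deberta."


def extract_encoder_state_dict(state_dict, prefix=DISCRIMINATOR_PREFIX):
    """Return a new dict with only the keys under `prefix`, prefix stripped.

    Keys outside the prefix (the generator and the lm_predictions /
    mask_predictions heads) are dropped.
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


def load_discriminator(weights_path="pytorch_model.bin", config_path="config.json"):
    """Load the discriminator encoder into a Hugging Face DebertaV2Model.

    Assumes config.json is a Hugging Face DebertaV2 config (unverified).
    """
    import torch  # optional dependencies, imported only when loading
    from transformers import DebertaV2Config, DebertaV2Model

    state_dict = torch.load(weights_path, map_location="cpu")
    model = DebertaV2Model(DebertaV2Config.from_json_file(config_path))
    result = model.load_state_dict(extract_encoder_state_dict(state_dict), strict=False)
    # Non-empty lists mean the checkpoint layout differs from this sketch's assumptions.
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)
    return model
```

Usage would be `model = load_discriminator()`, after which the tokenizer could be built from the SentencePiece file with `transformers.DebertaV2Tokenizer(vocab_file="spm.model")` (requires the `sentencepiece` package). Passing `prefix="generator.deberta."` to `extract_encoder_state_dict` would select the generator encoder instead, though its size differs from the discriminator's, so it would need its own config.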

Log

<!-- START-OF LOG --> <details> <summary>Details</summary>

Logs:

</details> <!-- END-OF LOG -->