deberta-v3-large-finetuned-squadv2

This model is a version of microsoft/deberta-v3-large fine-tuned on the SQuAD version 2.0 dataset. Fine-tuning and evaluation on an NVIDIA Titan RTX (24 GB) GPU took 15 hours.

Results from the ICLR 2023 paper "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing" by Pengcheng He et al.

Results calculated with:

import evaluate

metrics = evaluate.load("squad_v2")
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)

for this fine-tuning:

Model description

For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa

Intended uses

Extractive question answering on a given context
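
As a rough illustration of this use, the sketch below wraps the Transformers question-answering pipeline. The Hub id in `MODEL_ID` is a placeholder (this card does not state the repository namespace), and the helper names `load_qa_pipeline` and `answer_question` are hypothetical, introduced only for this example.

```python
# Assumption: placeholder Hub id; substitute the repository that hosts this model.
MODEL_ID = "your-namespace/deberta-v3-large-finetuned-squadv2"


def load_qa_pipeline(model_id=MODEL_ID):
    """Build a question-answering pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def answer_question(qa, question, context):
    """Return the extracted answer span, or None when no answer is predicted."""
    # handle_impossible_answer=True allows the pipeline to return an empty
    # string for SQuAD 2.0-style questions the context cannot answer.
    result = qa(question=question, context=context, handle_impossible_answer=True)
    return result["answer"] or None


# Example usage (requires network access and the transformers package):
# qa = load_qa_pipeline()
# print(answer_question(qa, "Which GPU was used?",
#                       "Fine-tuning ran on an NVIDIA Titan RTX GPU."))
```

Returning `None` for an empty prediction is a design choice of this sketch, so that callers can tell "no answer in the context" apart from a real span.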

Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning:

Framework versions

System

Fine-tuning (Training) results before/after the best model (Step 3620)

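| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |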
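
The card does not say how the best model was chosen. The sketch below assumes that "best" means the lowest validation loss, which is consistent with the rows above. The names `eval_log` and `best_checkpoint` are hypothetical and used only for illustration.

```python
# (step, validation loss) pairs copied from the table above.
eval_log = [
    (3500, 0.5860),
    (3520, 0.5656),
    (3540, 0.5642),
    (3560, 0.5873),
    (3580, 0.5891),
    (3600, 0.5816),
    (3620, 0.5579),
    (3640, 0.5837),
    (3660, 0.5721),
    (3680, 0.5821),
    (3700, 0.5602),
]


def best_checkpoint(log):
    """Return the (step, loss) pair with the smallest validation loss."""
    # Assumption: lower validation loss is the selection criterion.
    return min(log, key=lambda row: row[1])


best_step, best_loss = best_checkpoint(eval_log)
print(best_step, best_loss)  # 3620 0.5579
```

In the Hugging Face Trainer, `load_best_model_at_end=True` performs this kind of selection automatically. Whether that option was enabled for this run is not stated here.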