# Joint Pruning, Quantization and Distillation for BERT-large/SQuADv1.1
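This example fine-tunes `bert-large-uncased-whole-word-masking` on SQuADv1.1 while jointly applying structured pruning, quantization-aware training, and knowledge distillation from a SQuAD-finetuned teacher, all driven by an NNCF compression configuration.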
## Setup

```bash
git clone https://github.com/vuiseng9/optimum-intel
cd optimum-intel
pip install -e ".[openvino,nncf]"

cd examples/openvino/question-answering/
pip install -r requirements.txt
pip install wandb  # optional, for experiment tracking
```
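The `openvino_config.json` passed via `--nncf_compression_config` below defines the joint compression pipeline. The exact recipe behind the reference results is not reproduced here; the following is only a minimal sketch of the expected shape, combining NNCF's `movement_sparsity` and `quantization` algorithms. All parameter values and scope patterns are illustrative, not tuned settings; refer to the NNCF documentation for the full schema.

```json
{
    "compression": [
        {
            "algorithm": "movement_sparsity",
            "params": {
                "warmup_start_epoch": 1,
                "warmup_end_epoch": 4,
                "importance_regularization_factor": 0.02,
                "enable_structured_masking": true
            },
            "sparse_structure_by_scopes": [
                {"mode": "block",   "sparse_factors": [32, 32], "target_scopes": "{re}.*BertAttention.*"},
                {"mode": "per_dim", "axis": 0, "target_scopes": "{re}.*BertIntermediate.*"},
                {"mode": "per_dim", "axis": 1, "target_scopes": "{re}.*BertOutput.*"}
            ]
        },
        {
            "algorithm": "quantization",
            "initializer": {
                "range": {"num_init_samples": 300, "type": "mean_min_max"}
            }
        }
    ]
}
```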
## Run

```bash
NNCFCFG=/path/to/openvino_config.json   # NNCF compression config (JPQD recipe)
MASTER_PORT=<PORTID>                    # free TCP port for torch.distributed
RUNID=<RUN_IDENTIFIER>                  # experiment name (used by wandb)
OUTDIR=/path/to/saved_model
NEPOCH=30

# 4-GPU distributed training with distillation from the SQuAD-finetuned teacher
python -m torch.distributed.launch \
    --nproc_per_node 4 \
    --master_port $MASTER_PORT \
    run_qa.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --dataset_name squad \
    --teacher_model_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
    --distillation_weight 0.9 \
    --do_train \
    --do_eval \
    --fp16 \
    --learning_rate 3e-5 \
    --num_train_epochs $NEPOCH \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 128 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --logging_steps 1 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --overwrite_output_dir \
    --run_name $RUNID \
    --output_dir $OUTDIR \
    --nncf_compression_config $NNCFCFG
```
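Once training finishes, the same script can be reused for a standalone evaluation of the saved checkpoint. Below is a minimal single-process sketch, assuming the compression config must be passed again so the compressed graph is rebuilt around the checkpoint; the `$OUTDIR/eval` output path is only a placeholder.

```bash
# Evaluate the compressed checkpoint on the SQuAD dev set (no training)
python run_qa.py \
    --model_name_or_path $OUTDIR \
    --dataset_name squad \
    --do_eval \
    --per_device_eval_batch_size 128 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir $OUTDIR/eval \
    --nncf_compression_config $NNCFCFG
```

If the run also exports an OpenVINO IR into `$OUTDIR`, it can be benchmarked directly with OpenVINO's `benchmark_app -m <path-to-xml>`.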
## Reference Results

| Metric | Value |
| --- | --- |
| Global step | 41000 |
| F1 | 90.842 |
| Exact Match (EM) | 84.276 |
| Structured sparsity (Linear layers) | 77.73% |