
Dynamically quantized DistilBERT base uncased finetuned SST-2


Model Details

Model Description: This model is a DistilBERT model fine-tuned on SST-2, dynamically quantized with Intel® Neural Compressor through the usage of huggingface/optimum-intel.

How to Get Started With the Model

PyTorch

The quantized model can be loaded as follows:

from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

model = IncQuantizedModelForSequenceClassification.from_pretrained("Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic")

ONNX

This is an INT8 ONNX model quantized with Intel® Neural Compressor.

The original FP32 model is DistilBERT fine-tuned on SST-2.

Test results

|                          | INT8   | FP32   |
|--------------------------|--------|--------|
| Accuracy (eval-accuracy) | 0.9025 | 0.9106 |
| Model size (MB)          | 165    | 256    |

Load ONNX model:

from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained("Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic")