# distilbert

Task: text-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: full dataset
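For context, a DistilBERT text-classification checkpoint can be exported to ONNX and run through Optimum's ONNX Runtime backend roughly as sketched below. This is a minimal, hypothetical example: the checkpoint name is an assumption (the report only identifies the model as "distilbert"), and depending on the Optimum version the export flag may be `from_transformers=True` instead of `export=True`.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Hypothetical checkpoint; the report only says "distilbert".
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Export the PyTorch weights to ONNX on the fly and run them with ONNX Runtime.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("A quick sanity check before benchmarking."))
```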

Fixed parameters:

Benchmarked parameters:

## Evaluation

### Non-time metrics

| framework   | operators_to_quantize | per_channel | framework_args                                                     | apply_quantization | accuracy |
|-------------|-----------------------|-------------|--------------------------------------------------------------------|--------------------|----------|
| onnxruntime | None                  | None        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | False              | 0.911    |
| onnxruntime | ['Add', 'MatMul']     | False       | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 0.898    |
| onnxruntime | ['Add', 'MatMul']     | True        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 0.490    |
| onnxruntime | ['Add']               | False       | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 0.911    |
| onnxruntime | ['Add']               | True        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 0.911    |
| pytorch     | None                  | None        | {}                                                                 | None               | 0.911    |
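The `operators_to_quantize` and `per_channel` columns correspond to ONNX Runtime's dynamic-quantization options. A minimal sketch of applying the same settings directly with ONNX Runtime's quantization API (paths are placeholders, and keyword availability can vary slightly across onnxruntime versions):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Placeholder paths for the exported and quantized DistilBERT graphs.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model-quantized.onnx",
    op_types_to_quantize=["Add", "MatMul"],  # corresponds to operators_to_quantize
    per_channel=False,                       # corresponds to per_channel
    weight_type=QuantType.QInt8,
)
```

Note in the table above that per-channel quantization of MatMul drops accuracy to 0.490, while quantizing only Add leaves accuracy at the float baseline of 0.911.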

### Time metrics

Time benchmarks were run for 15 seconds per config.

Below are the time metrics for batch size = 1 and input length = 224.

| framework   | operators_to_quantize | per_channel | framework_args                                                     | apply_quantization | latency_mean (ms) | throughput (/s) |
|-------------|-----------------------|-------------|--------------------------------------------------------------------|--------------------|-------------------|-----------------|
| onnxruntime | None                  | None        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | False              | 83.23             | 12.07           |
| onnxruntime | ['Add', 'MatMul']     | False       | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 64.31             | 15.60           |
| onnxruntime | ['Add', 'MatMul']     | True        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 64.78             | 15.47           |
| onnxruntime | ['Add']               | False       | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 82.63             | 12.13           |
| onnxruntime | ['Add']               | True        | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}  | True               | 83.82             | 11.93           |
| pytorch     | None                  | None        | {}                                                                 | None               | 84.34             | 11.87           |
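For reference, latency and throughput figures like the ones above can be reproduced with a fixed-duration timing loop (15 s) against an ONNX Runtime session configured with the `framework_args` from the tables. This is a rough sketch under assumptions: the model path is a placeholder, and the input names follow a standard DistilBERT ONNX export.

```python
import time

import numpy as np
import onnxruntime as ort

MODEL_PATH = "model-quantized.onnx"  # placeholder path
SEQ_LEN = 224                        # input length from the table above
RUN_SECONDS = 15                     # benchmark duration per config

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # matches intra_op_num_threads in framework_args
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC  # optimization_level 1

session = ort.InferenceSession(MODEL_PATH, opts, providers=["CPUExecutionProvider"])

# Batch size 1, dummy tokens; DistilBERT exports expect input_ids and attention_mask.
inputs = {
    "input_ids": np.ones((1, SEQ_LEN), dtype=np.int64),
    "attention_mask": np.ones((1, SEQ_LEN), dtype=np.int64),
}

latencies = []
deadline = time.perf_counter() + RUN_SECONDS
while time.perf_counter() < deadline:
    start = time.perf_counter()
    session.run(None, inputs)
    latencies.append(time.perf_counter() - start)

print(f"latency_mean (ms): {1000 * np.mean(latencies):.2f}")
print(f"throughput (/s):   {len(latencies) / sum(latencies):.2f}")
```

At batch size 1, throughput is simply the inverse of mean latency, which is consistent with the numbers reported above.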