distilbert

Task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: 10

Fixed parameters:

Benchmarked parameters: quantization_approach (dynamic, static) and operators_to_quantize (['Add', 'MatMul'], ['Add'])

Evaluation

Non-time metrics

| quantization_approach | operators_to_quantize | precision (original) | precision (optimized) | recall (original) | recall (optimized) | f1 (original) | f1 (optimized) | accuracy (original) | accuracy (optimized) |
|---|---|---|---|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 0.970 | 0.969 | 0.970 | 0.939 | 0.970 | 0.954 | 0.993 | 0.990 |
| dynamic | ['Add'] | 0.970 | 0.970 | 0.970 | 0.970 | 0.970 | 0.970 | 0.993 | 0.993 |
| static | ['Add', 'MatMul'] | 0.970 | 0.104 | 0.970 | 0.212 | 0.970 | 0.140 | 0.993 | 0.691 |
| static | ['Add'] | 0.970 | 0.037 | 0.970 | 0.121 | 0.970 | 0.057 | 0.993 | 0.110 |
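As a quick consistency check, F1 is the harmonic mean of precision and recall, 2PR/(P + R). The sketch below recomputes F1 for each optimized configuration from the precision and recall reported above and prints the drop relative to the original F1 of 0.970. It assumes the scores are micro-averaged, so that F1 follows directly from P and R; under that assumption the recomputed values match the reported ones to three decimals.

```python
# (precision, recall, reported f1) of the optimized models, copied from the table.
optimized = {
    ("dynamic", "Add, MatMul"): (0.969, 0.939, 0.954),
    ("dynamic", "Add"): (0.970, 0.970, 0.970),
    ("static", "Add, MatMul"): (0.104, 0.212, 0.140),
    ("static", "Add"): (0.037, 0.121, 0.057),
}
ORIGINAL_F1 = 0.970


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for (approach, ops), (p, r, reported) in optimized.items():
    f1 = f1_score(p, r)
    drop = ORIGINAL_F1 - f1
    print(f"{approach:8s} [{ops}]: f1={f1:.3f} (reported {reported:.3f}), drop={drop:.3f}")
```

The two static rows stand out: F1 falls by more than 0.8, while dynamic quantization of Add alone leaves it unchanged.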

Time metrics

Time benchmarks were run for 3 seconds per config.
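A fixed time budget means each configuration is called repeatedly until 3 seconds have elapsed, rather than for a fixed number of iterations. The following is a minimal sketch of such a loop, not the benchmarking tool's actual implementation: `run_for_duration` is a hypothetical helper, and it assumes throughput is the number of completed calls divided by the time budget. That assumption is consistent with the throughput values reported in this section, which are all close to multiples of 1/3 (for example 16.67 ≈ 50/3 and 55.33 ≈ 166/3).

```python
import statistics
import time
from typing import Callable, List, Tuple


def run_for_duration(
    fn: Callable[[], None], duration_s: float = 3.0
) -> Tuple[List[float], float, float]:
    """Hypothetical helper: call `fn` repeatedly until `duration_s` seconds elapse.

    Returns (per-call latencies in ms, mean latency in ms, throughput in calls/s).
    Throughput is assumed to be completed calls divided by the time budget.
    """
    latencies_ms: List[float] = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        t0 = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    mean_ms = statistics.mean(latencies_ms) if latencies_ms else float("nan")
    throughput = len(latencies_ms) / duration_s
    return latencies_ms, mean_ms, throughput


if __name__ == "__main__":
    # Stand-in workload; in the real benchmark this would be one forward pass
    # of the model at batch size 1 and input length 64.
    lat, mean_ms, tput = run_for_duration(lambda: time.sleep(0.02), duration_s=1.0)
    print(f"{len(lat)} calls, mean latency {mean_ms:.2f} ms, throughput {tput:.2f}/s")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when individual calls take only tens of milliseconds.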

The time metrics below were measured with batch size = 1 and input length = 64.

| quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s) |
|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 60.12 | 18.13 | 16.67 | 55.33 |
| dynamic | ['Add'] | 59.49 | 29.12 | 17.00 | 34.67 |
| static | ['Add', 'MatMul'] | 58.89 | 24.30 | 17.00 | 41.33 |
| static | ['Add'] | 43.19 | 38.12 | 23.33 | 26.33 |
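The speedup of each optimized model can be read off the latency columns as original latency divided by optimized latency. The snippet below does that arithmetic on the table values; it is only a convenience calculation, and the ratios inherit whatever run-to-run noise the 3-second benchmarks carry. Note that the original-model latency in the static ['Add'] row (43.19 ms) is noticeably lower than in the other three rows (about 59-60 ms), which is worth keeping in mind when comparing ratios across rows.

```python
# Mean latencies in ms as (original, optimized), copied from the table above.
latency_ms = {
    ("dynamic", "Add, MatMul"): (60.12, 18.13),
    ("dynamic", "Add"): (59.49, 29.12),
    ("static", "Add, MatMul"): (58.89, 24.30),
    ("static", "Add"): (43.19, 38.12),
}


def speedup(original_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized model is, based on mean latency."""
    return original_ms / optimized_ms


speedups = {cfg: speedup(*lat) for cfg, lat in latency_ms.items()}

for (approach, ops), s in speedups.items():
    print(f"{approach:8s} [{ops}]: {s:.2f}x")
```

Read together with the evaluation table, dynamic quantization of Add and MatMul gives the largest latency reduction at a small F1 cost, while both static configurations lose most of their accuracy.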