wandb run: https://wandb.ai/eleutherai/pythia-rlhf/runs/e0drjcsz?workspace=user-yongzx
Model Evals:
| Task | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | acc | 0.1877 | ± | 0.0114 |
| | | none | acc_norm | 0.2372 | ± | 0.0124 |
| arc_easy | Yaml | none | acc | 0.4390 | ± | 0.0102 |
| | | none | acc_norm | 0.4082 | ± | 0.0101 |
| logiqa | Yaml | none | acc | 0.1889 | ± | 0.0154 |
| | | none | acc_norm | 0.2473 | ± | 0.0169 |
| piqa | Yaml | none | acc | 0.6213 | ± | 0.0113 |
| | | none | acc_norm | 0.6279 | ± | 0.0113 |
| sciq | Yaml | none | acc | 0.7230 | ± | 0.0142 |
| | | none | acc_norm | 0.6840 | ± | 0.0147 |
| winogrande | Yaml | none | acc | 0.5162 | ± | 0.0140 |
| wsc | Yaml | none | acc | 0.3654 | ± | 0.0474 |
| lambada_openai | Yaml | none | perplexity | 58.9478 | ± | 2.7662 |
| | | none | acc | 0.2602 | ± | 0.0061 |