Wandb runs: https://wandb.ai/eleutherai/pythia-rlhf/runs/s0qdwbg6?workspace=user-yongzx
Evaluation results:
| Task | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | acc | 0.1758 | ± | 0.0111 |
| | | none | acc_norm | 0.2176 | ± | 0.0121 |
| arc_easy | Yaml | none | acc | 0.3742 | ± | 0.0099 |
| | | none | acc_norm | 0.3565 | ± | 0.0098 |
| logiqa | Yaml | none | acc | 0.2058 | ± | 0.0159 |
| | | none | acc_norm | 0.2412 | ± | 0.0168 |
| piqa | Yaml | none | acc | 0.5958 | ± | 0.0114 |
| | | none | acc_norm | 0.5941 | ± | 0.0115 |
| sciq | Yaml | none | acc | 0.5930 | ± | 0.0155 |
| | | none | acc_norm | 0.5720 | ± | 0.0157 |
| winogrande | Yaml | none | acc | 0.5154 | ± | 0.0140 |
| wsc | Yaml | none | acc | 0.3654 | ± | 0.0474 |
| lambada_openai | Yaml | none | perplexity | 730.2552 | ± | 46.8739 |
| | | none | acc | 0.1316 | ± | 0.0047 |