W&B run: https://wandb.ai/eleutherai/pythia-rlhf/runs/644tyaq0?workspace=user-yongzx

Model Evals:

Task            Version  Filter  Metric      Value    Stderr
arc_challenge   Yaml     none    acc         0.2287   ± 0.0123
                         none    acc_norm    0.2619   ± 0.0128
arc_easy        Yaml     none    acc         0.5248   ± 0.0102
                         none    acc_norm    0.4533   ± 0.0102
logiqa          Yaml     none    acc         0.2089   ± 0.0159
                         none    acc_norm    0.2765   ± 0.0175
piqa            Yaml     none    acc         0.6855   ± 0.0108
                         none    acc_norm    0.6823   ± 0.0109
sciq            Yaml     none    acc         0.8050   ± 0.0125
                         none    acc_norm    0.7080   ± 0.0144
winogrande      Yaml     none    acc         0.5335   ± 0.0140
wsc             Yaml     none    acc         0.3654   ± 0.0474
lambada_openai  Yaml     none    perplexity  9.8265   ± 0.3139
                         none    acc         0.5135   ± 0.0070
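One way to read the Stderr column is as a rough uncertainty band around each accuracy. The sketch below is illustrative only and rests on an assumption: that Stderr is the standard error of the mean of per-example 0/1 scores, sqrt(p(1-p)/(n-1)). Under that assumption it builds approximate 95% intervals (value ± 1.96 · stderr) and backs out the implied number of examples per task, which can be sanity-checked against the size of each evaluation split. The names RESULTS, normal_ci and implied_n are hypothetical helpers, not part of any evaluation tool. The perplexity row is left out because its stderr is not a proportion's standard error.

```python
# Illustrative sketch: assumes Stderr = sqrt(p * (1 - p) / (n - 1)) for 0/1 scores.
RESULTS = [  # (task, metric, value, stderr), acc rows copied from the table above
    ("arc_challenge", "acc", 0.2287, 0.0123),
    ("arc_easy", "acc", 0.5248, 0.0102),
    ("logiqa", "acc", 0.2089, 0.0159),
    ("piqa", "acc", 0.6855, 0.0108),
    ("sciq", "acc", 0.8050, 0.0125),
    ("winogrande", "acc", 0.5335, 0.0140),
    ("wsc", "acc", 0.3654, 0.0474),
    ("lambada_openai", "acc", 0.5135, 0.0070),
]

Z_95 = 1.96  # two-sided standard-normal quantile for an approximate 95% interval


def normal_ci(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation interval: value - z*stderr .. value + z*stderr."""
    return value - z * stderr, value + z * stderr


def implied_n(acc: float, stderr: float) -> int:
    """Invert stderr = sqrt(p(1-p)/(n-1)) to estimate n (hypothetical helper)."""
    return round(acc * (1 - acc) / stderr**2 + 1)


if __name__ == "__main__":
    for task, metric, value, se in RESULTS:
        lo, hi = normal_ci(value, se)
        print(f"{task:15s} {metric:4s} {value:.4f}  "
              f"95% CI [{lo:.4f}, {hi:.4f}]  n~{implied_n(value, se)}")
```

Because the table values are rounded to four decimals, the implied n is only approximate. A wide interval such as wsc's (stderr 0.0474) reflects a small evaluation set rather than a noisy model.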