wandb run: https://wandb.ai/usvsnsp/trlx/runs/llxa7qkl
Model evals:
| Task | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | acc | 0.3387 | ± | 0.0138 |
| | | none | acc_norm | 0.3532 | ± | 0.0140 |
| arc_easy | Yaml | none | acc | 0.6936 | ± | 0.0095 |
| | | none | acc_norm | 0.6187 | ± | 0.0100 |
| logiqa | Yaml | none | acc | 0.2335 | ± | 0.0166 |
| | | none | acc_norm | 0.2734 | ± | 0.0175 |
| piqa | Yaml | none | acc | 0.7535 | ± | 0.0101 |
| | | none | acc_norm | 0.7693 | ± | 0.0098 |
| sciq | Yaml | none | acc | 0.9020 | ± | 0.0094 |
| | | none | acc_norm | 0.8320 | ± | 0.0118 |
| winogrande | Yaml | none | acc | 0.6267 | ± | 0.0136 |
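
As a reading aid, the `acc` values and standard errors above can be turned into approximate 95% confidence intervals. The sketch below is illustrative only: the `results` dictionary and the `confidence_interval` helper are hypothetical names, the numbers are copied from the table, and the 1.96 multiplier assumes a normal approximation that the evaluation itself does not state.

```python
# Accuracy and standard error copied from the `acc` rows of the table above.
results = {
    "arc_challenge": (0.3387, 0.0138),
    "arc_easy": (0.6936, 0.0095),
    "logiqa": (0.2335, 0.0166),
    "piqa": (0.7535, 0.0101),
    "sciq": (0.9020, 0.0094),
    "winogrande": (0.6267, 0.0136),
}

# Assumption: two-sided 95% interval under a normal approximation.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for value +/- z * stderr, clipped to [0, 1]."""
    half_width = z * stderr
    return max(0.0, value - half_width), min(1.0, value + half_width)


for task, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{task:<14} acc={acc:.4f}  95% CI=[{low:.4f}, {high:.4f}]")
```

Intervals computed this way only reflect sampling noise on each benchmark's test set; they say nothing about variation across training runs or prompts.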