- wandb: https://wandb.ai/open-assistant/rlhf/runs/csz0ickm
- base-model: andreaskoepf/oasst-sft-4-pythia-12b-epoch-3.5
- reward-model: andreaskoepf/oasst-rm-2-pythia-1.4b-10000
- checkpoint: 2000 steps