# Reward model for HH-RLHF


This repo presents a reward model trained with the LMFlow framework. The reward model is trained on the HH-RLHF dataset, starting from the base model GPT-Neo-2.7B.

It achieves an evaluation accuracy of about 69% on the validation set. You may also check out our reward model based on Open-llama-3b.
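Below is a minimal sketch of how such a reward model could be used to score a dialogue. It assumes (this is not confirmed by the card) that the checkpoint loads as a Hugging Face sequence-classification model with a single scalar reward head; the repo id and dialogue text are placeholders.

```python
# Minimal usage sketch: score an HH-RLHF-style dialogue with a reward model.
# Assumptions: the checkpoint is saved in sequence-classification format with
# one scalar output (num_labels=1); the repo id below is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "your-org/hh-rlhf-reward-model"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# HH-RLHF dialogues alternate "Human:" and "Assistant:" turns.
dialogue = (
    "Human: How do I bake bread at home?\n\n"
    "Assistant: Mix flour, water, yeast, and salt, let the dough rise, "
    "then bake it in a hot oven."
)

inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
with torch.no_grad():
    # With a scalar head, the single logit is read as the reward score.
    reward = model(**inputs).logits[0].item()
print(f"Reward score: {reward:.4f}")
```

Higher scores indicate responses the reward model prefers; in RLHF or RAFT-style pipelines, such scores are used to rank or select candidate responses.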

## Reference


If you find this model useful, please cite our framework and papers using the following BibTeX entries:

@article{diao2023lmflow,
  title={Lmflow: An extensible toolkit for finetuning and inference of large foundation models},
  author={Diao, Shizhe and Pan, Rui and Dong, Hanze and Shum, Ka Shun and Zhang, Jipeng and Xiong, Wei and Zhang, Tong},
  journal={arXiv preprint arXiv:2306.12420},
  year={2023}
}
@article{dong2023raft,
  title={Raft: Reward ranked finetuning for generative foundation model alignment},
  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},
  journal={arXiv preprint arXiv:2304.06767},
  year={2023}
}