# Reward model for HH-RLHF
In this repo, we present a reward model trained with the LMFlow framework. The reward model is trained on the HH-RLHF dataset from the base model GPT-Neo-2.7B, and it achieves ~69% accuracy on the validation set. You may also check out the reward model based on Open-llama-3b.
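Below is a minimal sketch of how such a reward model could be used to score a prompt/response pair. It assumes (this card does not confirm it) that the checkpoint loads via `AutoModelForSequenceClassification` with a single scalar output and that inputs follow the plain `Human: ... Assistant: ...` dialogue format of HH-RLHF; the model path is a hypothetical placeholder.

```python
# Sketch: scoring a dialogue with a reward model (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "path/to/this-reward-model"  # hypothetical placeholder path

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Assumption: the checkpoint carries a sequence-classification head with one logit.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

# Assumption: plain HH-RLHF-style dialogue formatting.
dialogue = (
    "Human: How do I bake a loaf of bread?\n\n"
    "Assistant: Mix flour, water, yeast, and salt, let the dough rise, then bake."
)

inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
with torch.no_grad():
    # Higher score means the reward model prefers this response.
    reward = model(**inputs).logits[0, 0].item()

print(f"reward score: {reward:.3f}")
```

In RLHF pipelines, scores like this are typically compared across candidate responses to the same prompt (e.g., for ranking or best-of-n selection) rather than interpreted on an absolute scale.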
## References
If you find this model useful, please cite our framework and papers using the following BibTeX entries:
```bibtex
@article{diao2023lmflow,
  title={Lmflow: An extensible toolkit for finetuning and inference of large foundation models},
  author={Diao, Shizhe and Pan, Rui and Dong, Hanze and Shum, Ka Shun and Zhang, Jipeng and Xiong, Wei and Zhang, Tong},
  journal={arXiv preprint arXiv:2306.12420},
  year={2023}
}

@article{dong2023raft,
  title={Raft: Reward ranked finetuning for generative foundation model alignment},
  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},
  journal={arXiv preprint arXiv:2304.06767},
  year={2023}
}
```