Model Card for Model ID

This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat. It is based on OPT-350M.

Model Details

Model Description

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
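
For readers unfamiliar with reward models: they are commonly trained on pairs of responses to the same prompt, one preferred ("chosen") and one not ("rejected"), so that the chosen response receives the higher scalar score. The sketch below illustrates that general pairwise ranking objective in plain Python. It is an assumption about the usual setup, not code taken from the article or from DeepSpeed Chat, and the function name `pairwise_ranking_loss` is hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Illustrative loss: -log(sigmoid(chosen_reward - rejected_reward)).

    The loss is small when the chosen response scores well above the
    rejected one, and large when the ordering is reversed.
    """
    diff = chosen_reward - rejected_reward
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign
    # of diff so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    # Hypothetical scores, chosen here only to show the behaviour.
    print(pairwise_ranking_loss(2.0, 0.0))  # correct ordering, lower loss
    print(pairwise_ranking_loss(0.0, 2.0))  # reversed ordering, higher loss
```

In an actual training run the two rewards would come from the model's scalar output for each response, and the loss would be averaged over a batch of pairs.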