# Reward Model pretrained on openai/webgpt_comparisons

Reward model finetuned from an existing pretrained model.
## Things that aligned with the original papers
- Overfits easily when trained with the pairwise rank loss (see the sketch after this list)
- Requires a small learning rate
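
The rank loss referred to above is the standard pairwise comparison objective used for reward models in the WebGPT/InstructGPT line of work. A minimal PyTorch sketch (the `rank_loss` name and the dummy tensors are illustrative only, not code from this repo):

```python
import torch
import torch.nn.functional as F

def rank_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    # Negative log-sigmoid of the reward margin; minimized when the chosen
    # answer is scored above the rejected one for each comparison pair.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy usage: one scalar reward per (question, answer) pair in the batch.
chosen = torch.tensor([1.2, 0.3, -0.5])
rejected = torch.tensor([0.4, 0.6, -1.0])
loss = rank_loss(chosen, rejected)  # in training, backprop through the reward model
```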
## Differences from the papers
- Small models perform poorly due to a lack of world knowledge: validation accuracy does not even reach 60%. OpenAI's RM had 6B parameters.
- Trained with an 80-20 train-validation split under torch AMP (mixed precision); see the sketch after this list.
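
A minimal sketch of that setup, assuming the reward model is a single-logit sequence-classification head; the model name, hyperparameters, tie filtering, and collate function below are illustrative assumptions, not the repo's actual training script:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset

model_name = "google/electra-large-discriminator"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1).cuda()

# openai/webgpt_comparisons only has a train split; drop ties, then split 80-20.
raw = load_dataset("openai/webgpt_comparisons", split="train")
raw = raw.filter(lambda r: r["score_0"] != r["score_1"])
n_train = int(0.8 * len(raw))
train_set, val_set = random_split(raw, [n_train, len(raw) - n_train])

def collate(rows):
    # Treat the higher-scored answer as "chosen", the other as "rejected".
    chosen, rejected = [], []
    for r in rows:
        q = r["question"]["full_text"]
        better_first = r["score_0"] > r["score_1"]
        chosen.append(q + " " + tokenizer.sep_token + " " + (r["answer_0"] if better_first else r["answer_1"]))
        rejected.append(q + " " + tokenizer.sep_token + " " + (r["answer_1"] if better_first else r["answer_0"]))
    enc = lambda texts: tokenizer(texts, truncation=True, padding=True,
                                  max_length=512, return_tensors="pt")
    return enc(chosen), enc(rejected)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small learning rate
scaler = torch.cuda.amp.GradScaler()

for chosen_batch, rejected_batch in DataLoader(train_set, batch_size=4,
                                               shuffle=True, collate_fn=collate):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        r_chosen = model(**{k: v.cuda() for k, v in chosen_batch.items()}).logits.squeeze(-1)
        r_rejected = model(**{k: v.cuda() for k, v in rejected_batch.items()}).logits.squeeze(-1)
        # Pairwise rank loss, as sketched earlier.
        loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```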
## Other models I tried
- bloomz-560m: its large multilingual embedding isn't worth the training cost, since this dataset contains only English prompts
- gpt2-large: training was not stable
- gpt2-base: training was not stable
## Performance on the validation split
| model | val acc (%) | val loss (rank loss) |
|---|---|---|
| roberta-base | 56.21 | 0.71 |
| roberta-large | 57.89 | 0.67 |
| electra-base | 57.02 | 0.70 |
| electra-large | 58.75 | 0.69 |
TensorBoard logs are located under `runs/`.
Note:

- You will have to shift this model's outputs so that the mean reward equals 0 (see the sketch below).
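
A minimal sketch of that zero-centering, assuming the checkpoint is a single-logit sequence-classification model; the checkpoint path and calibration texts are placeholders, not names from this repo:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/this-reward-model"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=1).eval()

calibration_texts = [
    "Example question [SEP] example answer",  # placeholder: use representative (question, answer) pairs
]

@torch.no_grad()
def raw_reward(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    return model(**enc).logits.squeeze().item()

# Estimate the mean reward once on the calibration set ...
reward_bias = sum(raw_reward(t) for t in calibration_texts) / len(calibration_texts)

# ... then subtract it everywhere so rewards are centered at 0.
def reward(text: str) -> float:
    return raw_reward(text) - reward_bias
```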