WebGPT regression reward model

Reward model trained on the openai/webgpt_comparisons dataset.

The reward model is fine-tuned from an existing pretrained language model.

Things that align with the original papers

Differences from the papers

Other models I tried

Performance on the validation split

| model         | val acc (%) | val loss (rank loss) |
|---------------|-------------|----------------------|
| roberta-base  | 56.21       | 0.71                 |
| roberta-large | 57.89       | 0.67                 |
| electra-base  | 57.02       | 0.70                 |
| electra-large | 58.75       | 0.69                 |
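For reference, the rank loss reported above is presumably the standard pairwise comparison loss used for reward models, −log σ(r_chosen − r_rejected), and val acc the fraction of pairs where the chosen answer outscores the rejected one. A minimal sketch in plain Python (function names are illustrative, not from this repo):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_rank_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigma(r_chosen - r_rejected): small when the chosen answer scores higher."""
    return -math.log(sigmoid(r_chosen - r_rejected))

def rank_accuracy(pairs) -> float:
    """Fraction of (r_chosen, r_rejected) pairs ranked correctly."""
    return sum(rc > rr for rc, rr in pairs) / len(pairs)

# A model that scores both answers equally has loss ln 2 ~= 0.693,
# which is roughly where the ~0.7 values in the table sit.
print(round(pairwise_rank_loss(0.0, 0.0), 3))   # → 0.693
print(rank_accuracy([(1.2, 0.4), (0.1, 0.5)]))  # → 0.5
```

This also explains why a loss near 0.69 accompanies accuracies barely above chance: the score margins between chosen and rejected answers are small.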

TensorBoard logs are located under runs/.

Note: