This is a pretrained model used in the PPO toy example from CarperAI/trlX.