Models retrieved with tag "reinforcement-learning-from-human-feedback": 3

PKU-Alignment/beaver-7b-v1.0-reward reinforcement-learning-from-human-feedback
PKU-Alignment/beaver-7b-v1.0-cost reinforcement-learning-from-human-feedback
PKU-Alignment/beaver-7b-v1.0 reinforcement-learning-from-human-feedback