Aira-RLHF-124M
Aira-RLHF is an RLHF-tuned version of Aira-Instruct-124M, using both the RewardModel as the primary preference model and the ToxicityModel as an auxiliary preference model.
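As a quick orientation, the snippet below is a minimal usage sketch (it is not taken from the original card): it assumes the checkpoint at nicholasKluge/Aira-RLHF-124M is a GPT-2-style causal language model loadable with 🤗 Transformers, and the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch. Assumption: the checkpoint is a GPT-2-style causal LM;
# the prompt and the generation hyperparameters below are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/Aira-RLHF-124M")
model = AutoModelForCausalLM.from_pretrained("nicholasKluge/Aira-RLHF-124M")

prompt = "What is the capital of Brazil?"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short completion.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```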
This model and repository serve only to host the RLHF experiments performed on the Aira-Instruct series. The training notebook can be found in this repository.
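To make the two-preference-model setup concrete, here is a minimal sketch of how a primary reward score and an auxiliary toxicity score could be folded into a single scalar reward during RLHF. This is not the card's actual training code: the repository IDs (nicholasKluge/RewardModel, nicholasKluge/ToxicityModel), the single-logit output assumption, and the weighting scheme are illustrative assumptions; the real procedure is in the training notebook.

```python
# Illustrative sketch only (NOT the training notebook's code): the repository IDs,
# the single-logit output assumption, and the weighting below are hypothetical.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_tok = AutoTokenizer.from_pretrained("nicholasKluge/RewardModel")
reward_model = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/RewardModel")

tox_tok = AutoTokenizer.from_pretrained("nicholasKluge/ToxicityModel")
tox_model = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/ToxicityModel")


@torch.no_grad()
def combined_reward(prompt: str, response: str, toxicity_weight: float = 0.5) -> float:
    """Score a (prompt, response) pair with both preference models and mix the scores."""
    text = prompt + " " + response
    reward_inputs = reward_tok(text, return_tensors="pt", truncation=True)
    tox_inputs = tox_tok(text, return_tensors="pt", truncation=True)

    # Assumption: each model emits a single logit (higher = better / less toxic).
    reward_score = reward_model(**reward_inputs).logits[0].item()
    toxicity_score = tox_model(**tox_inputs).logits[0].item()

    # Primary reward plus a weighted auxiliary (non-)toxicity term.
    return reward_score + toxicity_weight * toxicity_score
```

In an actual PPO loop (for example, with the trl library), a scalar like this would be the per-response reward passed to the optimizer.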
Limitations
This model is intended for research purposes only. For better performance and lighter experimentation, we recommend using Aira-Instruct-124M.
🤥 Generative models can perpetuate the generation of pseudo-informative content, that is, false information that may appear truthful.
🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.
Note: Aira-RLHF-124M performs worse than Aira-Instruct-124M in several domains, which may stem from several causes (model size, misalignment of the reward model, etc.) that we are still trying to iron out. 🤔
Cite as 🤗
@misc{nicholas22aira,
doi = {10.5281/zenodo.6989727},
url = {https://huggingface.co/nicholasKluge/Aira-RLHF-124M},
author = {Nicholas Kluge Corrêa},
title = {Aira},
year = {2023},
publisher = {HuggingFace},
journal = {HuggingFace repository},
}
License
Aira-RLHF-124M is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.