Inspired by the DeBERTa Reward Model Series

llm-blender/pair-reward-model is a version of PairRanker fine-tuned specifically as a reward model, using deberta-v3-large as the backbone.
## Statistics

### Context length
| PairRanker type | Source max length | Candidate max length | Total max length |
|---|---|---|---|
| pair-ranker | 128 | 128 | 384 |
| pair-reward-model (this model) | 1224 | 412 | 2048 |
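The totals in both rows follow the same pattern: PairRanker packs the source together with a pair of candidates into one sequence, so the total max length is the source length plus two candidate lengths. A quick sanity check of the table's arithmetic:

```python
# Sanity check of the context-length table:
# total = source + 2 * candidate, since PairRanker encodes
# the source together with a pair of candidates in one sequence.
pair_ranker = {"source": 128, "candidate": 128, "total": 384}
pair_reward_model = {"source": 1224, "candidate": 412, "total": 2048}

for cfg in (pair_ranker, pair_reward_model):
    assert cfg["source"] + 2 * cfg["candidate"] == cfg["total"]
```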
## Usage Example
Since PairRanker contains custom layers and tokens, we recommend using it through our llm-blender Python repo. Loading it directly with the Hugging Face `from_pretrained()` API will raise errors.
- First, install llm-blender:

```bash
pip install git+https://github.com/yuchenlin/LLM-Blender.git
```
- Then load PairRanker with the following code:

```python
import llm_blender

# ranker config
ranker_config = llm_blender.RankerConfig()
ranker_config.ranker_type = "pairranker"  # only "pairranker" is supported for now
ranker_config.model_type = "deberta"
ranker_config.model_name = "microsoft/deberta-v3-large"  # ranker backbone
ranker_config.load_checkpoint = "llm-blender/pair-reward-model"  # Hugging Face Hub model path or your local ranker checkpoint <your checkpoint path>
ranker_config.cache_dir = "./hf_models"  # Hugging Face model cache dir
ranker_config.source_maxlength = 1224  # pair-reward-model source max length
ranker_config.candidate_maxlength = 412  # pair-reward-model candidate max length
ranker_config.n_tasks = 1  # number of signals used to train the ranker; this checkpoint was trained with BARTScore only, so it is 1

# The fuser config can be ignored here since we don't use the fuser; load it if you need it.
fuser_config = llm_blender.GenFuserConfig()

# blender config
blender_config = llm_blender.BlenderConfig()
blender_config.device = "cuda"  # device for the blender ranker and fuser

blender = llm_blender.Blender(blender_config, ranker_config, fuser_config)
```
- Then you can rank candidates with the following function:

```python
inputs = ["input1", "input2"]
candidates_texts = [["candidate1 for input1", "candidate2 for input1"],
                    ["candidate1 for input2", "candidate2 for input2"]]
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=2)
# ranks is a list of rank lists, where ranks[i][j] is the rank of candidate-j for input-i
```
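Given the ranks, you can select the top candidate for each input. A minimal sketch, assuming (as in LLM-Blender) that rank 1 denotes the best candidate; the `ranks` list here is a hypothetical placeholder, not real model output:

```python
# Hypothetical ranks for 2 inputs x 2 candidates (placeholder, not model output);
# assumes rank 1 is best, as in LLM-Blender.
candidates_texts = [["candidate1 for input1", "candidate2 for input1"],
                    ["candidate1 for input2", "candidate2 for input2"]]
ranks = [[2, 1], [1, 2]]

# Pick the candidate with the smallest (best) rank for each input.
best = [cands[r.index(min(r))] for cands, r in zip(candidates_texts, ranks)]
print(best)  # ['candidate2 for input1', 'candidate1 for input2']
```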
- Using PairRanker to directly compare two candidates:

```python
candidates_A = [cands[0] for cands in candidates_texts]
candidates_B = [cands[1] for cands in candidates_texts]
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
# comparison_results is a list of bools; comparison_results[i] indicates whether
# candidates_A[i] is better than candidates_B[i] for inputs[i]
```
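The boolean list returned by the comparison can be aggregated into a win rate, which is useful when using the reward model for pairwise evaluation of two systems. A small sketch with placeholder results (not real model output):

```python
# Placeholder comparison results (not model output): True means candidate A won.
comparison_results = [True, False, True, True]

# Fraction of inputs on which candidate A beat candidate B.
win_rate_A = sum(comparison_results) / len(comparison_results)
print(f"Candidate A win rate: {win_rate_A:.2f}")  # Candidate A win rate: 0.75
```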
See the LLM-Blender GitHub README.md and the Jupyter notebook blender_usage.ipynb for detailed usage examples.