Xwin-LM-70B-V0.1-EXL2-2.500b
A quantized version of Xwin-LM-70B-V0.1 in EXL2 format.
It was created with WizardLM_evol_instruct_70k as the parquet calibration file, using the following command (the model first had to be converted to safetensors):
```
python convert.py \
    -i ../Xwin-LM_Xwin-LM-70B-V0.1_safetensors \
    -o ~/working \
    -cf Xwin-LM-70B-V0.1-EXL2-2.500b \
    -c Evol-Instruct-Code-80k-v1.parquet \
    -b 2.500
```
I used WizardLM_evol_instruct_70k as the calibration dataset instead of wikitext in the hope that this yields better performance on typical instruct tasks.
Note
If you get gibberish output, remove the BOS token from the beginning of your prompts.
In text-generation-webui, this can be done by unchecking "Add the bos_token to the beginning of prompts" under "Parameters" > "Generation".
See this issue for details: https://github.com/turboderp/exllamav2/issues/123
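If you are calling the model programmatically rather than through text-generation-webui, the same workaround amounts to dropping a leading BOS id from the encoded prompt before generation. A minimal sketch of that step; the helper name and the example BOS id of 1 (typical for Llama-family tokenizers) are assumptions, so read the actual id from your tokenizer's config (e.g. `tokenizer.bos_token_id` in Hugging Face tokenizers):

```python
def strip_bos(token_ids, bos_id):
    """Drop a leading BOS token id, if present, from an encoded prompt.

    Hypothetical helper; bos_id should come from your tokenizer's
    configuration rather than being hard-coded.
    """
    if token_ids and token_ids[0] == bos_id:
        return token_ids[1:]
    return token_ids

# Assuming BOS id 1, as used by Llama-family tokenizers:
print(strip_bos([1, 319, 1234], 1))  # -> [319, 1234]
print(strip_bos([319, 1234], 1))     # unchanged -> [319, 1234]
```

The check is idempotent, so it is safe to apply even when your frontend already omits the BOS token.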