Xwin-LM-70B-V0.1-EXL2-2.500b

A quantized version of Xwin-LM-70B-V0.1 in EXL2 format.

It was created with WizardLM_evol_instruct_70k as the parquet calibration file, using the command below (the model first had to be converted to safetensors):

python convert.py \
  -i ../Xwin-LM_Xwin-LM-70B-V0.1_safetensors \
  -o ~/working \
  -cf Xwin-LM-70B-V0.1-EXL2-2.500b \
  -c Evol-Instruct-Code-80k-v1.parquet \
  -b 2.500
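The -b 2.500 argument sets the target average bitrate in bits per weight. As a rough back-of-the-envelope sketch (my own arithmetic, ignoring per-group quantization overhead and runtime memory such as the KV cache), the weight footprint at a given bitrate can be estimated as:

```python
def quantized_weight_gb(n_params, bits_per_weight):
    # Rough weight-storage estimate: parameters * bits-per-weight,
    # converted from bits (/8) to gigabytes (/1e9). Excludes
    # quantization metadata and runtime buffers like the KV cache.
    return n_params * bits_per_weight / 8 / 1e9

print(quantized_weight_gb(70e9, 2.5))  # 21.875 -> roughly 22 GB of weights
```

At 2.5 bits per weight, a 70B-parameter model's weights alone come to roughly 22 GB, which is what makes this quant fit on a single 24 GB GPU only barely, if at all.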

I used WizardLM_evol_instruct_70k as the calibration dataset instead of wikitext in the hope that it would improve performance on typical instruct tasks.

Note

If you get gibberish output, remove the BOS token from the beginning of your prompts.

In text-generation-webui, this can be done by unchecking "Add the bos_token to the beginning of prompts" under "Parameters" > "Generation".

See this issue for details: https://github.com/turboderp/exllamav2/issues/123
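If you are tokenizing prompts yourself rather than using a frontend, the workaround amounts to dropping a leading BOS token before generation. A minimal sketch (the BOS id of 1 is an assumption based on Llama-family tokenizers; check the model's tokenizer_config.json for the actual value):

```python
# Assumed BOS token id for Llama-family models; verify against
# the model's tokenizer_config.json.
BOS_TOKEN_ID = 1

def strip_bos(token_ids, bos_id=BOS_TOKEN_ID):
    # Drop a leading BOS token if present, since this EXL2 quant
    # can produce gibberish when the prompt starts with BOS.
    if token_ids and token_ids[0] == bos_id:
        return token_ids[1:]
    return token_ids

print(strip_bos([1, 15043, 3186]))  # [15043, 3186]
print(strip_bos([15043, 3186]))     # unchanged: [15043, 3186]
```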