This is https://huggingface.co/ehartford/Wizard-Vicuna-30B-Uncensored, merged with https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test and then quantized to 4-bit with AutoGPTQ. There are two quantized versions:

- a plain 4-bit version with act-order only and no groupsize;
- an experimental version using groupsize 128, act-order, and kaiokendev's ScaledLLamaAttention monkey patch applied during quantization, the idea being to help calibration account for the new attention scale.

The patch seems to have worked: the patched quant improves perplexity by around 0.04 over the unpatched one. That may not be worth the trouble, but since it's better I'll put it up anyway.
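
For reference, a minimal sketch of the patched quantization flow, assuming AutoGPTQ's standard `BaseQuantizeConfig` / `AutoGPTQForCausalLM` API. The patch import, its function name, and the local model paths below are placeholders, not the actual module used; the point is only that the monkey patch is applied before loading the model so calibration runs with the scaled attention in place.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Placeholder import standing in for kaiokendev's monkey patch -- not a real module path.
from scaled_llama_patch import apply_scaled_attention_patch

model_dir = "Wizard-Vicuna-30B-Uncensored-SuperHOT-8k"  # merged fp16 weights (local path)

# Apply the scaled-attention patch *before* loading the model,
# so the calibration forward passes see the new scale.
apply_scaled_attention_patch()

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

# Experimental variant: 4-bit, groupsize 128, act-order (desc_act).
# The plain variant would use group_size=-1 (no groupsize) with desc_act=True.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

model = AutoGPTQForCausalLM.from_pretrained(model_dir, quantize_config)

# Calibration data would normally be a proper tokenized dataset;
# a single placeholder sample is shown here to keep the sketch short.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")]

model.quantize(examples)
model.save_quantized(model_dir + "-GPTQ", use_safetensors=True)
```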