This is a 4-bit quantization of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b
My secret sauce (an example invocation is sketched after this list):
- Using commit <a href="https://github.com/0cc4m/GPTQ-for-LLaMa/tree/3c16fd9c7946ebe85df8d951cb742adbc1966ec7">3c16fd9</a> of 0cc4m's GPTQ fork
- Using PTB as the calibration dataset
- Act-order, True-sequential, percdamp 0.1 (<i>the default percdamp is 0.01</i>)
- No groupsize
- Runs with CUDA; Triton is not required.
- Quantization was completed on a 'Premium GPU', 'High Memory' Google Colab instance.
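The quantization itself was run with the fork's `llama.py`. A minimal sketch of the invocation, assuming the fork keeps the upstream GPTQ-for-LLaMa CLI arguments (the local model path and output filename are placeholders):

```sh
# Sketch only: assumes 0cc4m's fork keeps the upstream GPTQ-for-LLaMa CLI.
# PTB as the calibration set, 4-bit, act-order + true-sequential, percdamp 0.1,
# and no --groupsize flag (i.e. no grouping, the default).
python llama.py ./GPT4-x-AlpacaDente2-30b ptb \
    --wbits 4 \
    --act-order \
    --true-sequential \
    --percdamp 0.1 \
    --save_safetensors 4bit.safetensors
```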
Benchmark results (perplexity, lower is better):
<b>Model</b> | <b>C4</b> | <b>WikiText2</b> | <b>PTB</b> |
---|---|---|---|
Aeala's FP16 | 7.05504846572876 | 4.662261962890625 | 24.547462463378906 |
This Quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
Aeala's Quant <a href="https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b/resolve/main/4bit.safetensors">here</a> | 7.332120418548584 | 5.016242980957031 | 25.576189041137695 |
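The perplexities above can be re-checked with GPTQ-for-LLaMa's built-in evaluation. A sketch, assuming the fork exposes the upstream `--load` and `--eval` options (paths are placeholders):

```sh
# Sketch only: assumes --load/--eval behave as in upstream GPTQ-for-LLaMa,
# which reports perplexity on WikiText2, PTB and C4.
python llama.py ./GPT4-x-AlpacaDente2-30b c4 \
    --wbits 4 \
    --load 4bit.safetensors \
    --eval
```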