This is the 4-bit quantized version of jais-13b-chat created using autotrain, but it does not work.

Error

GPU

[screenshot of the error on GPU]

CPU

[screenshot of the error on CPU]

Quantization Process

!pip install auto-gptq
!pip install git+https://github.com/huggingface/optimum.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install --upgrade accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

tokenizer = AutoTokenizer.from_pretrained("inception-mbzuai/jais-13b-chat")

# 4-bit GPTQ quantization, calibrated on the "c4" dataset
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# trust_remote_code is required because Jais ships custom modelling code
model = AutoModelForCausalLM.from_pretrained("inception-mbzuai/jais-13b-chat", quantization_config=gptq_config, trust_remote_code=True)
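
For reference, below is a minimal sketch of how the quantized checkpoint could be saved and reloaded for inference. The output directory name, prompt, and generation settings are assumptions added for illustration; they are not part of the original process.

# Save the quantized weights and tokenizer (directory name is an assumption)
model.save_pretrained("jais-13b-chat-gptq-4bit")
tokenizer.save_pretrained("jais-13b-chat-gptq-4bit")

# Reload the quantized checkpoint and run a quick generation test
quantized = AutoModelForCausalLM.from_pretrained(
    "jais-13b-chat-gptq-4bit",
    device_map="auto",
    trust_remote_code=True,
)
inputs = tokenizer("What is the capital of the UAE?", return_tensors="pt").to(quantized.device)
outputs = quantized.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))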