vicuna-33b-v1.3-4bit-g128-awq

Vicuna is a chat assistant trained by LMSYS. This is a 4-bit AWQ-quantized version of the Vicuna v1.3 33B model.

AWQ is an efficient and accurate low-bit (INT3/INT4) weight quantization method for LLMs. It supports instruction-tuned models and multi-modal LMs.
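
To give a rough sense of what low-bit, group-wise weight quantization involves, here is a minimal sketch of generic asymmetric min/max quantization. It assumes that "4bit" and "g128" in the model name mean 4-bit integer codes and a group size of 128 weights. It does not implement AWQ's activation-aware scaling or the packed storage format of this checkpoint, and the names `quantize`, `dequantize`, and `quantize_group` are hypothetical helpers, not part of any AWQ library.

```python
import random

GROUP_SIZE = 128  # assumption: the "g128" suffix denotes the group size
N_BITS = 4        # assumption: the "4bit" suffix denotes the code width


def quantize_group(weights, n_bits=N_BITS):
    """Map one group of floats to integer codes in [0, 2**n_bits - 1]."""
    q_max = (1 << n_bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight in the group is equal.
    scale = (w_max - w_min) / q_max or 1.0
    codes = [min(q_max, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def quantize(weights, group_size=GROUP_SIZE, n_bits=N_BITS):
    """Quantize a flat list of weights, one (codes, scale, w_min) per group."""
    return [
        quantize_group(weights[i:i + group_size], n_bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate float weights from the quantized groups."""
    out = []
    for codes, scale, w_min in groups:
        out.extend(w_min + c * scale for c in codes)
    return out


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(512)]  # toy weights, not real model data
    w_hat = dequantize(quantize(w))
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Each group keeps its own scale and minimum, so an outlier in one group only coarsens the grid for the 128 weights around it rather than for the whole tensor. The rounding error per weight is bounded by half of that group's scale.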

Reference

If you find AWQ useful or relevant to your research, please cite the paper:

@article{lin2023awq,
  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
  journal={arXiv},
  year={2023}
}

Vicuna Model Card

Model Details

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Model Sources