GGML and GGUF (v2) quantizations of the model https://huggingface.co/winglian/llama-2-4b, which is a Llama 2 4B model based on Llama 2 7B.
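
To confirm that a downloaded file is in the GGUF format and see which version it reports, the file header can be inspected directly. The sketch below is a minimal illustration, not part of this repository: it relies only on the fact that a GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian 32-bit version number. The function name `read_gguf_version` is hypothetical, and older GGML files use a different magic, so they return `None` here.

```python
import struct

# Every GGUF file starts with these four ASCII bytes.
GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the version number from a GGUF header, or None if the file is not GGUF.

    Hypothetical helper for illustration; it reads only the first 8 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    # A valid header needs 4 magic bytes plus a 4-byte version field.
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The version is an unsigned 32-bit little-endian integer.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version` on the path of one of the GGUF files should return `2` if the file is GGUF v2, as described above. The path is whatever local filename you saved the download under.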