Qwen/Qwen-14B-Chat
Despite the repo name, it's the chat version.
After the release of Mistral, I realized that Chinese models were underappreciated.
Quantizing this monster required about 60 GB of peak memory.
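A rough sanity check on that figure (a sketch; the 60 GB is the author's measurement, while the breakdown below is my assumption that the converter materializes the weights in fp32 before writing the Q8_0 output):

```python
# Back-of-the-envelope memory arithmetic for quantizing a ~14B-parameter model.
# Assumption: weights pass through fp32 during conversion.
params = 14e9

fp32_working_copy = params * 4          # ~56 GB if held in fp32
q8_0_output = params * (34 / 32)        # Q8_0: 32 int8 weights + one fp16 scale
                                        # per block = 34 bytes per 32 weights

print(f"fp32 working copy: {fp32_working_copy / 1e9:.0f} GB")
print(f"Q8_0 output:       {q8_0_output / 1e9:.0f} GB")
```

An fp32 working copy alone accounts for most of the observed 60 GB peak.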
Credits
Usage
Start an interactive chat session with the following command (Linux):

```sh
./main -m ./Qwen-14b-Q8_0.bin --tiktoken ./qwen.tiktoken -i
```
Evaluation Results
| Model | MMLU (English) | GSM8K (Math) | HumanEval (Coding) | MBPP (Basic Python) |
|---|---|---|---|---|
| Qwen 14B Chat | 64% | 61% | 32% | 41% |
| LLaMA 2 13B | 56% | 34% | 19% | 35% |
| Phi 1.5 | 37% | 40% | 34% | 38% |
| Code Llama 7B | 37% | 21% | 31% | 53% |
| Mistral 7B | 60% | 52% | 31% | 48% |
Architecture
| Parameter | Value |
|---|---|
| Layers | 40 |
| Heads | 40 |
| Embedding dimension | 5120 |
| Vocabulary size | 151851 |
| Sequence length | 2048 |
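The table above is enough for a rough parameter count (a sketch using the generic 12·d² per-layer transformer estimate; Qwen's actual SwiGLU MLP widths and bias terms will shift this slightly):

```python
# Back-of-the-envelope parameter count from the architecture table.
layers = 40
d = 5120            # embedding dimension
vocab = 151851      # vocabulary size

per_layer = 12 * d * d       # ~4*d^2 attention + ~8*d^2 MLP (generic estimate)
embeddings = 2 * vocab * d   # input embedding + output head (assumed untied)

total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")
```

This lands at roughly 14B, consistent with the model's name.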
Find me on
Sh-it-just-works and Patreon