Qwen/Qwen-14B-Chat
Despite the repo name, it's the chat version.
After the release of Mistral, I realized that Chinese models were underappreciated.
Quantizing this monster required about 60 GB of peak memory.
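A rough sanity check on that figure (a sketch; the 60 GB is the author's measurement, while the breakdown below is my assumption that the converter materializes the weights in fp32 before writing the Q8_0 output):

```python
# Back-of-the-envelope memory arithmetic for quantizing a ~14B-parameter model.
# Assumption: weights pass through fp32 during conversion.
params = 14e9

fp32_working_copy = params * 4          # ~56 GB if held in fp32
q8_0_output = params * (34 / 32)        # Q8_0: 32 int8 weights + one fp16 scale
                                        # per block = 34 bytes per 32 weights

print(f"fp32 working copy: {fp32_working_copy / 1e9:.0f} GB")
print(f"Q8_0 output:       {q8_0_output / 1e9:.0f} GB")
```

An fp32 working copy alone accounts for most of the observed 60 GB peak.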
Credits
Usage
Start an interactive chat session with the following command (Linux):

```sh
./main -m ./Qwen-14b-Q8_0.bin --tiktoken ./qwen.tiktoken -i
```
Evaluation Results
| Model | MMLU (English) | GSM8K (Math) | HumanEval (Coding) | MBPP (Basic Python) |
|---|---|---|---|---|
| Qwen 14B Chat | 64% | 61% | 32% | 41% |
| LLaMA 2 13B | 56% | 34% | 19% | 35% |
| Phi 1.5 | 37% | 40% | 34% | 38% |
| Code Llama 7B | 37% | 21% | 31% | 53% |
| Mistral 7B | 60% | 52% | 31% | 48% |
Architecture
| Parameter | Value |
|---|---|
| Layers | 40 |
| Heads | 40 |
| Embedding dimension | 5120 |
| Vocabulary size | 151851 |
| Sequence length | 2048 |
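The table above is enough for a rough parameter count (a sketch using the generic 12·d² per-layer transformer estimate; Qwen's actual SwiGLU MLP widths and bias terms will shift this slightly):

```python
# Back-of-the-envelope parameter count from the architecture table.
layers = 40
d = 5120            # embedding dimension
vocab = 151851      # vocabulary size

per_layer = 12 * d * d       # ~4*d^2 attention + ~8*d^2 MLP (generic estimate)
embeddings = 2 * vocab * d   # input embedding + output head (assumed untied)

total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")
```

This lands at roughly 14B, consistent with the model's name.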
Find me on
Sh-it-just-works and Patreon