WizardLM/WizardCoder-Python-34B-V1.0 - AI Model Zoo

🤗 <a href="https://huggingface.co/WizardLM" target="_blank">HF Repo</a> • 🐱 <a href="https://github.com/nlpxucan/WizardLM" target="_blank">GitHub Repo</a> • 🐦 <a href="https://twitter.com/WizardLM_AI" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> • 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> • 📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> • 👋 Join our <a href="https://discord.gg/VZjjHtWrKs" target="_blank">Discord</a>

News

🔥🔥🔥[2023/08/26] We released WizardCoder-Python-34B-V1.0, which achieves 73.2 pass@1 and surpasses GPT4 (2023/03/15), ChatGPT-3.5, and Claude2 on the HumanEval benchmark.
[2023/06/16] We released WizardCoder-15B-V1.0, which achieves 57.3 pass@1 and surpasses Claude-Plus (+6.8), Bard (+15.3) and InstructCodeT5+ (+22.3) on the HumanEval benchmark.

❗Note: Two sets of HumanEval results are reported for GPT4 and ChatGPT-3.5. The scores 67.0 and 48.1 come from OpenAI's official GPT4 Report (2023/03/15). The scores 82.0 and 72.5 are from our own tests with the latest API (2023/08/26).

Model	Checkpoint	Paper	HumanEval	MBPP	Demo	License
WizardCoder-Python-34B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>	73.2	61.2	Demo	<a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>
WizardCoder-15B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>	59.8	50.6	--	<a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>
WizardCoder-Python-13B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>	64.0	55.6	--	<a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>
WizardCoder-3B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-3B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>	34.8	37.4	Demo	<a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>
WizardCoder-1B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-1B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>	23.8	28.6	--	<a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>

Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on GSM8k, including ChatGPT 3.5, Claude Instant 1 and PaLM 2 540B.
Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the GSM8k benchmark, which is 24.8 points higher than the SOTA open-source LLM, and 22.7 pass@1 on the MATH benchmark, which is 9.2 points higher than the SOTA open-source LLM.

Model	Checkpoint	Paper	GSM8k	MATH	Online Demo	License
WizardMath-70B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>	81.6	22.7	Demo	<a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a>
WizardMath-13B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardMath-13B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>	63.9	14.0	Demo	<a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a>
WizardMath-7B-V1.0	🤗 <a href="https://huggingface.co/WizardLM/WizardMath-7B-V1.0" target="_blank">HF Link</a>	📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>	54.9	10.7	Demo	<a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a>

[08/09/2023] We released the WizardLM-70B-V1.0 model. The full model weights are linked in the table below.

<sup>Model</sup>	<sup>Checkpoint</sup>	<sup>Paper</sup>	<sup>MT-Bench</sup>	<sup>AlpacaEval</sup>	<sup>GSM8k</sup>	<sup>HumanEval</sup>	<sup>License</sup>
<sup>WizardLM-70B-V1.0</sup>	<sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>	<sup>📃Coming Soon</sup>	<sup>7.78</sup>	<sup>92.91%</sup>	<sup>77.6%</sup>	<sup> 50.6</sup>	<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup>
<sup>WizardLM-13B-V1.2</sup>	<sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>		<sup>7.06</sup>	<sup>89.17%</sup>	<sup>55.3%</sup>	<sup>36.6 </sup>	<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup>
<sup>WizardLM-13B-V1.1</sup>	<sup> 🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup>		<sup>6.76</sup>	<sup>86.32%</sup>		<sup>25.0 </sup>	<sup>Non-commercial</sup>
<sup>WizardLM-30B-V1.0</sup>	<sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-30B-V1.0" target="_blank">HF Link</a></sup>		<sup>7.01</sup>			<sup>37.8 </sup>	<sup>Non-commercial</sup>
<sup>WizardLM-13B-V1.0</sup>	<sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.0" target="_blank">HF Link</a> </sup>		<sup>6.35</sup>	<sup>75.31%</sup>		<sup> 24.0 </sup>	<sup>Non-commercial</sup>
<sup>WizardLM-7B-V1.0 </sup>	<sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-7B-V1.0" target="_blank">HF Link</a> </sup>	<sup> 📃 <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> </sup>				<sup>19.1 </sup>	<sup> Non-commercial</sup>

Comparing WizardCoder-Python-34B-V1.0 with Other LLMs.

🔥 The following figure shows that our WizardCoder-Python-34B-V1.0 ranks second on the HumanEval benchmark, surpassing GPT4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5) and Claude2 (73.2 vs. 71.2).

Prompt Format

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

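As a minimal sketch, the template above can be filled in with plain string formatting. The helper name `build_prompt` and the example instruction are illustrative assumptions, not part of the released code.

```python
# Prompt template copied from the "Prompt Format" section.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Insert a user instruction into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Write a Python function that reverses a string."))
```

The prompt ends with `### Response:` so that the model's continuation is the answer itself.
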
Inference Demo Script

We provide the inference demo code here.
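
The demo script itself is not reproduced on this page. The sketch below shows one way inference could look with the Hugging Face `transformers` library. It assumes the checkpoint loads through `AutoModelForCausalLM`, that `torch` and `accelerate` are installed, and that enough GPU memory is available for a 34B model. The function name `generate_response` and the decoding settings (greedy decoding, 512 new tokens) are assumptions, not the authors' script.

```python
# Prompt template copied from the "Prompt Format" section.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

MODEL_ID = "WizardLM/WizardCoder-Python-34B-V1.0"


def generate_response(instruction: str, model_id: str = MODEL_ID,
                      max_new_tokens: int = 512) -> str:
    """Generate a completion for one instruction (illustrative sketch)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half precision to save memory
        device_map="auto",          # requires the `accelerate` package
    )

    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # assumption: greedy decoding
        )

    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

A call such as `generate_response("Write a Python function that checks whether a number is prime.")` would download the weights on first use and return only the text produced after `### Response:`.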

Citation

Please cite the repo if you use the data, method or code in this repo.

@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}
