Introduction

This is basically an update to the earlier attempt, vicuna-chinese-replication-beta.

Again, this model is for research purposes only. There is no guarantee of its performance. All credit goes to the original authors of LLaMA and Chinese-LLaMA.

Compared with the previous release, the new model improves on coding and reasoning problems. However, it still suffers from hallucinations and performs poorly on Chinese domain-specific problems, e.g. Chinese literature and idioms.

Usage

We use exactly the Vicuna template for training and inference. Sample code is shown below.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "keyfan/vicuna-chinese-replication-v1.1"

# Load the tokenizer and move the model to the GPU
tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(checkpoint).cuda()

# Vicuna v1.1 conversation template, used for both training and inference
template = ("A chat between a curious human and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the human's questions. "
            "USER: {}\nASSISTANT:")
question = template.format("Who was the president of the United States in 1955?")

# Encode the prompt, generate a response, and decode it
inputs = tokenizer.encode(question, return_tensors="pt").cuda()
outputs = model.generate(inputs, do_sample=True, temperature=0.2, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
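
For multi-turn conversations, earlier turns are simply concatenated into the same template. The helper below is a minimal sketch assuming the standard Vicuna v1.1 convention of closing each completed assistant reply with the EOS token </s>; the build_prompt function and the sample answer are illustrative, not part of the released code.

system = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the human's questions.")

def build_prompt(history, question):
    # history is a list of (user_message, assistant_reply) pairs
    prompt = system
    for user, assistant in history:
        # Assumption: each finished assistant turn is terminated with </s>,
        # following the usual Vicuna v1.1 convention
        prompt += " USER: {}\nASSISTANT: {}</s>".format(user, assistant)
    prompt += " USER: {}\nASSISTANT:".format(question)
    return prompt

history = [("Who was the president of the United States in 1955?",
            "Dwight D. Eisenhower was the president of the United States in 1955.")]
print(build_prompt(history, "Who was the vice president at that time?"))

The resulting string can be passed to tokenizer.encode and model.generate exactly as in the single-turn example above.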

Evaluation

| Model | Macro-Average | QA | OQA | REASONING | LITERATURE | ENTERTAINMENT | GENERATION | TRANSLATION | CODE | ETHICS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Alpaca-Plus-13B | 77.3 | 70 | 74 | 70 | 80 | 77 | 82 | 89 | 64 | 90 |
| ours | 82.4 | 81 | 87 | 88 | 73 | 78 | 85 | 83 | 83 | 84 |
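
A quick check confirms that the Macro-Average column is the unweighted mean of the nine category scores, rounded to one decimal place:

scores = [81, 87, 88, 73, 78, 85, 83, 83, 84]  # the "ours" row from the table above
print(round(sum(scores) / len(scores), 1))  # 82.4
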
| Average | Avg(Hard) | STEM | Social Science | Humanities | Others |
| --- | --- | --- | --- | --- | --- |
| 37.0 | 29.5 | 34.6 | 44.5 | 35.7 | 35.9 |