# ChatGLM-6b int8 quantization
**Deprecated:** use the newer ChatGLM2 model instead.
See K024/chatglm-q for more details.
```python
import torch
from chatglm_q.decoder import ChatGLMDecoder, chat_template

device = torch.device("cuda")
decoder = ChatGLMDecoder.from_pretrained("K024/chatglm-6b-int8", device=device)

# chat_template takes the conversation history (empty here) and the new question
prompt = chat_template([], "我是谁?")
for text in decoder.generate(prompt):
    print(text)
```
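For context, int8 weight quantization typically maps each float weight matrix to 8-bit integers plus a per-row (per-output-channel) float scale, trading a small accuracy loss for roughly 4x smaller weights. The sketch below illustrates the general symmetric scheme only; it is not the actual implementation used by chatglm-q, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-row quantization: scale so the largest magnitude maps to 127.
    # (Illustrative sketch, not chatglm-q's actual kernel.)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

At inference time the int8 weights are either dequantized on the fly or consumed directly by int8 matmul kernels, with the per-row scales applied to the accumulator output.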
Model weights are released under the same license as ChatGLM-6b; see MODEL LICENSE.