This model is based on bigscience/bloom-3b.
We pruned its vocabulary from 250880 to 46145 with Chinese corpus to reduce GPU memory usage. So the total parameter is 2b5 now.
How to use
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer = BloomTokenizerFast.from_pretrained('Langboat/bloom-2b5-zh')
model = BloomForCausalLM.from_pretrained('Langboat/bloom-2b5-zh')
print(tokenizer.batch_decode(model.generate(tokenizer.encode('中国的首都是', return_tensors='pt'))))