
Model Card

One of the best 7B models on the Open LLM Leaderboard, with performance surpassing dolly-v2-12b!

The training code and data will be open-sourced later on GitHub (https://github.com/chi2liu/mamba-gpt-3b).

Training Dataset

mamba-gpt-7b is trained on multiple datasets; the full list will be released along with the training code and data.

Summary

We have fine-tuned the OpenLLaMA model and surpassed the original model on multiple evaluation subtasks, making it currently one of the best-performing 7B models, with performance comparable to llama-7b.

Usage

To use the model with the transformers library on a machine with GPU(s), first make sure you have the transformers, accelerate, and torch libraries installed:

pip install transformers==4.29.2
pip install accelerate==0.19.0
pip install torch==2.0.0

Then, run the following Python snippet:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CobraMamba/mamba-gpt-7b")
model = AutoModelForCausalLM.from_pretrained(
    "CobraMamba/mamba-gpt-7b",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# the model uses the llama2 prompt format
input_content = "Your text here"
input_ids = tokenizer.encode(input_content, return_tensors="pt")
# do_sample=True is needed for temperature to take effect
output = model.generate(input_ids, max_length=128, temperature=0.7, do_sample=True)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
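
The snippet above passes raw text straight to the tokenizer. The "llama2 prompt" comment refers to the Llama 2 instruction template; a minimal sketch of wrapping a user message in that format follows (the helper build_llama2_prompt and the default system prompt are illustrative assumptions, not part of this repository, so verify the exact template against the released training code):

# Assumption: the model follows the standard Llama 2 [INST] chat template.
# build_llama2_prompt is a hypothetical helper, not part of this repository.
def build_llama2_prompt(user_message, system_prompt="You are a helpful assistant."):
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

input_content = build_llama2_prompt("Your question here")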

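Because accelerate is among the installed dependencies, the model weights can also be mapped onto the available GPU(s) at load time. A sketch of that loading variant, assuming the generic device_map="auto" argument from transformers/accelerate (this card itself does not prescribe it):

model = AutoModelForCausalLM.from_pretrained(
    "CobraMamba/mamba-gpt-7b",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; places layers on the available GPUs
)
input_ids = input_ids.to(model.device)  # keep inputs on the same device as the model
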
Citation

If this work is helpful, please cite it as:

@Misc{mamba-gpt-7b,
  title = {Mamba-GPT-7b},
  author = {chiliu},
  howpublished = {\url{https://huggingface.co/CobraMamba/mamba-gpt-7b}},
  year = {2023}
}

Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.


License: llama2