
Model Card

One of the best 3B models on the Open LLM Leaderboard, with performance surpassing dolly-v2-12b!

Metric               Value
MMLU (5-shot)        30.0
ARC (25-shot)        42.6
HellaSwag (10-shot)  71.0
TruthfulQA (0-shot)  37.3
Avg.                 45.2
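
For reference, the Avg. row is just the arithmetic mean of the four benchmark scores above; a quick sanity check in Python:

# Open LLM Leaderboard average: simple mean of the four task scores
scores = {"MMLU": 30.0, "ARC": 42.6, "HellaSwag": 71.0, "TruthfulQA": 37.3}
average = sum(scores.values()) / len(scores)
print(round(average, 1))  # 45.2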

We used the state-of-the-art Language Model Evaluation Harness (lm-evaluation-harness) to run the benchmark tests above.

The following shows the performance under 0-shot testing, which is mostly better than acrastt/Marx-3B-V2:

hf-causal (pretrained=CobraMamba/mamba-gpt-3b-v4), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
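
To reproduce a 0-shot run with the configuration above, the invocation would look roughly like the command below. The task list here is only illustrative, and it assumes the harness's v0.3-style main.py entry point rather than any script shipped with this repository:

python main.py \
    --model hf-causal \
    --model_args pretrained=CobraMamba/mamba-gpt-3b-v4 \
    --tasks hellaswag,arc_challenge,truthfulqa_mc \
    --num_fewshot 0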

The training code and data will be open-sourced later on GitHub (https://github.com/chi2liu/mamba-gpt-3b).

Training Dataset

mamba-gpt-3b-v4 was trained on a combination of multiple datasets.

Summary

We fine-tuned the OpenLLaMA model and surpassed the original model on multiple evaluation subtasks, making it currently one of the best-performing 3B models, with performance comparable to llama-7b.

Usage

To use the model with the transformers library on a machine with GPU(s), first make sure you have the transformers, accelerate and torch libraries installed.

pip install transformers==4.29.2
pip install accelerate==0.19.0
pip install torch==2.0.0

Then, run the following Python snippet:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CobraMamba/mamba-gpt-3b-v4")
model = AutoModelForCausalLM.from_pretrained("CobraMamba/mamba-gpt-3b-v4", trust_remote_code=True, torch_dtype=torch.float16)

# we use an Alpaca-style prompt (see the example after this snippet)
input_content = "Your text here"
input_ids = tokenizer.encode(input_content, return_tensors="pt")
output = model.generate(input_ids, max_length=128, temperature=0.7, do_sample=True)  # do_sample=True so temperature takes effect
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
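
The comment above refers to the Alpaca instruction format. The exact template used for fine-tuning is not reproduced here, but continuing from the snippet above, a prompt in the common Alpaca style would look like the following (the template text is an assumption, not taken from this repository):

# Common Alpaca-style template (assumed; check the training data for the exact format)
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(instruction="Explain what a 3B-parameter language model is in one sentence.")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))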

Citation

If this work is helpful, please cite it as:

@Misc{mamba-gpt-3b-v4,
  title = {Mamba-GPT-3b-v4},
  author = {chiliu},
  howpublished = {\url{https://huggingface.co/CobraMamba/mamba-gpt-3b-v4}},
  year = {2023}
}

Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.