
StableLM-Base-Alpha-7B-v2

Model Description

StableLM-Base-Alpha-7B-v2 is a 7-billion-parameter decoder-only language model pre-trained on diverse English datasets. It succeeds the first StableLM-Base-Alpha-7B model and addresses that model's shortcomings through improved data sources and mixture ratios.

Usage

Get started generating text with StableLM-Base-Alpha-7B-v2 by using the following code snippet:

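from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-base-alpha-7b-v2")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablelm-base-alpha-7b-v2",
  trust_remote_code=True,  # the repository ships custom model code
  torch_dtype="auto",      # use the dtype stored in the checkpoint
)
model.cuda()  # requires a CUDA-capable GPU

# Tokenize the prompt and move the tensors to the same device as the model.
inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to("cuda")

# Sample up to 64 new tokens with temperature and nucleus (top-p) sampling.
tokens = model.generate(
  **inputs,
  max_new_tokens=64,
  temperature=0.75,
  top_p=0.95,
  do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))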

Model Details

Model Architecture

The model is a decoder-only transformer similar to StableLM-Base-Alpha (v1), with the following configuration:

| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|---|---|---|---|---|
| 6,890,209,280 | 4096 | 32 | 32 | 4096 |
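As a quick sanity check on the architecture table, the sketch below stores the listed shape values and derives the per-head width. The class name `StableLMAlphaV2Shape` is hypothetical, and the derivation assumes the hidden state is split evenly across attention heads (the usual multi-head layout); the card itself does not state the head dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StableLMAlphaV2Shape:
    """Shape values copied from the architecture table (hypothetical helper class)."""

    hidden_size: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    max_sequence_length: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: each attention head gets an equal slice of the hidden state.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


shape = StableLMAlphaV2Shape()
print(shape.head_dim)  # 4096 // 32 = 128
```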

Training

StableLM-Base-Alpha-7B-v2 is pre-trained using a multi-stage context length extension schedule, following similar work (Nijkamp et al., 2023): it is first pre-trained at a context length of 2048 for 1 trillion tokens, then fine-tuned at a context length of 4096 for another 100 billion tokens.
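To put the two stages in proportion, the following sketch works through the token arithmetic. The context lengths and token budgets come from the paragraph above; the sequence counts assume every training sequence is packed to the full context length, which the card does not confirm.

```python
# Stage definitions from the training description: (context length, token budget).
STAGES = {
    "stage_1_pretrain": (2048, 1_000_000_000_000),
    "stage_2_extension": (4096, 100_000_000_000),
}


def sequences_seen(context_length: int, tokens: int) -> int:
    """Approximate number of full-length sequences in a token budget."""
    return tokens // context_length


total_tokens = sum(tokens for _, tokens in STAGES.values())

for name, (ctx, tokens) in STAGES.items():
    share = tokens / total_tokens
    print(f"{name}: ctx={ctx}, sequences~{sequences_seen(ctx, tokens):,}, share={share:.1%}")

print(f"total tokens: {total_tokens:,}")
```

Under these assumptions the long-context stage accounts for roughly 9% of all training tokens.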

Training Dataset

The first pre-training stage relies on 1 trillion tokens sourced from a mix of the public Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023), The Pile (Gao et al., 2020), and internal datasets, with web text sampled at a rate of 71%.

In the second stage, we include the StarCoder dataset (Li et al., 2023) and downsample web text to 55%, while increasing the sampling proportions of naturally long text examples in the aforementioned sources.
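The web-text sampling rates can be illustrated with a small weighted sampler. This is only a sketch: the 71% and 55% rates come from the text, but the card gives no finer breakdown, so all other sources are lumped into a single hypothetical `non_web` bucket, and the function name `sample_sources` is made up for this example.

```python
import random
from collections import Counter

# Web-text rates per stage, as stated in the dataset description.
STAGE_WEB_RATE = {"stage_1": 0.71, "stage_2": 0.55}


def sample_sources(web_rate: float, n: int, seed: int = 0) -> Counter:
    """Draw n source labels, choosing web text with probability web_rate."""
    rng = random.Random(seed)
    labels = rng.choices(
        ["web", "non_web"],  # "non_web" is a hypothetical catch-all bucket
        weights=[web_rate, 1.0 - web_rate],
        k=n,
    )
    return Counter(labels)


for stage, rate in STAGE_WEB_RATE.items():
    counts = sample_sources(rate, n=100_000)
    observed = counts["web"] / 100_000
    print(f"{stage}: target web rate {rate:.2f}, observed {observed:.3f}")
```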

Training Procedure

The model is pre-trained on the dataset mixes described above in mixed precision (FP16), optimized with AdamW, and trained using the NeoX tokenizer with a vocabulary size of 50,257. The complete hyperparameter choices are outlined in the config in the project's GitHub repository.
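For readers unfamiliar with AdamW, here is a minimal sketch of a single update on one scalar parameter, written in plain Python. It is not the training code for this model: the function name `adamw_step` is hypothetical and every default hyperparameter is an illustrative placeholder rather than a value from the project's config.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    All default hyperparameters are placeholders, not the model's settings.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages (step counts from 1).
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    # Decoupled weight decay: applied to the parameter, not folded into the gradient.
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v


# Toy usage: minimise param**2 for a few steps.
param, m, v = 1.0, 0.0, 0.0
for step in range(1, 4):
    grad = 2.0 * param  # gradient of the toy loss param**2
    param, m, v = adamw_step(param, grad, m, v, step)
print(param)
```

The decay term acting directly on the parameter is what distinguishes AdamW from Adam with L2 regularisation added to the gradient.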

Training Infrastructure

Use and Limitations

Intended Use

These models are intended to be used by all individuals as foundational models for application-specific fine-tuning without strict limitations on commercial use.

Limitations and bias

The pre-training dataset may have contained offensive or inappropriate content even after data cleansing filters were applied, and this content can be reflected in model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models for any applications that may cause harm or distress to individuals or groups.

How to cite

@misc{StableLMAlphaV2Models, 
      url={https://huggingface.co/stabilityai/stablelm-base-alpha-7b-v2},
      title={StableLM Alpha v2 Models},
      author={Tow, Jonathan}
}