SN-13B-8k-Instruct

SN-13B-8k-Instruct is a 13-billion-parameter model. It was pretrained and instruction-tuned on SambaNova DataScale systems. The model is intended for tasks that require long-sequence understanding.

Model Details

Model Description

Developed by: SambaNova Systems
Model type: Language Model
Language(s): English
License: Apache 2.0

Basic Information

Blog Post: Link
Discord: Link

Licensing

To increase accessibility and to support the open-source community, SambaNova is releasing SN-13B-8k-Instruct under an Apache 2.0 license. Please review SambaNova’s SN-13B-8k-Instruct License.

Uses

<details> <summary>Click to expand</summary>

Direct Use

This model is intended for commercial and research use.

Out-of-Scope Use

SN-13B-8k-Instruct should NOT be used for:

Mission-critical applications
Applications that involve the safety of others
Making highly important decisions
Important automated pipelines

This model is still in early development and can be prone to mistakes and hallucinations; there is still room for improvement. It is intended to provide the community with a long-sequence LLM baseline.

Recommendations

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed at the bottom of this page.

</details>

Running the model

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

prompt = 'Define Machine Learning.'
inputs = tokenizer(prompt, return_tensors='pt')

# SN-13B-8k-Instruct occasionally repeats itself when do_sample=False.
# Set do_sample=True when using the model to avoid this.
outputs = model.generate(**inputs, use_cache=True, max_new_tokens=50, do_sample=True)

print(tokenizer.batch_decode(outputs))
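
The comment in the snippet above recommends sampling to reduce repetition. As a minimal sketch, the helper below collects sampling arguments for `model.generate` and checks that the prompt plus the requested new tokens fit in the context window. The helper name `sampling_kwargs`, the values `temperature=0.7` and `top_p=0.9`, and the 8192-token limit (taken from the long-sequence pretraining length described in this card) are illustrative assumptions, not settings recommended by SambaNova.

```python
def sampling_kwargs(prompt_len, max_new_tokens=50, context_len=8192,
                    temperature=0.7, top_p=0.9):
    """Return keyword arguments for `model.generate` with sampling enabled.

    Hypothetical helper: the default values are illustrative assumptions.
    """
    if prompt_len + max_new_tokens > context_len:
        raise ValueError(
            f"prompt ({prompt_len} tokens) plus {max_new_tokens} new tokens "
            f"exceeds the assumed context length of {context_len}"
        )
    return {
        "use_cache": True,
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # sample instead of decoding greedily
        "temperature": temperature,
        "top_p": top_p,
    }


# Usage with the objects from the snippet above (not executed here):
#   prompt_len = inputs["input_ids"].shape[1]
#   outputs = model.generate(**inputs, **sampling_kwargs(prompt_len))
#   # Decode only the newly generated tokens and drop special tokens.
#   print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

Slicing the output at `prompt_len` keeps the echoed prompt out of the printed text, which makes repetition in the generated part easier to spot.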

Training Details

<details> <summary>Click to expand</summary>

Training Procedure

We trained SN-13B-8k-Instruct on SambaNova DataScale systems using SambaNova's in-house Reconfigurable Dataflow Unit (RDU). Starting from random weights, we pretrained for 300 billion tokens on sequences of length 2048, then for another 250 billion tokens on sequences of length 8192. For this second phase, we curated a dataset with a large proportion of long-sequence articles; 30% of the articles contain more than 6,000 words.
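
The long-document share mentioned above can be measured with a simple word count. The sketch below is an assumption about how such a statistic might be computed; the function name `long_document_fraction` and the whitespace-based word splitting are illustrative, and this card does not describe the actual curation pipeline.

```python
def long_document_fraction(documents, min_words=6000):
    """Fraction of documents with more than `min_words` whitespace-separated words."""
    documents = list(documents)
    if not documents:
        return 0.0
    long_docs = sum(1 for doc in documents if len(doc.split()) > min_words)
    return long_docs / len(documents)


# Toy corpus: 3 long articles (7000 words) and 7 short ones (500 words).
corpus = ["word " * 7000] * 3 + ["word " * 500] * 7
print(long_document_fraction(corpus))  # 0.3
```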

We applied instruction tuning on a variety of tasks derived from datasets such as FLANv2, P3, Natural Instructions, etc.

Hyperparameters

Pretraining at 8k Sequence Size

Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
Optimizer: AdamW
Steps: 60000
Global Batch size: 1024
Learning Rate: 1e-5
Learning Rate Scheduler: Fixed
Warmup Steps: 0
Weight decay: 0.1

Instruction-tuned Training

Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
Optimizer: AdamW
Steps: 35000
Global Batch size: 64
Learning Rate: 1e-5
Learning Rate Scheduler: Fixed
Warmup Steps: 0
Weight decay: 0.1
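
For readers who want to carry these settings into their own training code, the two phases listed above can be written down as plain configuration objects. This is a sketch only: the `TrainingPhase` class and its field names are hypothetical, the linear warmup branch is an assumed convention that is unused here because both phases list zero warmup steps, and `learning_rate_at` reflects one reading of a "Fixed" scheduler (a constant learning rate at every step).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPhase:
    """Hypothetical container for the hyperparameters listed in this card."""
    name: str
    steps: int
    global_batch_size: int
    learning_rate: float = 1e-5
    warmup_steps: int = 0
    weight_decay: float = 0.1
    optimizer: str = "AdamW"

    def learning_rate_at(self, step: int) -> float:
        """Assumed linear warmup (if any), then a fixed learning rate."""
        if self.warmup_steps > 0 and step < self.warmup_steps:
            return self.learning_rate * (step + 1) / self.warmup_steps
        return self.learning_rate


long_sequence_pretraining = TrainingPhase(
    "pretraining-8k", steps=60000, global_batch_size=1024
)
instruction_tuning = TrainingPhase(
    "instruction-tuning", steps=35000, global_batch_size=64
)

# With zero warmup steps the rate is identical at the first and last step.
print(instruction_tuning.learning_rate_at(0),
      instruction_tuning.learning_rate_at(34999))
```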

</details>

Bias, Risks, and Limitations

Like all LLMs, SN-13B-8k-Instruct has certain limitations:

Hallucination: SN-13B-8k-Instruct may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
Repetition: SN-13B-8k-Instruct may produce repetitive phrases or sentences, leading to less engaging and informative responses.
Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
Toxicity: SN-13B-8k-Instruct may inadvertently generate responses containing inappropriate or harmful content.

Acknowledgment

We appreciate Scrolls and ZeroScrolls for their contributions in creating effective benchmarks to test the long-sequence understanding of large language models. We appreciate lm-eval-harness and HELM for their essential benchmarking contributions, both of which were very helpful in evaluating SN-13B-8k-Instruct's performance. We appreciate the inspiration from the recent wave of open-source long-sequence models, including XGen, MPT, and Llama-2. We look forward to witnessing the continued growth and success of open-source long-sequence models.

We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of SN-13B-8k-Instruct, and we hope that our model can contribute to further advancements in the field.

Cite SN-13B-8k-Instruct

@software{sn-13b-8k-instruct,
  title = {SN-13B-8k-Instruct: training long sequence size models with SambaNova},
  author = {SambaNova Systems},
  url = {https://huggingface.co/sambanovasystems/SN-13B-8k-Instruct},
  month = {8},
  year = {2023},
  version = {1.0},
}