Mistral-7B-SQL
- Model creator: Mistral AI
- Original model: Mistral 7B v0.1
Description
This repo contains LoRA-finetuned model files for Mistral AI's Mistral 7B v0.1.
<!-- prompt-template start -->
Prompt template: Mistral
```
<s>[INST] {prompt} [/INST]
```
<!-- prompt-template end -->
Instruction format
In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence (BOS) token id; subsequent instructions should not. The assistant generation will be ended by the end-of-sentence (EOS) token id.
For example:
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "machinists/Mistral-7B-SQL"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Build the prompt: system message with the table schema, followed by the question.
table_schema = "CREATE TABLE head (age INTEGER)"
question = "How many heads of the departments are older than 56 ?"
system_msg = f" Generate a correct SQL query from the following database schema. \n {table_schema} "
prompt = f"<s>[INST] {system_msg} \n{question} [/INST]"

sequences = pipeline(
    prompt,
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
Model Architecture
Mistral-7B-v0.1 is a transformer model with the following architecture choices (visible in the model config, as sketched after this list):
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
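These choices can be inspected directly in the base model's Hugging Face config. The short sketch below is illustrative and not quoted from the original card; the attribute names follow transformers' MistralConfig.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

# Grouped-Query Attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-Window Attention: each token attends to at most this many preceding tokens.
print("sliding window:", config.sliding_window)
```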
Finetuning
LoRA (Low-Rank Adaptation) is a finetuning technique that injects low-rank update matrices into the model to reduce computational and memory requirements. Read more: LoRA Hugging Face article. A sketch of a comparable setup follows the hyperparameter list below.
- Epochs: 10
- Dataset: b-mc2/sql-create-context
- No. of records: 78.6k
- Model loading: bf16 with Flash Attention 2
- Finetuning technique: LoRA on all linear layers
- Max sequence length: 2048
- Effective batch size: 4
- Mixed-precision training: tf32
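For orientation only, here is a minimal sketch of a comparable LoRA setup with peft and transformers. It is not the team's training script: the target module names, LoRA rank, alpha, and dropout are assumptions, while the bf16 loading, Flash Attention 2, and all-linear-layer targeting follow the list above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(base_model)
# bf16 weights + Flash Attention 2 (requires a recent transformers and the flash-attn package).
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# LoRA on all linear projection layers of the Mistral architecture
# (module names, r, alpha and dropout are assumptions, not the card's stated values).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```

Training would then proceed with a standard Trainer configured for the values listed above: an effective batch size of 4, tf32 mixed precision, and sequences truncated to 2048 tokens.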
Hardware and Software
- Training Hardware: 1 x NVIDIA A100 80GB GPU
Logs
- {'loss': 0.1482, 'learning_rate': 0.00019650373138244836, 'epoch': 9.04}
- {'loss': 0.1464, 'learning_rate': 0.00019648798715270855, 'epoch': 9.05}
- {'loss': 0.1399, 'learning_rate': 0.00019647224292296874, 'epoch': 9.05}
- {'loss': 0.1341, 'learning_rate': 0.00019645649869322896, 'epoch': 9.05}
- {'train_runtime': 21650.1924, 'train_samples_per_second': 36.294, 'train_steps_per_second': 9.073, 'train_loss': 0.2878391900494646, 'epoch': 9.05}
The Machinists Team
Manish Kumar, Aakash Sarin
Troubleshooting
If you see either of the following errors, ensure you are using a stable version of Transformers, 4.34.0 or newer:
- `KeyError: 'mistral'`
- `NotImplementedError: Cannot copy out of meta tensor; no data!`
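To confirm the installed version before debugging further, a small check like the following can help (a sketch, not from the original card; `packaging` ships as a transformers dependency):

```python
import transformers
from packaging import version

# The errors above typically come from versions older than 4.34.0.
assert version.parse(transformers.__version__) >= version.parse("4.34.0"), (
    f"transformers {transformers.__version__} is too old; run: pip install -U 'transformers>=4.34.0'"
)
print("transformers", transformers.__version__, "OK")
```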
Notice
Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.
ReadMe References
- https://huggingface.co/mistralai/Mistral-7B-v0.1
- https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-AWQ