Starcoder Finetuned Model for Natural Language to SQL Generation

This 15-billion-parameter model generates SQL queries from natural language text and is intended for Natural Language to SQL (Text to SQL) generation.

Model Overview

The starcoder-text2sql-v1 model is a variant of the Starcoder model. It was first instruction-tuned on an English dataset and then fine-tuned with PEFT (parameter-efficient fine-tuning) on custom Natural Language to SQL datasets.

Intended Applications and Constraints

This model is built to translate natural language questions into SQL queries. For best results, include context about the relevant tables and their database schemas in the prompt.
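
One way to supply that schema context is to prepend the table definitions to the question. The sketch below is only an illustration: the helper name build_prompt, the orders table, and the prompt layout are assumptions made for this example. The exact prompt format used during fine-tuning is not documented here, so the layout may need adjusting.

```python
from typing import Sequence


def build_prompt(question: str, table_schemas: Sequence[str]) -> str:
    """Place CREATE TABLE statements ahead of the question.

    The layout (schemas first, then "Question:" and "SQL:") is an assumed
    format for illustration, not a documented requirement of the model.
    """
    schema_block = "\n".join(schema.strip() for schema in table_schemas)
    return f"{schema_block}\nQuestion: {question.strip()}\nSQL:"


# Hypothetical schema, invented for this example.
schemas = [
    "CREATE TABLE orders (order_id INT, manufacturer TEXT, order_year INT);",
]

prompt = build_prompt("Who is the manufacturer for the order year 1998?", schemas)
print(prompt)
```

The resulting string can be passed to the model in place of a bare question, so the model sees the column names it is expected to reference.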

Usage Guidelines

To generate SQL queries from English questions, load the model with the transformers pipeline and pass it a prompt:

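import torch
from transformers import pipeline

# Load the model in bfloat16; device_map="auto" requires the accelerate package.
pipe = pipeline("text-generation", model="IBM-DTT/starcoder-text2sql-v1", torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Question: Who is the manufacturer for the order year 1998?\nSQL:"

# do_sample=True is needed for temperature, top_k, and top_p to take effect.
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.2, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])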
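
By default, the text-generation pipeline returns the prompt followed by the completion, so the SQL usually needs to be separated from the rest. The following is a minimal sketch of one way to do that. The helper name extract_sql is hypothetical, and cutting at the first semicolon is an assumption about how the model ends a statement, not documented behavior.

```python
def extract_sql(generated_text: str, prompt: str) -> str:
    """Return the first SQL statement that follows the prompt."""
    # Drop the echoed prompt if the pipeline returned the full text.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Assumption: the answer is the first statement, ending at the first semicolon.
    statement, separator, _ = completion.partition(";")
    return (statement + separator).strip()


# Stand-in for outputs[0]["generated_text"]; this string is invented for the example.
example_prompt = "Question: Who is the manufacturer for the order year 1998?\nSQL:"
example_output = example_prompt + " SELECT manufacturer FROM orders WHERE order_year = 1998;\nQuestion:"

sql = extract_sql(example_output, example_prompt)
print(sql)
```

With a real run, the call would be extract_sql(outputs[0]["generated_text"], prompt). If the completion contains no semicolon, the helper returns the whole completion, so further trimming may be needed.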