Falcon 7B LLM Fine Tune Model

Model description

This model is a fine-tuned version of the tiiuae/falcon-7b model using the QLoRa library and the PEFT library.

Intended uses & limitations

How to use

The model and tokenizer are loaded using the from_pretrained methods.
The padding token of the tokenizer is set to be the same as the end-of-sentence (EOS) token.
The generation_config is used to set parameters for generating responses, such as the maximum number of new tokens to generate and the temperature for the softmax function.
The prompt is defined, encoded using the tokenizer, and passed to the model.generate method to generate a response.
The generated response is decoded using the tokenizer and printed.

# Import necessary classes and functions
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftConfig, PeftModel

# Specify the model
PEFT_MODEL = "hipnologo/falcon-7b-qlora-finetune-chatbot"

# Load the PEFT config
config = PeftConfig.from_pretrained(PEFT_MODEL)

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    config.based_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Set the padding token to be the same as the EOS token
tokenizer.pad_token = tokenizer.eos_token

# Load the PEFT model
model = PeftModel.from_pretrained(model, PEFT_MODEL)

# Set the generation parameters
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

# Define the prompt
prompt = """
<human>: How can I create an account?
<assistant>:
""".strip()
print(prompt)

# Encode the prompt
encoding = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response
with torch.inference_mode():
  outputs = model.generate(
      input_ids=encoding.input_ids,
      attention_mask=encoding.attention_mask,
      generation_config=generation_config,
  )

# Print the generated response
print(tokenizer.decode(outputs[0],skip_special_tokens=True))

Training procedure

The model was fine-tuned on the Ecommerce-FAQ-Chatbot-Dataset using the bitsandbytes quantization config:

load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: bfloat16

Framework versions

PEFT 0.4.0.dev0

Evaluation results

The model was trained for 80 steps, with the training loss decreasing from 0.184 to nearly 0. The final training loss was 0.03094411873175886.

Trainable params: 2359296
All params: 3611104128
Trainable%: 0.06533447711203746

License

This model is licensed under Apache 2.0. Please see the LICENSE for more information.