TuningAI/Llama2_13B_startup_Assistant - AI Model Zoo

Model Name: Llama2_13B_startup_Assistant

Description:

Llama2_13B_startup_Assistant is a specialized language model fine-tuned from Meta's Llama 2 13B chat model. It is tailored to answer questions about Algerian startups and related topics such as Algerian tax law.

Base Model:

This model is based on Meta's meta-llama/Llama-2-13b-chat-hf, the 13-billion-parameter chat variant of Llama 2.

Dataset:

This model was fine-tuned on a custom, curated dataset of more than 200 unique examples. The dataset combines manually written entries with examples generated by GPT-3.5, GPT-4, and Falcon 180B.

Fine-tuning Techniques:

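Fine-tuning was performed using QLoRA (Quantized LoRA), an extension of LoRA that trains small low-rank adapters on top of a base model whose weights are quantized to 4 bits. The base weights were loaded with 4-bit NormalFloat (NF4) quantization. QLoRA also defines a Double Quantization technique, but the configuration listed under "Training procedure" has bnb_4bit_use_double_quant set to False, so it was not enabled for this run.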
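
To make the parameter-efficiency point concrete, here is a rough count of the weights a LoRA adapter would train. This is a sketch under stated assumptions: the rank (16) and the target modules (query and value projections) are hypothetical, because this card does not state them. The hidden size of 5120 and the 40 layers are the published Llama 2 13B dimensions.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 5120          # Llama 2 13B hidden size
LAYERS = 40            # Llama 2 13B transformer layers
RANK = 16              # hypothetical: not stated in this card
MODULES_PER_LAYER = 2  # hypothetical: q_proj and v_proj only

per_module = lora_params(HIDDEN, HIDDEN, RANK)
trainable = per_module * MODULES_PER_LAYER * LAYERS
fraction = trainable / 13e9  # 13e9 is the nominal base parameter count

print(f"{trainable:,} trainable parameters ({fraction:.3%} of the base model)")
```

Under these assumptions the adapter holds roughly 13 million parameters, about 0.1% of the base model. The actual figure depends on the rank and modules used in training.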

Performance:

Llama2_13B_startup_Assistant is tuned to address queries about Algerian tax law and startups, making it a useful resource for individuals and businesses navigating these areas.

Limitations:

While highly specialized, this model may not cover every nuanced aspect of Algerian tax law or the startup ecosystem.
Accuracy may vary depending on the complexity and specificity of questions.
It does not provide legal advice; users should seek professional consultation for critical legal matters.

Training procedure:

The following bitsandbytes quantization config was used during training:

load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: False
bnb_4bit_compute_dtype: float16
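
As a rough worked example of what these settings mean for memory, the sketch below estimates weight storage for a nominal 13 billion parameters. It assumes a quantization block size of 64 with one 32-bit scaling constant per block, and, for the double-quantized case, 8-bit constants with one 32-bit constant per 256 blocks as described in the QLoRA paper. These are approximations for weights only; activations, adapters, and any layers left unquantized are ignored.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # nominal parameter count (approximation)
BLOCK = 64       # assumed quantization block size

# Double quantization off (as in the config above): one fp32 constant per block.
overhead_plain = 32 / BLOCK
# Double quantization on: 8-bit constants plus one fp32 constant per 256 blocks.
overhead_double = 8 / BLOCK + 32 / (BLOCK * 256)

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4 + overhead_plain)
nf4_dq_gb = weight_memory_gb(N_PARAMS, 4 + overhead_double)

print(f"fp16: {fp16_gb:.2f} GB")
print(f"NF4, double quant off: {nf4_gb:.2f} GB")
print(f"NF4, double quant on:  {nf4_dq_gb:.2f} GB")
```

Under these assumptions, leaving bnb_4bit_use_double_quant at False costs about 0.37 extra bits per parameter compared with enabling it.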

Framework versions:

PEFT 0.4.0

# Authenticate first so the base model can be downloaded. Run this in a shell;
# in a notebook, prefix the command with "!":
#   huggingface-cli login

import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    logging,
    pipeline,
)

# Silence transformers warnings during generation.
logging.set_verbosity(logging.CRITICAL)

# Same 4-bit NF4 settings as the training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load the quantized base model onto GPU 0, then attach the fine-tuned adapter.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map={"": 0},
)
model.config.use_cache = False
model.config.pretraining_tp = 1
model = PeftModel.from_pretrained(model, "TuningAI/Llama2_13B_startup_Assistant")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

system_message = "Given a user's startup-related question in English, you will generate a thoughtful answer in English."

# Build the pipeline once and reuse it for every question.
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=512)

while True:
    input_text = input(">>>")
    # Llama 2 chat format: system message inside <<SYS>> tags, user turn inside [INST].
    prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n {input_text}. [/INST]"
    result = pipe(prompt)
    # The pipeline echoes the prompt, so strip it to leave only the answer.
    print(result[0]["generated_text"].replace(prompt, ""))
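
The prompt construction and answer extraction in the loop above can also be factored into two small helpers, which makes them easy to check without loading the model. This is a minimal sketch: build_prompt and extract_answer are hypothetical names, not part of transformers or peft, and unlike the loop above the helper does not add a leading space or trailing period around the user text.

```python
def build_prompt(system_message: str, user_text: str) -> str:
    """Wrap a system message and one user turn in the Llama 2 chat format."""
    return f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{user_text.strip()} [/INST]"


def extract_answer(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt from the generated text, if it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


# Example with a stand-in for the model output (no model is loaded here).
example_prompt = build_prompt(
    "Given a user's startup-related question in English, "
    "you will generate a thoughtful answer in English.",
    "How do I register a startup?",
)
fake_output = example_prompt + " Placeholder answer text."
print(extract_answer(fake_output, example_prompt))
```

With the real pipeline, the same calls would be build_prompt(system_message, input_text) before pipe(...) and extract_answer(result[0]["generated_text"], prompt) afterwards.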