reddit asknyc nyc llama2

nyc-savvy-llama2-7b

Essentials:

Prompt options

Here is the template used in training. It starts with "### Human: " (with a trailing space), followed by the post title and content, then "### Assistant: " (no space before it, one space after it).

### Human: Post title - post content### Assistant:

For example:

### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.

Judging from QLoRA's Gradio example, it seems helpful to prepend a more assistant-like preamble, especially if you follow their lead and use a chat format:

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
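The template and the optional preamble above can be assembled with a small helper. This is a minimal sketch, not code from this repo: the names `build_prompt` and `PREAMBLE` are made up for illustration, and joining the preamble to the template with a newline is an assumption.

```python
# Hypothetical helper (not part of this repo) that builds a prompt
# in the "### Human: ... ### Assistant: " template described above.
PREAMBLE = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(title, content, preamble=None):
    """Return the training-style prompt, optionally prefixed by a preamble."""
    # No space before "### Assistant:", one space after it, as in the template.
    prompt = f"### Human: {title} - {content}### Assistant: "
    if preamble:
        # Assumption: the preamble and the template are separated by a newline.
        prompt = f"{preamble}\n{prompt}"
    return prompt


print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```

Calling `build_prompt(title, content, PREAMBLE)` gives the chat-style variant.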

Training data
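The training command in this document passes a .jsonl file with --dataset_format oasst1. As far as I can tell, QLoRA's oasst1 format reads each example from a single "text" field, so a file in that shape could be produced roughly as follows. This is a sketch under that assumption; the helper names are hypothetical and this is not necessarily how gpt_nyc.jsonl was built.

```python
import json


def to_jsonl_line(title, content, reply):
    """Serialize one post/reply pair as a JSON line with a single "text" field."""
    # Assumption: the full prompt template plus the reply goes into "text".
    text = f"### Human: {title} - {content}### Assistant: {reply}"
    return json.dumps({"text": text})


def write_jsonl(examples, path):
    """Write (title, content, reply) tuples to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for title, content, reply in examples:
            f.write(to_jsonl_line(title, content, reply) + "\n")
```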

Training script

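Training takes about 2 hours on Colab once you get it right. QLoRA only lets you control the run length through max_steps, but I wanted to stop after 1 epoch, so the command below passes both --num_train_epochs 1 and --max_steps 760.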

git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
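A max_steps value that approximates a whole number of epochs can be estimated from the dataset size and the effective batch size (per-device batch size times gradient accumulation steps times number of devices). The sketch below shows the arithmetic; the example count of 12,150 is hypothetical, chosen only to illustrate the calculation, and is not the actual size of the training set.

```python
import math


def steps_for_epochs(num_examples, per_device_batch_size, grad_accum_steps,
                     num_devices=1, epochs=1):
    """Estimate the optimizer steps needed to cover `epochs` passes over the data."""
    examples_per_step = per_device_batch_size * grad_accum_steps * num_devices
    # Round up so the final partial batch is still counted.
    return math.ceil(num_examples * epochs / examples_per_step)


# Hypothetical dataset size; batch size 1 and accumulation 16 as in the command above.
print(steps_for_epochs(12_150, 1, 16))  # 760
```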

Merging it back

What you get in output_dir is an adapter model (here's ours), not a full set of model weights. That's cool, but it is not as easy to drop into their script.

Two options for merging:
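For reference, one common way to fold a LoRA adapter back into its base model uses the peft library's `merge_and_unload`. This is a sketch of that general approach, not necessarily either of the options meant here; the function name `merge_adapter` and its three path arguments are hypothetical, and loading the base model in float16 is an assumption.

```python
def merge_adapter(base_model_path, adapter_path, merged_path):
    """Merge a LoRA adapter into the base model and save the merged weights."""
    # Imported here so the sketch can be read and loaded without the heavy libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: load the base model unquantized in half precision for merging.
    base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_path)

    # Fold the LoRA weights into the base layers and drop the adapter wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(merged_path)

    # Save the tokenizer next to the weights so the folder loads on its own.
    AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_path)
    return merged_path
```

The merged folder can then be loaded with `AutoModelForCausalLM.from_pretrained` like any other checkpoint.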

Testing that the model is NYC-savvy

You might wonder whether the model actually learned anything about NYC or is just the same old Llama 2. With a prompt that adds no NYC clues, try this from the pefttester.py script in this repo:

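from transformers import AutoModelForCausalLM, LlamaTokenizer, StoppingCriteriaList

# model_name is the local path or Hub ID of the model to test
m = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tok = LlamaTokenizer.from_pretrained(model_name)

# Assistant-style preamble, then the training template; the question gives no NYC hints
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

# Tokenize and move the input to the same device as the model
input_ids = tok(messages, return_tensors="pt").input_ids.to(m.device)

# ... (`stop`, used below, is defined elsewhere in pefttester.py)

# Sampling settings
temperature = 0.7
top_p = 0.9
top_k = 0  # 0 disables top-k filtering
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)

# Each decoded line contains the prompt followed by the generated reply
for line in op:
    print(tok.decode(line))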
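The decoded output includes the prompt, and the model may keep going into a new "### Human:" turn. A small post-processing step can pull out just the assistant's reply. This is a hypothetical helper, not part of pefttester.py; it assumes the end-of-sequence token decodes as "</s>", as it does for the Llama tokenizer.

```python
HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"


def extract_reply(decoded):
    """Return only the assistant's first reply from a decoded generation."""
    # Keep what follows the first "### Assistant:" marker (this drops the prompt).
    _, _, after = decoded.partition(ASSISTANT_TAG)
    # Cut off anything the model generated for a following human turn.
    reply, _, _ = after.partition(HUMAN_TAG)
    # Assumption: the end-of-sequence token shows up as "</s>" in the decoded text.
    return reply.replace("</s>", "").strip()


sample = "### Human: What museums should I visit? - My kids are aged 12 and 5### Assistant: Try the AMNH.### Human: Thanks"
print(extract_reply(sample))  # Try the AMNH.
```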