LoRA Adapter for LLaMA 13B trained on more datasets than tloen/alpaca-lora-7b

This repo contains a low-rank (LoRA) adapter for LLaMA-13B, fine-tuned on the following datasets:

Nebulous/gpt4all_pruned
sahil2801/CodeAlpaca-20k
yahma/alpaca-cleaned
Datasets that are part of the OpenAssistant project

This version of the weights was trained with the following hyperparameters:

Epochs: 2
Batch size: 128
Max Length: 2048
Learning rate: 4e-6
LoRA r: 16
LoRA alpha: 32
LoRA target modules: q_proj, k_proj, v_proj, o_proj

The model was trained with flash attention and gradient checkpointing.
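
To put the LoRA hyperparameters above in perspective, the following is a minimal sketch of the arithmetic behind the adapter size and the scaling factor. It assumes, from the commonly published LLaMA-13B architecture rather than from this card, that the base model has 40 decoder layers and a hidden size of 5120, so each targeted projection is a 5120 x 5120 matrix. The helper names are hypothetical.

```python
# Rough count of trainable LoRA parameters for the settings listed above.
# Assumption (not stated in this card): LLaMA-13B has 40 decoder layers and
# a hidden size of 5120, so every attention projection is 5120 x 5120.
HIDDEN_SIZE = 5120
NUM_LAYERS = 40
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_params_per_module(in_features: int, out_features: int, r: int) -> int:
    """A LoRA pair adds matrix A (r x in_features) and matrix B (out_features x r)."""
    return r * in_features + out_features * r


def total_lora_params() -> int:
    """Sum the adapter parameters over all target modules in all layers."""
    per_module = lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)
    return per_module * len(TARGET_MODULES) * NUM_LAYERS


# The low-rank update B @ A is multiplied by alpha / r before it is added
# to the output of the frozen weight.
LORA_SCALING = LORA_ALPHA / LORA_R

if __name__ == "__main__":
    print(f"scaling factor: {LORA_SCALING}")
    print(f"trainable LoRA parameters: {total_lora_params():,}")
```

Under these assumptions the adapter holds roughly 26 million trainable parameters, a small fraction of the 13 billion frozen base weights.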

Model Details

Developed as part of the OpenAssistant Project
Model type: PEFT Adapter for frozen LLaMA
Language: English

Prompting

Input prompt example:

<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>

The input ends with the <|assistant|> token to signal that the model should start generating the assistant reply.
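
The prompt format described above can be sketched as a small helper. The single-turn case matches the example in this card. The multi-turn handling is an assumption: it supposes that earlier turns reuse the same pattern, with each turn closed by `</s>`. The function name `build_prompt` and the role labels are hypothetical.

```python
from typing import List, Tuple

PROMPTER_TOKEN = "<|prompter|>"
ASSISTANT_TOKEN = "<|assistant|>"
EOS_TOKEN = "</s>"


def build_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (role, text) turns and end with the assistant token.

    Assumption: earlier turns follow the single-turn pattern shown in the
    card, each one terminated by </s>.
    """
    if not turns or turns[-1][0] != "prompter":
        raise ValueError("the last turn must come from the prompter")
    parts = []
    for role, text in turns:
        if role == "prompter":
            parts.append(f"{PROMPTER_TOKEN}{text}{EOS_TOKEN}")
        elif role == "assistant":
            parts.append(f"{ASSISTANT_TOKEN}{text}{EOS_TOKEN}")
        else:
            raise ValueError(f"unknown role: {role!r}")
    # The trailing assistant token signals that the model should start its reply.
    return "".join(parts) + ASSISTANT_TOKEN


if __name__ == "__main__":
    print(build_prompt([("prompter", "What is a meme, and what's the history behind this word?")]))
```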

Example inference code. Note that several special-token embeddings need to be loaded along with the LoRA weights. The example assumes a GPU and torch.float16:

import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import GenerationConfig

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = transformers.AutoTokenizer.from_pretrained("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b")


model = transformers.AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf", torch_dtype=torch.float16
)  # Load Base Model
model.resize_token_embeddings(
    32016
)  # Make room for the extra special-token embeddings (rows 32000 and up) stored in this repo.

model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

lora_weights = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b"
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)  # Load Lora model

model.eos_token_id = tokenizer.eos_token_id
filename = hf_hub_download("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b", "extra_embeddings.pt")
embed_weights = torch.load(filename, map_location="cpu")  # Load embeddings for special tokens
embed_tokens = model.base_model.model.model.embed_tokens
with torch.no_grad():  # Copy in place without tracking gradients
    embed_tokens.weight[32000:, :] = embed_weights.to(
        dtype=embed_tokens.weight.dtype, device=embed_tokens.weight.device
    )  # Add special token embeddings


model = model.half().to(device)
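# Note: temperature, top_p and top_k only take effect when do_sample=True;
# as configured here, generation uses beam search with 4 beams.
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)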



def format_system_prompt(prompt, eos_token="</s>"):
    return "{}{}{}{}".format(
        "<|prompter|>",
        prompt,
        eos_token,
        "<|assistant|>"
    )



def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt)  # OpenAssistant Prompt Format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=2,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output


generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
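
The generate function above returns the full decoded sequence, which includes the prompt. A minimal sketch of pulling out only the assistant reply is shown here. The helper name `extract_reply` is hypothetical, and the exact decoded string (spacing, a leading `<s>`) depends on the tokenizer, so the sample input is illustrative only.

```python
def extract_reply(
    decoded: str,
    assistant_token: str = "<|assistant|>",
    eos_token: str = "</s>",
) -> str:
    """Return the text after the last assistant token, without the trailing EOS."""
    # rpartition yields ("", "", decoded) when the token is absent,
    # so the whole string is kept in that case.
    _, _, reply = decoded.rpartition(assistant_token)
    # Cut at the first EOS that follows the assistant token, if there is one.
    reply = reply.split(eos_token, 1)[0]
    return reply.strip()


if __name__ == "__main__":
    sample = "<s><|prompter|>Hi</s><|assistant|>Hello there.</s>"
    print(extract_reply(sample))
```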
