<h1 style="text-align: center">LLmRa-2.7B</h1>
<h2 style="text-align: center">A conversational Open Pre-trained Transformer Language Model fine-tune.</h2>
LLmRa-2.7B is a proof-of-concept fine-tune of facebook/opt-2.7b, optimized for dialogue.
Disclaimer: NSFW data was included in the fine-tuning of this model. Although SFW inputs will usually result in SFW outputs, you are advised to chat at your own risk. This model is not suitable for use by minors.
Warning: This model is NOT suitable for use by minors. It will output X-rated content under certain circumstances.
This model was fine-tuned on a small test dataset; version 2, or a higher-parameter model, will be trained on the full dataset.
Usage Format
To effectively utilize the model, follow this structured format for engaging text-based conversations:
1. Initialization
Here is how you can define the personality of the language model:
<|system|>[Persona]
- Persona: You can define a specific persona or context for the AI, but it's optional. It can be a character, a role, or just a style of interaction.
2. AI Introduction
<|user|>[User input]<|model|>
- Users start the conversation by entering their message after <|user|> and closing it with <|model|>.
Example Usage:
Here's an example of how to start a conversation with the AI:
<|system|>I'm here to provide information and assistance on a wide range of topics.
<|model|>Hello! Welcome to our AI-powered assistant. How can I assist you today?
<|user|>Tell me about the history of artificial intelligence.
<|model|>
Continue the conversation as needed. This structured format helps maintain a smooth and engaging interaction with the AI.
You are not required to include "User"; you can change it to your preferred name or leave it blank. You may also add the AI's name, for example:
<|user|>YourNameHere: Hello.<|model|>CharacterName:
You can also use this instruct prompt example:
<|system|>What is one plus one?<|model|>
Loading The Model
To use the model and interact with it, use the Python code below:
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    pipeline,
)

model = AutoModelForCausalLM.from_pretrained('L-R/LLmRa-2.7B')
tokenizer = AutoTokenizer.from_pretrained('L-R/LLmRa-2.7B')

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=100)

input_question = 'QUESTION HERE'
question_formatted = f'<|system|>{input_question}<|model|>'

result = pipe(question_formatted)

print(f"[model]: {result[0]['generated_text'][len(question_formatted):]}")
```
Or the more complex one:
```python
import os
import random
import sys
import time
import json

import torch
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          set_seed)

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
local_tokenizer = bool(os.getenv('TOKENIZERS_PARALLELISM', 'false'))

class Chatbot:
    def __init__(self, config):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.config = config
        self.persona = None
        self.model = None
        self.history = []

        self.load_model()

    def create_persona(self, persona_data):
        required_keys = ['name', 'description', 'greeting']
        if not all(key in persona_data for key in required_keys):
            raise ValueError(
                "Missing required keys in persona_data. Please provide 'name', 'description', and 'greeting'.")

        new_persona_id = str(max(int(key) for key in self.config["personas"].keys()) + 1)
        self.config["personas"][new_persona_id] = persona_data
        return new_persona_id
    def load_model(self):
        model_path = self.config["model_path"]
        tokenizer_path = self.config["tokenizer_path"]

        # Build the quantization settings from the 4-bit / 8-bit flags in the config.
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=self.config['load_model_4bit'],
            bnb_4bit_quant_type='nf4' if self.config['load_model_4bit'] else None,
            bnb_4bit_compute_dtype=torch.float16 if self.config['load_model_4bit'] else None,
            bnb_4bit_use_double_quant=True if self.config['load_model_4bit'] else None,
            load_in_8bit=self.config['load_model_8bit'],
            bnb_8bit_quant_type='nf4' if self.config['load_model_8bit'] else None,
            bnb_8bit_compute_dtype=torch.float16 if self.config['load_model_8bit'] else None,
            bnb_8bit_use_double_quant=True if self.config['load_model_8bit'] else None,
        )

        if not model_path or not tokenizer_path:
            raise ValueError('model_name or tokenizer_path name not found! Define one.')

        if self.config['load_model_4bit'] and self.config['load_model_8bit']:
            raise ValueError("You can't load the model in 8 bits and 4 bits at the same time!")

        if not self.config['user_name']:
            print('You have not selected a name! No name will be sent to the model.')

        print(f"\nLoading model: {model_path}")

        if torch.cuda.is_available():
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                use_auth_token=self.config['model_token'],
                quantization_config=quantization_config,)

            if torch.cuda.device_count() > 1:
                self.model = torch.nn.DataParallel(self.model)
                model_running_on = f'{torch.cuda.device_count()} GPUs'
            else:
                model_running_on = '1 GPU'
        else:
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                quantization_config=quantization_config,
                use_auth_token=self.config['model_token']).to(
                self.device
            )
            model_running_on = 'CPU'

        print(f'Model is running on: {model_running_on}')

        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_auth_token=self.config['model_token'])
        print(self.tokenizer)
    def load_persona(self, persona_id):
        personas = self.config["personas"]
        if persona_id in personas:
            self.persona = personas[persona_id]
        else:
            raise ValueError("Invalid persona ID")

    def formatting_question(self, user_input, history):
        # Build the model prompt from the persona, conversation history, and the new user message.
        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']
        config_question = self.config['use_question_template']

        if config_question:
            formatted_answer = (
                f'<|system|>{user_input}<|model|>'
            )
        else:
            m_ = self.persona["description"]
            g_ = self.persona["greeting"]
            n_ = self.persona["name"]
            un_ = self.config["user_name"]

            if config_user and config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{un_}: {user_input}<|model|>{n_}:'
                )
            elif config_user:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{un_}: {user_input}<|model|>'
                )
            elif config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{user_input}<|model|>{n_}:'
                )
            else:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{user_input}<|model|>'
                )

        return formatted_answer
    def history_formatting(self, last_input, last_output):
        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']
        n_ = self.persona["name"]
        un_ = self.config["user_name"]

        if config_user and config_model:
            formatted_answer = (
                f'<|user|>{un_}: {last_input}<|model|>{n_}: {last_output}'
            )
        elif config_user:
            formatted_answer = (
                f'<|user|>{un_}: {last_input}<|model|>{last_output}'
            )
        elif config_model:
            formatted_answer = (
                f'<|user|>{last_input}<|model|>{n_}: {last_output}'
            )
        else:
            formatted_answer = (
                f'<|user|>{last_input}<|model|>{last_output}'
            )

        return formatted_answer
    def reply(self, user_input):
        config_question = self.config['use_question_template']
        set_seed(random.randint(1, 1000))

        user_input = " ".join(user_input.split())

        # Keep only the most recent turns, up to the configured history length.
        if len(self.history) > self.config["history_length"]:
            model_history = "\n".join([str(item) for item in self.history[-self.config["history_length"]:]])
        else:
            model_history = "\n".join([str(item) for item in self.history])

        input_ai = self.formatting_question(user_input, model_history).strip()
        tokenized_input_ai = self.tokenizer.encode(input_ai, return_tensors="pt")

        output_ids = self.model.generate(
            max_length=self.config["max_generation_length"] + len(tokenized_input_ai[0]),
            no_repeat_ngram_size=self.config["no_repeat_ngram_size"],
            repetition_penalty=self.config["repetition_penalty"],
            length_penalty=self.config["length_penalty"],
            input_ids=tokenized_input_ai.to(self.device),
            pad_token_id=self.tokenizer.eos_token_id,
            temperature=self.config["temperature"],
            top_k=self.config["top_k"],
            top_p=self.config["top_p"],
            early_stopping=True,
            use_cache=True,
            do_sample=True,
        )

        # Decode the full output and strip the prompt, keeping only the model's reply.
        ai_reply = self.tokenizer.decode(
            output_ids[0],
            skip_special_tokens=False)[len(input_ai) + 4:]

        if not config_question:
            self.history.append(self.history_formatting(user_input, ai_reply))

        return ai_reply.strip()

    def reset_conversation(self):
        self.history = []

class UserInterface:
    def __init__(self, chatbot):
        self.chatbot = chatbot

    def run(self):
        persona_id = self.chatbot.config["default_persona"]
        self.chatbot.load_persona(persona_id)

        print("\nChosen Persona:", self.chatbot.persona["name"])
        print("Your Chosen Name:", self.chatbot.config["user_name"])
        print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
        self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')

        while True:
            user_input = input(f"\n>> {self.chatbot.config['user_name']}: ")

            if user_input.lower() == "reset_app":
                self.chatbot.reset_conversation()
                print("\nConversation history has been reset.\n")
                self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                print(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                continue

            if user_input.lower().startswith("create_persona"):
                # Example of use: create_persona
                # {"name": "CustomPersona",
                #  "description": "This is a custom persona created by the user.",
                #  "greeting": "Hello! I am CustomPersona, nice to meet you!"}
                try:
                    persona_data = json.loads(' '.join(user_input.split()[1:]))
                    new_persona_id = self.chatbot.create_persona(persona_data)
                    print(f"Persona created with ID: {new_persona_id}")
                except json.JSONDecodeError:
                    print("Invalid JSON input. Please provide a valid JSON string containing 'name', 'description', and 'greeting'.")
                except ValueError as e:
                    print(e)
                continue

            # Command to change the persona
            if user_input.lower().startswith("change_persona"):
                try:
                    new_persona_id = user_input.split()[1]
                    self.chatbot.load_persona(new_persona_id)
                    self.chatbot.reset_conversation()
                    print("\nPersona changed to:", self.chatbot.persona["name"])
                    print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    continue
                except (IndexError, ValueError):
                    print("Invalid command or persona ID. Please use 'change_persona [ID]'.")
                    continue

            if user_input.lower() == "exit_app":
                print("Goodbye!")
                break

            reply = self.chatbot.reply(user_input)

            def typewriter_effect(sentence, type_delay):
                for char in sentence:
                    sys.stdout.write(char)
                    sys.stdout.flush()
                    time.sleep(type_delay)

            # Type faster for longer replies.
            reply_length = len(reply)
            type_delay_ranges = {
                (100, 200): 0.03,
                (200, 300): 0.02,
                (300, 400): 0.01,
                (400, 500): 0.005
            }

            default_type_delay = 0.04
            for length_range, delay in type_delay_ranges.items():
                if length_range[0] < reply_length <= length_range[1]:
                    type_delay = delay
                    break
            else:
                type_delay = default_type_delay

            if self.chatbot.config['use_typing_effect']:
                typewriter_effect(f'{self.chatbot.persona["name"]}: {reply}', type_delay)
            else:
                print(f'{self.chatbot.persona["name"]}: {reply}')

def main():
    config = {
        "user_name": "Jack",  # The user's name, which is set to "Jack" in this case.
        "model_path": "L-R/LLmRa-2.7B",  # Path to the model used for generating responses.
        "tokenizer_path": "L-R/LLmRa-2.7B",  # Path to the tokenizer associated with the model.
        "model_token": None,  # If you want to load the model using your huggingface token. (Not required, but included.)
        "load_model_4bit": True,  # Whether to load the model with 4-bit precision.
        "load_model_8bit": False,  # Whether to load the model with 8-bit precision.
        "use_typing_effect": True,  # Whether to simulate a typing effect when displaying responses.
        "use_names": {
            "model": False,  # Whether the model's name should be used in question formatting.
            "user": False,  # Whether the user's name should be used in question formatting.
        },
        "use_question_template": False,  # Whether to use predefined question templates in conversations.
        "personas": {
            # A dictionary of personas with their descriptions and greetings for use in conversations.
            "1": {
                "name": "LLmRa",
                "description": "Description of the LLmRa persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the LLmRa persona is active in a conversation."
            },
            "2": {
                "name": "Hikari",
                "description": "Description of the Hikari persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the Hikari persona is active in a conversation."
            }
        },
        "max_generation_length": 450,  # The maximum length for generated responses.
        "default_persona": "1",  # The default persona to use when starting a conversation.
        "history_length": 6,  # The maximum number of previous messages to consider in the conversation history.
        "top_k": 40,  # Top-k sampling parameter for text generation.
        "top_p": .55,  # Top-p sampling parameter for text generation.
        "temperature": .55,  # Temperature parameter for controlling the randomness of generated text.
        "length_penalty": 0.65,  # Penalty factor for generating longer or shorter responses.
        "no_repeat_ngram_size": 4,  # Parameter to avoid repeating n-grams in generated text.
        "repetition_penalty": 1.25,  # Penalty factor for avoiding repeated phrases in generated text.
    }

    # Initialize chatbot and user interface
    chatbot = Chatbot(config)
    ui = UserInterface(chatbot)

    # Run the user interface
    ui.run()


if __name__ == "__main__":
    main()
```
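Once the script is running, the chat loop also accepts a few plain-text commands (taken from the code above): reset_app clears the conversation history, change_persona [ID] switches to another persona from the config, create_persona {JSON} registers a new persona with "name", "description", and "greeting" keys, and exit_app quits. For example, creating a third persona and switching to it might look like this (the persona details are placeholders):
create_persona {"name": "CustomPersona", "description": "This is a custom persona created by the user.", "greeting": "Hello! I am CustomPersona, nice to meet you!"}
change_persona 3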
Known issues
The model does not always follow instructions.