
Original model card

Buy me a coffee if you like this project ;) <a href="https://www.buymeacoffee.com/s3nh"><img src="https://www.buymeacoffee.com/assets/img/guidelines/download-assets-sm-1.svg" alt=""></a>

Description

GGML format model files for this project.

Inference


```python
from ctransformers import AutoModelForCausalLM

# Placeholders: point these at the directory and GGML file you downloaded.
output_dir = "path/to/model"
ggml_file = "model.ggml.q4_0.bin"  # hypothetical file name

# gpu_layers offloads layers to the GPU; use 0 for CPU-only inference.
llm = AutoModelForCausalLM.from_pretrained(
    output_dir, model_file=ggml_file, gpu_layers=32, model_type="llama"
)

manual_input: str = "Tell me about your last dream, please."

output = llm(manual_input, max_new_tokens=256, temperature=0.9, top_p=0.7)
print(output)
```
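If you prefer token-by-token output, ctransformers can also stream the generation instead of returning the full string at once. A minimal sketch, reusing the `llm` object from above:

```python
# Stream chunks of text as they are generated.
for chunk in llm(manual_input, max_new_tokens=256, temperature=0.9, top_p=0.7, stream=True):
    print(chunk, end="", flush=True)
```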


<h2>Model Description</h2> <p>“Luna AI Llama2 Uncensored” is a Llama2-based chat model fine-tuned on over 40,000 long-form chat discussions. It was fine-tuned by Tap, the creator of Luna AI. The result is an enhanced Llama2 7B model that rivals ChatGPT in performance across a variety of tasks.</p> <p>This model stands out for its long responses, low hallucination rate, and absence of censorship mechanisms.</p>

<h2>Model Training</h2> <p>The fine-tuning process was performed on an 8x A100 80GB machine. The model was trained almost entirely on synthetic outputs. The custom dataset draws on diverse sources and includes multiple rounds of chat between human and AI.</p>

<a rel="noopener nofollow" href="https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GPTQ">4bit GPTQ Version provided by @TheBloke - for GPU inference</a><br /> <a rel="noopener nofollow" href="https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGML">GGML Version provided by @TheBloke - For CPU inference</a>

<h2>Prompt Format</h2> <p>The model follows the Vicuna 1.1 / OpenChat format:</p>

```
USER: I have difficulties in making friends, and I really need someone to talk to. Would you be my friend?

ASSISTANT: Of course! Friends are always here for each other. What do you like to do?
```
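Programmatically, the format is just alternating <code>USER:</code>/<code>ASSISTANT:</code> turns, with the final <code>ASSISTANT:</code> left open for the model to complete. A minimal helper sketch (<code>build_prompt</code> is a hypothetical name of my own, not part of any library):

```python
def build_prompt(user_message, history=None):
    """Assemble a Vicuna 1.1 / OpenChat style prompt from chat turns."""
    parts = []
    for user_turn, assistant_turn in (history or []):
        parts.append(f"USER: {user_turn}")
        parts.append(f"ASSISTANT: {assistant_turn}")
    parts.append(f"USER: {user_message}")
    parts.append("ASSISTANT:")  # left open for the model to complete
    return "\n".join(parts)

prompt = build_prompt("Would you be my friend?")
print(llm(prompt, max_new_tokens=256, temperature=0.9, top_p=0.7))
```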

<h2>Future Plans</h2> <p>The model is currently being uploaded in FP16 format, and there are plans to convert it to GGML and GPTQ 4-bit quantizations.</p>

<h2>Benchmark Results</h2>

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| arc_challenge | 0 | acc_norm | 0.5512 | 0.0146 |
| hellaswag | 0 | | | |
| mmlu | 1 | acc_norm | 0.46521 | 0.036 |
| truthfulqa_mc | 1 | mc2 | 0.4716 | 0.0155 |
| Average | - | - | 0.5114 | 0.0150 |

<h2>Ethical considerations</h2> <p>The data used to train the model is collected from various sources, mostly from the Web. As such, it contains offensive, harmful, and biased content. We thus expect the model to exhibit such biases from the training data.</p>

<h2>Human life</h2> <p>The model is not intended to inform decisions about matters central to human life, and should not be used in such a way.</p>

<h2>Risks and harms</h2> <p>Risks and harms of large language models include the generation of harmful, offensive, or biased content. These models are often prone to generating incorrect information, sometimes referred to as hallucinations. We do not expect our model to be an exception in this regard.</p>
