Model Card for player1537/Dolphinette

Dolphinette is my latest attempt at creating a small LLM intended to run locally on one's own laptop or cell phone. I believe that the area of personalized LLMs will be one of the largest driving forces towards widespread LLM usage.

Dolphinette is a fine-tuned version of bigscience/bloom-560m, trained using the ehartford/dolphin dataset. The model was trained as a LoRA using this Google Colab notebook and then the LoRA was merged into the original model using this Google Colab notebook.

Uses

Dolphinette is trained to follow instructions and uses the following template:

<s>INSTRUCTION: You are an AI assistant that follows instruction extremely well. Help as much as you can. INPUT: Answer this question: what is the capital of France? OUTPUT:

More formally, this function was used:

from typing import Any, Dict, Optional

def __text(datum: Optional[Dict[Any, Any]] = None, /, **kwargs) -> str:
    r"""

    >>> __text({
    ...   "instruction": "Test instruction.",
    ...   "input": "Test input.",
    ...   "output": "Test output.",
    ... })
    '<s>INSTRUCTION: Test instruction. INPUT: Test input. OUTPUT: Test output.</s>'

    >>> __text({
    ...   "instruction": "Test instruction.",
    ...   "input": "Test input.",
    ...   "output": None,
    ... })
    '<s>INSTRUCTION: Test instruction. INPUT: Test input. OUTPUT:'

    """

    if datum is None:
        datum = kwargs

    return (
        f"""<s>"""
        f"""INSTRUCTION: {datum['instruction']} """
        f"""INPUT: {datum['input']} """
        f"""OUTPUT: {datum['output']}</s>"""
    ) if datum.get('output', None) is not None else (
        f"""<s>"""
        f"""INSTRUCTION: {datum['instruction']} """
        f"""INPUT: {datum['input']} """
        f"""OUTPUT:"""
    )
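For inference, the same template is used with the output left blank so the model completes it. A minimal sketch, where `build_prompt` is a hypothetical convenience wrapper equivalent to calling `__text` with `output=None`:

```python
def build_prompt(instruction: str, input_text: str) -> str:
    # Same template as __text above, with OUTPUT left blank for generation.
    return f"<s>INSTRUCTION: {instruction} INPUT: {input_text} OUTPUT:"

prompt = build_prompt(
    "You are an AI assistant that follows instruction extremely well. "
    "Help as much as you can.",
    "Answer this question: what is the capital of France?",
)
print(prompt)
```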

From the original training set, the instructions and the number of times each appeared are as follows.
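The counts can be recomputed with a sketch like the one below, which tallies the `instruction` field with `collections.Counter`. Loading `ehartford/dolphin` through the `datasets` library is left as a comment since it requires a download; the records shown are toy illustrations, not real dataset entries:

```python
from collections import Counter

def instruction_counts(records):
    """Tally how often each distinct instruction string appears."""
    return Counter(r["instruction"] for r in records)

# With the datasets library (network access required), something like:
#   import datasets
#   ds = datasets.load_dataset("ehartford/dolphin", split="train")
#   counts = instruction_counts(ds)

# Toy illustration:
toy = [
    {"instruction": "You are an AI assistant.", "input": "a", "output": "b"},
    {"instruction": "You are an AI assistant.", "input": "c", "output": "d"},
    {"instruction": "Explain like I am five.", "input": "e", "output": "f"},
]
print(instruction_counts(toy).most_common())
```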

Direct Use

With the Hugging Face transformers library, you can use this model as follows:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'player1537/Dolphinette',
)

tokenizer = transformers.AutoTokenizer.from_pretrained(
    'player1537/Dolphinette',
)

pipeline = transformers.pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
)

completion = pipeline(
    (
        r"""<s>INSTRUCTION: You are an AI assistant that helps people find"""
        r"""information. INPUT: Answer this question: what is the capital of"""
        r"""France? Be concise. OUTPUT:"""
    ),
    return_full_text=False,
    max_new_tokens=512,
)
completion = completion[0]['generated_text']

print(completion)
#=>  The capital of France is the city of Paris. It's located in the country of
#=>  France, which means it's a geographical location in Europe. It is
#=>  consistently called "La capitale de France" ("La capital de la France"),
#=>  its localization literally refers to theThiest city of France.
#=>  
#=> According to the English translation of the French, the capital is the place
#=> where people live for their livelihood or business. However, the actual
#=> location you are looking at is the capital of France, the city located in
#=> the center of the country along several important international routes.
#=>  
#=> The capital of France generally refers to one or a few urban locations that
#=> represent particular cities in Europe. Depending on your nationality or
#=> culture, refinements can be added to the name of the city, and the
#=> announcement can be 'tel Aviv', 'Edinburgh', 'Corinthus', 'Palace of Culture
#=> and Imperials' (a French title), 'Languedoc', `Paris' or 'Belfast'.
#=>  
#=> To be clear, the city of paris is the capital of France, and it is the
#=> geographical location of the city, not the city itself.
#=>  
#=> Conclusion: The capital of France is the city of Paris, which is the
#=> most-visited international destination in Europe.

This model is very wordy... But for less contrived tasks, I have found it to work well enough.
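One small mitigation for the wordiness: since the training template terminates answers with `</s>`, a raw completion may carry that marker followed by extra rambling. A hypothetical post-processing helper (not part of the model card's pipeline) that truncates at the first `</s>`:

```python
def trim_completion(text: str, end_token: str = "</s>") -> str:
    """Keep only the text before the first end-of-sequence marker, if any."""
    return text.split(end_token, 1)[0].strip()

print(trim_completion(" The capital of France is Paris.</s> INSTRUCTION: more text"))
```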