# cerbero-7b Italian LLM πŸš€

πŸ“’ cerbero-7b is the first 100% Free and Open Source Italian Large Language Model (LLM), ready to be used for research or commercial applications.

<p align="center"> <img width="300" height="300" src="./README.md.d/cerbero.png"> </p>

cerbero-7b is built on mistral-7b, which outperforms Llama 2 13B across all benchmarks and surpasses Llama 1 34B in numerous metrics.

cerbero-7b is specifically crafted to fill the void in Italy's AI landscape.

A Cambrian explosion of Italian Language Models is essential for building advanced AI architectures that can cater to the diverse needs of the population.

cerbero-7b, alongside companions like Camoscio and Fauno, aims to help kick-start this revolution in Italy. The goal is an era in which sophisticated AI solutions can seamlessly interact with, and understand the intricacies of, the Italian language, empowering innovation across industries and fostering a deeper connection between technology and the people it serves.

cerbero-7b is released under the permissive Apache 2.0 license, allowing unrestricted usage, even for commercial applications.

## Why Cerbero? πŸ€”

The name "Cerbero," inspired by the three-headed dog that guards the gates of the Underworld in Greek mythology, encapsulates the essence of our model, drawing strength from three foundational pillars:

## Training Details πŸš€

cerbero-7b is fully fine-tuned, distinguishing it from LoRA or QLoRA fine-tunes. The model is trained on synthetic datasets generated through dynamic self-chat, using a large context window of 8192 tokens.
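The self-chat procedure itself is not spelled out here; the sketch below shows one way such synthetic dialogues could be generated, assuming the `[|Umano|]`/`[|AI|]` turn format used in the Getting Started section. The helper names (`complete`, `self_chat`), the seed question, and the sampling settings are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of "dynamic self-chat" data generation.
# Helper names and sampling settings are assumptions, not the real pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "galatolo/cerbero-7b"  # any chat-capable model works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation and return only the newly produced text."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=True)
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)

def self_chat(seed_question: str, turns: int = 4) -> str:
    """Alternate [|Umano|]/[|AI|] turns so the model plays both roles."""
    dialogue = ("Questa Γ¨ una conversazione tra un umano ed un assistente AI.\n"
                f"[|Umano|] {seed_question}\n[|AI|]")
    for _ in range(turns):
        # Truncate each completion at the next role marker to keep one turn per step
        reply = complete(dialogue).split("[|")[0].strip()
        dialogue += f" {reply}\n[|Umano|]"
        question = complete(dialogue).split("[|")[0].strip()
        dialogue += f" {question}\n[|AI|]"
    return dialogue
```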

### Dataset Composition πŸ“Š

We employed a refined version of the Fauno training dataset; the training data covers a broad spectrum of conversational topics and domains.

### Training Setup βš™οΈ

cerbero-7b is trained on an NVIDIA DGX H100.

The model was trained for 3 epochs, enough for training to converge and for the model to handle diverse linguistic tasks proficiently.
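For concreteness, here is a minimal sketch of what a full fine-tuning setup along these lines could look like with the πŸ€—transformers `Trainer`. Only the 3 epochs, the 8192-token context, and the full (non-LoRA) fine-tune come from the text above; every hyperparameter and the placeholder dataset are assumptions.

```python
# Hypothetical full fine-tuning setup (not the authors' exact script).
# Hyperparameters are illustrative assumptions; only 3 epochs, the 8192-token
# context, and full (non-LoRA) fine-tuning are stated in this README.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"  # the stated base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Tiny placeholder corpus so the sketch runs end to end; real training
# would use the tokenized self-chat conversations instead.
texts = ["Questa Γ¨ una conversazione tra un umano ed un assistente AI. ..."]
enc = tokenizer(texts, truncation=True, max_length=8192)
train_dataset = Dataset.from_dict({"input_ids": enc["input_ids"],
                                   "labels": enc["input_ids"]})

args = TrainingArguments(
    output_dir="cerbero-7b-finetune",
    num_train_epochs=3,              # stated in the README
    per_device_train_batch_size=1,   # assumption: long 8192-token sequences
    gradient_accumulation_steps=16,  # assumption
    learning_rate=2e-5,              # assumption: typical full fine-tune LR
    bf16=True,                       # assumption: H100 supports bfloat16
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```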

## Getting Started πŸš€

You can load cerbero-7b using πŸ€—transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("galatolo/cerbero-7b")
tokenizer = AutoTokenizer.from_pretrained("galatolo/cerbero-7b")

# cerbero-7b uses [|Umano|] and [|AI|] markers to delimit conversation turns
prompt = """Questa Γ¨ una conversazione tra un umano ed un assistente AI.
[|Umano|] Come posso distinguere un AI da un umano?
[|AI|]"""

input_ids = tokenizer(prompt, return_tensors='pt').input_ids
with torch.no_grad():  # inference only, no gradients needed
    output_ids = model.generate(input_ids, max_new_tokens=128)

# The decoded output contains the prompt followed by the model's reply
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
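The decoded string echoes the prompt and generation may run past the assistant's turn into a new `[|Umano|]` line. A simple trimming heuristic (an assumption on our part, not part of the official API) is:

```python
# Keep only the assistant's reply: drop the echoed prompt, then cut at the
# next human turn marker. Slicing by len(prompt) is a heuristic and assumes
# the decoded text starts with the prompt verbatim.
reply = generated_text[len(prompt):]
reply = reply.split("[|Umano|]")[0].strip()
print(reply)
```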