IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the OSCAR-2301 dataset.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

How to Use

This model is best used with the Hugging Face transformers library for training and finetuning.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True allows transformers to run the custom model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
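
Once the model and tokenizer load, text generation might look like the following sketch. The helper name `generate_text`, the prompt, and the sampling settings (`max_new_tokens`, `top_p`, `temperature`) are illustrative assumptions, not values recommended by the model author. It also assumes PyTorch is installed.

```python
def generate_text(prompt, max_new_tokens=50):
    """Sample a continuation of `prompt` from ipt-125m (hypothetical helper)."""
    # Imports are kept inside the function so that defining it stays cheap;
    # the model is only downloaded when the function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "efederici/ipt-125m", trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")

    # Tokenize the prompt into PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sampling settings below are assumptions chosen for illustration only.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

    # Decode the first (and only) sequence back into a string.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("L'Italia è")` downloads the weights on first use and returns the prompt followed by a sampled Italian continuation. Because sampling is enabled, the output differs between runs.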

Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter | Value |
| --- | --- |
| n_parameters | 125M |
| n_layers | 12 |
| n_heads | 12 |
| d_model | 768 |
| vocab size | 50432 |
| sequence length | 2048 |
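
As a rough sanity check, the 125M figure can be approximated from the other hyperparameters. The sketch below makes several assumptions that this card does not state: a 4x MLP expansion, input and output embeddings that share weights, and no bias, layer-norm, or positional-embedding parameters. The function name `estimate_parameters` is hypothetical, and the real count may differ because the architecture is a modified transformer.

```python
def estimate_parameters(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Approximate the weight count of a decoder-only transformer.

    Assumptions (not confirmed by the model card): MLP hidden size is
    mlp_ratio * d_model, embeddings are tied, biases and norms are ignored.
    """
    # Attention: Q, K, V and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # MLP: one up-projection and one down-projection.
    mlp = 2 * mlp_ratio * d_model * d_model
    # Token embedding matrix, assumed shared with the output layer.
    embeddings = vocab_size * d_model
    return n_layers * (attention + mlp) + embeddings


total = estimate_parameters(n_layers=12, d_model=768, vocab_size=50432)
head_dim = 768 // 12  # d_model / n_heads

print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 123,666,432 parameters (~124M)
print(f"head dimension: {head_dim}")  # head dimension: 64
```

Under these assumptions the estimate lands within about 1% of the stated 125M, which is consistent with the table.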