IPT-125m (WIP)
IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the OSCAR-2301 dataset.
If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞
How to Use
This model is best used with the Hugging Face transformers library for training and finetuning.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
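Once the model and tokenizer are loaded as above, text generation can be sketched roughly as follows. This is a minimal example, not a recommended configuration: the helper names `sampling_kwargs` and `generate_text` are hypothetical, and the sampling values (`temperature`, `top_p`, `max_new_tokens`) are illustrative assumptions rather than settings from the model author.

```python
MODEL_ID = "efederici/ipt-125m"


def sampling_kwargs(max_new_tokens=50, temperature=0.8, top_p=0.95):
    """Build keyword arguments for `model.generate`.

    The defaults are illustrative assumptions, not tuned values.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate_text(prompt, **overrides):
    """Load the model and return a sampled continuation of `prompt`."""
    # Imported here so that this module can be imported without
    # pulling in transformers until generation is actually requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **sampling_kwargs(**overrides))

    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Il cielo è", max_new_tokens=20)` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because sampling is enabled, the output differs between runs.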
Model Description
The architecture is a modification of a standard decoder-only transformer.
| Hyperparameter | Value |
|---|---|
| n_parameters | 125M |
| n_layers | 12 |
| n_heads | 12 |
| d_model | 768 |
| vocab size | 50432 |
| sequence length | 2048 |
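As a rough sanity check, the hyperparameters in the table can be related to the stated parameter count. The sketch below rests on several assumptions that the table does not confirm: an MLP expansion ratio of 4, input and output embeddings that share weights, no learned positional embeddings, and bias and layer-norm parameters ignored. The function name `estimate_parameters` is hypothetical.

```python
def estimate_parameters(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Approximate the weight count of a decoder-only transformer.

    Assumes tied embeddings, no learned positional embeddings, and
    ignores biases and layer-norm weights (all assumptions).
    """
    embedding = vocab_size * d_model
    # Q, K, V and output projections: four d_model x d_model matrices.
    attention = 4 * d_model * d_model
    # Up- and down-projection of the feed-forward block.
    mlp = 2 * mlp_ratio * d_model * d_model
    return embedding + n_layers * (attention + mlp)


total = estimate_parameters(n_layers=12, d_model=768, vocab_size=50432)
head_dim = 768 // 12  # d_model / n_heads

print(f"estimated parameters: {total:,}")  # 123,666,432
print(f"dimension per attention head: {head_dim}")  # 64
```

Under these assumptions the estimate comes to about 123.7M, which is in line with the 125M in the table once the ignored terms are taken into account.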