This is a mini Llama model with randomly initialized weights. It has a single transformer block and a hidden size of 2, in order to facilitate quick iteration during development.

```python
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=2,
    intermediate_size=2,
    num_hidden_layers=1,
    num_attention_heads=2,
    hidden_act="silu",
    max_position_embeddings=2048,
    initializer_range=0.02,
    rms_norm_eps=1e-06,
    use_cache=True,
    pad_token_id=0,
    bos_token_id=1,
    eos_token_id=2,
    tie_word_embeddings=False,
)
```
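
As a minimal sketch of how the model can be built from this config (assuming the Hugging Face `transformers` library and its `LlamaForCausalLM` class; the exact instantiation used here is not stated above), the following creates the randomly initialized model and prints its parameter count:

```python
from transformers import LlamaForCausalLM

# Constructing from a config (instead of from_pretrained) yields randomly
# initialized weights, which is the point of this tiny development model.
model = LlamaForCausalLM(config)
print(f"parameters: {model.num_parameters():,}")
```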