
28M parameters: vocab_size=12829 num_hidden_layers=8 num_attention_heads=8 intermediate_size=1024 max_position_embeddings=512 hidden_size=512 block_size=512
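
The parameter count can be sanity-checked from the hyperparameters alone. The sketch below is only an estimate under assumptions the source does not state: a decoder-only transformer with a gated (three-matrix) MLP, input and output embeddings sharing weights, no learned position-embedding table, and biases and normalization weights ignored. The function name `estimate_params` and its flags `gated_mlp` and `tie_embeddings` are hypothetical and exist only for this illustration.

```python
# Hyperparameters copied from the configuration line above.
config = {
    "vocab_size": 12829,
    "num_hidden_layers": 8,
    "num_attention_heads": 8,
    "intermediate_size": 1024,
    "max_position_embeddings": 512,
    "hidden_size": 512,
    "block_size": 512,
}


def estimate_params(cfg, gated_mlp=True, tie_embeddings=True):
    """Rough count of weight-matrix parameters.

    Assumptions (not confirmed by the source): biases and normalization
    weights are ignored, and no learned position-embedding table is counted.
    """
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    embedding = cfg["vocab_size"] * h
    attention = 4 * h * h  # Q, K, V and output projections
    mlp = (3 if gated_mlp else 2) * h * ffn  # gated MLP uses three matrices
    total = embedding + cfg["num_hidden_layers"] * (attention + mlp)
    if not tie_embeddings:
        total += embedding  # separate output projection over the vocabulary
    return total


# Per-head dimension implied by the configuration.
head_dim = config["hidden_size"] // config["num_attention_heads"]

estimate = estimate_params(config)
print(f"head_dim={head_dim}, estimated parameters: {estimate / 1e6:.1f}M")
```

Under these assumptions the estimate comes to about 27.5M, which is consistent with the stated 28M, although that agreement does not confirm the actual architecture. Passing `gated_mlp=False` or `tie_embeddings=False` shows how far the total moves under other layouts.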