NGME-LLama 264M, trained on 4× A6000 GPUs for ~4 days on ~4.9 billion tokens (4 × 16 × 768 × 100,000) from the C4 corpus.
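The token count above can be reproduced from the training setup; a minimal sketch, assuming the four factors are GPUs, per-GPU batch size, sequence length, and optimizer steps (that interpretation is an assumption, not stated in the original):

```python
# Estimate total training tokens from the run configuration.
# Assumed factor meanings: 4 GPUs, batch size 16 per GPU,
# sequence length 768, and 100,000 training steps.
num_gpus = 4
batch_size_per_gpu = 16
sequence_length = 768
training_steps = 100_000

total_tokens = num_gpus * batch_size_per_gpu * sequence_length * training_steps
print(f"{total_tokens:,} tokens (~{total_tokens / 1e9:.1f}B)")
# → 4,915,200,000 tokens (~4.9B)
```

Note the product works out to ~4.9 billion tokens, slightly above the "~4 billion" rounding.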