Training Details:
- 200,000 steps on the Pile
- Denoising (span corruption) objective
- Batch size 256, 512 sequence length
- 25B tokens total
- Modified GPT NeoX Tokenizer
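
As a rough sanity check on the figures above, the token budget can be estimated from the step count, batch size, and sequence length. This is a minimal sketch, not the authors' accounting: it assumes the batch size counts sequences per step and that every sequence is filled to the full 512 tokens. The helper name `tokens_processed` is hypothetical.

```python
# Values taken from the training details above.
STEPS = 200_000
BATCH_SIZE = 256   # assumption: sequences per optimizer step
SEQ_LEN = 512      # assumption: tokens per sequence, fully packed


def tokens_processed(steps: int, batch_size: int, seq_len: int) -> int:
    """Upper bound on tokens seen during training.

    Assumes every sequence in every batch holds exactly seq_len tokens.
    """
    return steps * batch_size * seq_len


total = tokens_processed(STEPS, BATCH_SIZE, SEQ_LEN)
print(f"{total:,} tokens (~{total / 1e9:.1f}B)")
# prints: 26,214,400,000 tokens (~26.2B)
```

Under these assumptions the estimate is about 26.2B tokens, the same order as the stated 25B total. The source does not say how the 25B figure was counted, so the small gap may simply reflect rounding or a different counting convention.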