conversational

#12 epochs, each batch size 4, gradient accumulation steps 1, tail 4096.

#THIS SEEMS TO BE THE OPTIMAL SETUP.