This may be the first model trained on TPUs and then converted to 🧨 PyTorch. <br/> Trained on Google Cloud TPUs.
Runtime: 3h 26m 44s
Steps: 18000
Precision: bf16
Learning Rate: 1e-6