BERT trained on YFCC15M, with the same capacity as the CLIP text encoder.
Training epochs: 32
Final validation PPL: 15.53
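
For readers who want to relate the reported perplexity to a training loss, here is a minimal sketch. It assumes the PPL above is the exponential of the mean per-token cross-entropy measured in nats (natural log), which is the usual convention but is not stated in this note. The helper names `perplexity` and `loss_from_perplexity` are hypothetical and are not part of this model's code.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) over the given tokens.

    Assumes each entry is a per-token loss in nats (natural log).
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def loss_from_perplexity(ppl):
    """Invert the definition: mean loss in nats for a given perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


# Reported validation perplexity from this note.
reported_ppl = 15.53

# Equivalent mean cross-entropy under the natural-log assumption.
mean_loss = loss_from_perplexity(reported_ppl)
print(f"mean loss: {mean_loss:.4f} nats/token")

# Round trip: a validation set where every token has this loss
# yields the reported perplexity again.
print(f"perplexity: {perplexity([mean_loss] * 4):.2f}")
```

If the loss was logged in bits (log base 2) rather than nats, the conversion would use `2 ** loss` instead of `math.exp(loss)`.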