This version was trained for 3 epochs on the full dataset, excluding the wikt and wn subsets.