Training corpus: 414M tokens from two sources:

  1. 73M tokens from the Armenian (hy) Wikipedia
  2. 341M tokens from the arlis database

Vocabulary: 74,951 unique words

Character n-grams of length 3 to 5 (minn=3, maxn=5)

Context window size: 5 (ws=5)

Embedding dimension: 300 (dim=300)

Model: skipgram
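With the skipgram objective, each word is trained to predict the words around it within the context window (5 here). The sketch below generates those (center, context) pairs. When a random generator is passed, the effective window at each position is drawn uniformly from 1 to `ws`, which is how fastText's own training loop samples it. `skipgram_pairs` is a hypothetical helper for illustration; the negative sampling and the subword representation of the center word used in actual training are not shown.

```python
import random
from typing import Optional


def skipgram_pairs(tokens: list[str], ws: int = 5,
                   rng: Optional[random.Random] = None) -> list[tuple[str, str]]:
    """Return (center, context) pairs for skip-gram training.

    If rng is given, the window for each position is sampled uniformly from
    1..ws; otherwise the full window ws is used everywhere.
    """
    pairs = []
    for i, center in enumerate(tokens):
        boundary = rng.randint(1, ws) if rng is not None else ws
        lo = max(0, i - boundary)
        hi = min(len(tokens), i + boundary + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["a", "b", "c"], ws=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```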

Minimum word count: 150 (minCount=150); words seen fewer than 150 times are dropped from the vocabulary
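The minimum-count filter amounts to counting every word in the corpus and keeping only those at or above the threshold; the vocabulary size listed above is presumably the result after this filter. A minimal sketch with a hypothetical `build_vocab` helper (fastText does this internally while reading the corpus):

```python
from collections import Counter
from typing import Iterable


def build_vocab(tokens: Iterable[str], min_count: int = 150) -> dict[str, int]:
    """Count words and keep only those occurring at least min_count times."""
    counts = Counter(tokens)
    return {word: c for word, c in counts.items() if c >= min_count}


print(build_vocab(["a", "a", "a", "b"], min_count=2))  # {'a': 3}
```

Dropped words are not lost entirely: fastText can still build a vector for any out-of-vocabulary word from its character n-grams.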

100 epochs, initial learning rate 0.05 (epoch=100, lr=0.05)
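fastText decays the learning rate linearly from its starting value to zero over the whole run, based on the fraction of tokens processed across all epochs. A minimal sketch of that schedule with this model's settings; `learning_rate` is a hypothetical helper, and using the 414M corpus tokens as fastText's internal token count is an approximation.

```python
def learning_rate(tokens_processed: int, corpus_tokens: int = 414_000_000,
                  epochs: int = 100, lr0: float = 0.05) -> float:
    """Linearly decayed learning rate: lr0 at the start, 0 at the end."""
    progress = tokens_processed / (epochs * corpus_tokens)
    return lr0 * (1.0 - progress)


total = 100 * 414_000_000          # tokens seen over all epochs
print(learning_rate(0))            # 0.05 at the start
print(learning_rate(total // 2))   # about 0.025 halfway through
```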

Training took 26 hours on 20 Intel Xeon Gold cores.
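The settings above map onto the parameters of `fasttext.train_unsupervised` in the official Python bindings. The following is a sketch of how a comparable model could be trained, not the author's actual script: the corpus path `corpus.txt` is hypothetical (a single preprocessed text file combining both sources), and every parameter not listed above is left at the library default.

```python
# Hyperparameters from the list above, using fastText's option names.
PARAMS = {
    "model": "skipgram",
    "dim": 300,
    "ws": 5,
    "minn": 3,
    "maxn": 5,
    "minCount": 150,
    "epoch": 100,
    "lr": 0.05,
    "thread": 20,
}


def train(corpus_path: str = "corpus.txt", out_path: str = "output.bin"):
    """Train a skipgram model on a plain-text corpus and save it.

    corpus_path is a hypothetical whitespace-tokenized text file. fasttext is
    imported inside the function so PARAMS can be inspected without it.
    """
    import fasttext  # pip install fasttext

    model = fasttext.train_unsupervised(corpus_path, **PARAMS)
    model.save_model(out_path)
    return model
```

For scale: 414M tokens over 100 epochs is about 41.4 billion tokens processed; in 26 hours (93,600 s) that is roughly 440,000 tokens per second across the 20 threads, or about 22,000 per thread.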

How to use

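```python
import fasttext  # pip install fasttext

# Load the trained model (fastText binary format)
model = fasttext.load_model('output.bin')

# 10 nearest neighbours of 'զենքեր' ("weapons"),
# returned as a list of (similarity, word) tuples
print(model.get_nearest_neighbors('զենքեր'))
```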
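`get_nearest_neighbors` ranks vocabulary words by cosine similarity to the query word's vector and leaves out the query word itself. The sketch below reproduces that ranking with NumPy on a hypothetical toy vocabulary; `nearest_neighbors` is an illustrative helper, not the library's implementation.

```python
import numpy as np


def nearest_neighbors(query: str, words: list[str], vectors: np.ndarray,
                      k: int = 10) -> list[tuple[float, str]]:
    """Return the k words most cosine-similar to `query` as (similarity, word)."""
    # Normalise each row so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit[words.index(query)]
    result = []
    for i in np.argsort(-sims):  # highest similarity first
        if len(result) >= k:
            break
        if words[i] != query:    # skip the query word itself
            result.append((float(sims[i]), words[i]))
    return result


words = ["a", "b", "c"]
vectors = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(nearest_neighbors("a", words, vectors, k=2))  # 'b' (about 0.707), then 'c' (0.0)
```

With the real model, `words` could be `model.words` and `vectors` could be built with `np.stack([model.get_word_vector(w) for w in words])`, giving a 74,951 × 300 matrix.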