information retrieval, multi-lingual dense retriever, ANCE, Japanese

k-ush/xlm-roberta-base-ance-en-jp-warmup

An XLM-RoBERTa-base model trained on the mMARCO Japanese dataset with the ANCE warmup script. The base checkpoint comes from k-ush/xlm-roberta-base-ance-warmup, so this model was trained on both English and Japanese data. I uploaded the checkpoint at 50k steps because MRR@100 had decreased at the 60k checkpoint (MRR@100 rerank/full: 0.242/0.182).
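As with other ANCE-style dense retrievers, relevance is scored as the inner product between the query embedding and each passage embedding. A minimal sketch of that scoring step, with tiny hand-written vectors standing in for the encoder's outputs (the real model produces 768-dimensional XLM-RoBERTa embeddings):

```python
def dot(u, v):
    # Inner product between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def rank_passages(query_emb, passage_embs):
    # Score every passage against the query, then sort by score, descending.
    scores = [(i, dot(query_emb, emb)) for i, emb in enumerate(passage_embs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional embeddings standing in for model outputs.
query = [1.0, 0.0, 1.0]
passages = [
    [0.9, 0.1, 0.8],    # similar to the query -> high score
    [-1.0, 0.5, -0.7],  # dissimilar -> low score
    [0.5, 0.5, 0.5],    # in between
]
ranking = rank_passages(query, passages)
print(ranking[0][0])  # -> 0 (index of the top-ranked passage)
```

In full ranking the query is scored against every passage in the corpus; in reranking only against the candidates returned by a first-stage retriever.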

Dataset

I formatted the Japanese mMARCO dataset for ANCE. The dataset preparation script is available on GitHub: https://github.com/argonism/JANCE/blob/master/data/gen_jp_data.py

Evaluation Result

Evaluation results during training, measured on the mMARCO Japanese dev set.

Reranking MRR: 0.24208174148360342
Full-ranking MRR: 0.19015224905626082
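MRR is the mean, over queries, of the reciprocal rank of the first relevant passage, counting 0 when no relevant passage appears in the top k (k = 100 for the MRR@100 figures above). A small sketch of that computation, using hypothetical per-query ranks:

```python
def mrr_at_k(first_relevant_ranks, k=100):
    # first_relevant_ranks: 1-based rank of the first relevant passage for
    # each query, or None when no relevant passage was retrieved at all.
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None and rank <= k:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# Hypothetical ranks for four queries: hits at ranks 1, 2, 10, and one miss.
print(mrr_at_k([1, 2, 10, None]))  # -> 0.4, i.e. (1 + 0.5 + 0.1 + 0) / 4
```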