
mt5-base Reranker fine-tuned on mMARCO


mt5-base-mmarco-v1 is an mT5-based model fine-tuned on a multilingual, translated version of the MS MARCO passage dataset. This dataset, named Multi MS MARCO (mMARCO), comprises 9 complete MS MARCO passage collections in 9 different languages. In version v1, the datasets were translated using Helsinki NMT models. Further information about the dataset and the translation method can be found in our paper, mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset, and in the mMARCO repository.


from transformers import T5Tokenizer, MT5ForConditionalGeneration

model_name = 'unicamp-dl/mt5-base-mmarco-v1'
tokenizer  = T5Tokenizer.from_pretrained(model_name)
model      = MT5ForConditionalGeneration.from_pretrained(model_name)
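Once loaded, the model can be used to score query-passage pairs. The sketch below is one possible way to do that and is not taken from this model card: it assumes the monoT5-style prompt "Query: ... Document: ... Relevant:" and assumes that the relevance labels are the tokens "▁yes" and "▁no". Please check both against the mMARCO repository before relying on the scores. The helper names (build_input, relevance_probability, score_passages) are hypothetical.

```python
import math


def build_input(query: str, passage: str) -> str:
    # ASSUMPTION: monoT5-style prompt; not confirmed by this model card.
    return f"Query: {query} Document: {passage} Relevant:"


def relevance_probability(logit_false: float, logit_true: float) -> float:
    # Two-way softmax over the "false" and "true" label logits,
    # returning the probability mass assigned to the "true" label.
    m = max(logit_false, logit_true)  # subtract the max for numerical stability
    e_false = math.exp(logit_false - m)
    e_true = math.exp(logit_true - m)
    return e_true / (e_false + e_true)


def score_passages(query, passages, model, tokenizer,
                   true_token="▁yes", false_token="▁no"):
    # ASSUMPTION: the label tokens above are illustrative defaults.
    import torch  # imported lazily so the helpers above work without PyTorch

    true_id = tokenizer.convert_tokens_to_ids(true_token)
    false_id = tokenizer.convert_tokens_to_ids(false_token)

    texts = [build_input(query, p) for p in passages]
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=512)

    # Run a single decoder step from the decoder start token and read
    # the logits of the first generated position.
    start = torch.full((len(texts), 1),
                       model.config.decoder_start_token_id,
                       dtype=torch.long)
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=start).logits[:, 0, :]

    return [
        relevance_probability(row[false_id].item(), row[true_id].item())
        for row in logits
    ]
```

With the tokenizer and model loaded as above, score_passages(query, passages, model, tokenizer) returns one score per passage; sorting the passages by descending score gives the reranked list.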


If you use mt5-base-mmarco-v1, please cite:

  title={mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset}, 
  author={Luiz Henrique Bonifacio and Vitor Jeronymo and Hugo Queiroz Abonizio and Israel Campiotti and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira},
