This model was first pretrained on the BEIR corpus and then fine-tuned on the MS MARCO dataset, following the approach described in the paper COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning. The associated GitHub repository is available at https://github.com/OpenMatch/COCO-DR.

This model uses BERT-large as its backbone, with 335M parameters. See the paper at https://arxiv.org/abs/2210.15212 for details.

Usage

Pre-trained models can be loaded through the HuggingFace transformers library:

from transformers import AutoModel, AutoTokenizer

# Load the encoder weights and the matching tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("OpenMatch/cocodr-large-msmarco")
tokenizer = AutoTokenizer.from_pretrained("OpenMatch/cocodr-large-msmarco")
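
Once loaded, the model can be used to embed queries and passages for retrieval. The following is a minimal sketch, not a definitive recipe: it assumes the final-layer [CLS] vector is used as the text embedding and that relevance is scored with a dot product, which is a common setup for BERT-based dense retrievers; consult the paper and the GitHub repository for the exact configuration. The helper names cls_embed and demo and the example texts are hypothetical and are introduced here only for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "OpenMatch/cocodr-large-msmarco"


def cls_embed(model, batch):
    """Return the final-layer [CLS] vector for each sequence in a tokenized batch."""
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**batch)
    # Assumption: the first token's last hidden state serves as the text embedding.
    return outputs.last_hidden_state[:, 0]


def demo():
    """Download the checkpoint, then score one query against two passages."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME).eval()

    # Illustrative texts only; they are not taken from any benchmark.
    queries = ["where was marie curie born"]
    passages = [
        "Marie Curie was born in Warsaw, Poland.",
        "Dense retrieval encodes queries and documents into vectors.",
    ]

    q_batch = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
    p_batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")

    q_emb = cls_embed(model, q_batch)  # shape: (1, hidden_size)
    p_emb = cls_embed(model, p_batch)  # shape: (2, hidden_size)

    # Assumption: dot-product similarity; higher means more relevant.
    scores = q_emb @ p_emb.T  # shape: (1, 2)
    print(scores)
```

Calling demo() downloads the checkpoint on first use and prints one score per passage. For larger collections, passages would typically be embedded once in batches and stored in a vector index rather than re-encoded for every query.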