This model was first pretrained on the BEIR corpus and then fine-tuned on the MS MARCO dataset, following the approach described in the paper COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning. The associated GitHub repository is available at https://github.com/OpenMatch/COCO-DR.
This model uses BERT-large as its backbone, with 335M parameters. See the paper at https://arxiv.org/abs/2210.15212 for details.
Usage
Pre-trained models can be loaded through the HuggingFace transformers library:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("OpenMatch/cocodr-large-msmarco")
tokenizer = AutoTokenizer.from_pretrained("OpenMatch/cocodr-large-msmarco")
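
Once loaded, the model can be used to embed queries and passages for retrieval. The following is a minimal sketch, not a definitive recipe: it assumes the embedding is the final-layer vector of the first ([CLS]) token and that relevance is scored by dot product. Check the paper and repository for the exact pooling and similarity used. The helper names (`cls_pool`, `dot_scores`, `encode`, `demo`) and the example texts are hypothetical.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "OpenMatch/cocodr-large-msmarco"


def cls_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Take the first-token vector of each sequence: (batch, seq, dim) -> (batch, dim).

    Assumption: the first ([CLS]) token is used as the text embedding.
    """
    return last_hidden_state[:, 0]


def dot_scores(query_emb: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """Dot product between one query vector (dim,) and n document vectors (n, dim).

    Assumption: dot product is the similarity function.
    """
    return doc_embs @ query_emb


def encode(texts, model, tokenizer) -> torch.Tensor:
    """Tokenize a list of strings and return one embedding per string."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return cls_pool(outputs.last_hidden_state)


def demo() -> None:
    """Download the model and score two example passages against a query."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()  # disable dropout

    query = "Where was Marie Curie born?"
    passages = [
        "Maria Sklodowska, later known as Marie Curie, was born in Warsaw.",
        "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    ]
    query_emb = encode([query], model, tokenizer)[0]
    passage_embs = encode(passages, model, tokenizer)
    # Higher score means the passage is judged more relevant to the query.
    print(dot_scores(query_emb, passage_embs))
```

Calling `demo()` downloads the checkpoint on first use and prints one score per passage. For a larger collection, passage embeddings would normally be computed once and stored, with only the query encoded at search time.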