I've converted the DINO checkpoints from the official repo:

You can load a converted checkpoint as follows:

from transformers import ViTModel

# add_pooling_layer=False skips ViTModel's pooler head; read features from last_hidden_state
model = ViTModel.from_pretrained("nielsr/dino_vitb16", add_pooling_layer=False)
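
Once the model is loaded, the usual way to get an image embedding is to take the [CLS] token from `last_hidden_state`. Here is a minimal sketch of that step. To keep it runnable offline it uses a randomly initialized `ViTModel` with the default ViT-B/16 shape (and only 2 layers, an assumption made purely for speed) plus a random tensor in place of a preprocessed image; swap in the `from_pretrained` call above and real pixel values for actual DINO features.

```python
import torch
from transformers import ViTConfig, ViTModel

# Assumption: a randomly initialized stand-in so this runs without a download.
# ViTConfig defaults match ViT-B/16: hidden_size=768, patch_size=16, image_size=224.
# num_hidden_layers=2 is only to keep the sketch fast; the real model has 12.
config = ViTConfig(num_hidden_layers=2)
model = ViTModel(config, add_pooling_layer=False)
model.eval()

# Dummy batch of one 224x224 RGB image, standing in for preprocessed pixel values.
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# Shape: (batch, 1 + num_patches, hidden_size); index 0 is the [CLS] token.
last_hidden_state = outputs.last_hidden_state
cls_embedding = last_hidden_state[:, 0]
```

With a 224x224 input and 16x16 patches there are 196 patch tokens plus the [CLS] token, so `last_hidden_state` has 197 positions per image.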