vision-language, CLIP, ViLT

KiloGram dataset and code repo: https://github.com/lil-lab/kilogram

Preprocessed training and evaluation data: https://huggingface.co/datasets/lil-lab/kilogram-data