Model Description
The model is fine-tuned from OpenAI's ViT-L-14 on the PMC_OA_beta and ROCO datasets, using open_clip (https://github.com/mlfoundations/open_clip).
Training
python -m training.main \
--save-frequency 2 \
--zeroshot-frequency 1 \
--report-to tensorboard \
--train-data="/home/data1/ryanyip/huggingface-models/pmc_oa_beta/train.csv" \
--val-data="/home/data1/ryanyip/huggingface-models/pmc_oa_beta/sample_valid.csv" \
--csv-separator "," \
--csv-img-key image \
--csv-caption-key caption \
--warmup 10000 \
--batch-size=128 \
--lr=1e-5 \
--wd=0.2 \
--epochs=30 \
--workers=8 \
--model "ViT-L-14" \
--name "pmc_vit_l_14" \
--pretrained "ViT-L-14_state_dict.pt" \
--save-most-recent
ViT-L-14_state_dict.pt contains the pretrained weights of openai/ViT-L-14.
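The command above reads its training and validation data from CSV files with an `image` column and a `caption` column, separated by commas (per `--csv-img-key`, `--csv-caption-key`, and `--csv-separator`). The following is a minimal sketch of how such a file could be produced, not the script used to build PMC_OA_beta. The helper names `write_pairs_csv` and `read_pairs_csv`, the example paths, and the example captions are all hypothetical, and the sketch assumes the `image` column holds a path to each image file.

```python
import csv
import io


def write_pairs_csv(pairs, fileobj, img_key="image", caption_key="caption", sep=","):
    """Write (image_path, caption) pairs using the column names from the training flags."""
    writer = csv.writer(fileobj, delimiter=sep, lineterminator="\n")
    writer.writerow([img_key, caption_key])
    for image_path, caption in pairs:
        # Collapse newlines and repeated whitespace so each pair stays on one row.
        writer.writerow([image_path, " ".join(caption.split())])


def read_pairs_csv(fileobj, img_key="image", caption_key="caption", sep=","):
    """Read the file back as a list of (image_path, caption) tuples."""
    reader = csv.DictReader(fileobj, delimiter=sep)
    return [(row[img_key], row[caption_key]) for row in reader]


# Hypothetical rows, for illustration only.
example_pairs = [
    ("images/fig_0001.jpg", "Chest X-ray, frontal view."),
    ("images/fig_0002.jpg", "Axial CT slice of the abdomen,\nwith contrast."),
]

buffer = io.StringIO()
write_pairs_csv(example_pairs, buffer)

# Round-trip check: captions containing commas are quoted and survive parsing.
buffer.seek(0)
round_trip = read_pairs_csv(buffer)
```

Captions that contain the separator are quoted by the `csv` module, so they are not split into extra columns when the file is parsed. To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.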