Model Card for Model ID

This model is vit-base-patch16-224-in21k fine-tuned on a subset of the dataset from the 2021 Kaggle Google Landmark competition, restricted to the top 51 categories. The subset is available as a Hugging Face dataset at: https://huggingface.co/datasets/pemujo/GLDv2_Top_51_Categories

Developed by: Pedro Melendez
Model type: Vision transformer
Finetuned from model: vit-base-patch16-224-in21k
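
A minimal inference sketch, assuming the fine-tuned weights are published on the Hugging Face Hub and load through the standard `transformers` image-classification classes. The repository name `your-namespace/your-finetuned-vit` is a placeholder rather than the actual model ID, and the helper names `top_k_labels` and `classify` are hypothetical, introduced only for illustration.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax over a plain list of floats.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify(image_path, model_id="your-namespace/your-finetuned-vit", k=3):
    """Classify one image. `model_id` is a placeholder (assumption)."""
    # Imported lazily so top_k_labels works without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    # ViT expects 3-channel input; convert in case the file is grayscale or RGBA.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_k_labels(logits, model.config.id2label, k)
```

Calling `classify("landmark.jpg")` with a real model ID would return up to three `(label, probability)` pairs; the file name here is likewise only an example.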

Training Data

The training data consists of the classes with more than 500 images in the 2021 Kaggle Google Landmark competition dataset: https://huggingface.co/datasets/pemujo/GLDv2_Top_51_Categories
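
The selection rule above (keep only classes with more than 500 images) could be sketched as follows. This is an illustration, not the script used to build the dataset: it assumes the labels are available as a flat list of class IDs, and the function names are hypothetical.

```python
from collections import Counter


def classes_above_threshold(labels, min_images=500):
    """Return the class IDs that occur strictly more than `min_images` times."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n > min_images)


def filter_examples(examples, keep):
    """Keep only (image_path, label) pairs whose label is in `keep`."""
    keep = set(keep)
    return [(path, label) for path, label in examples if label in keep]
```

Applied to the full competition labels with `min_images=500`, a rule like this would be expected to yield the 51 categories described in this card.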

Results


epoch	4
eval_accuracy	0.97411
eval_loss	0.11560
eval_runtime (secs)	79.0939
eval_samples_per_second	115.255
eval_steps_per_second	14.413
train_runtime (secs)	4082.92
train_samples_per_second	35.722
train_steps_per_second	2.233
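
The reported throughput figures allow a few quantities to be estimated that the table does not state directly. The sketch below is only arithmetic on the values above; it assumes that `epoch 4` is the total number of training epochs and that the rounded rates are accurate enough for rough estimates. The key names in the returned dictionary are illustrative.

```python
metrics = {
    "epoch": 4,
    "eval_accuracy": 0.97411,
    "eval_runtime": 79.0939,            # seconds
    "eval_samples_per_second": 115.255,
    "eval_steps_per_second": 14.413,
    "train_runtime": 4082.92,           # seconds
    "train_samples_per_second": 35.722,
    "train_steps_per_second": 2.233,
}


def derived_stats(m):
    """Estimate quantities implied by the reported runtime and throughput."""
    train_samples_total = m["train_runtime"] * m["train_samples_per_second"]
    return {
        "train_minutes": m["train_runtime"] / 60,
        # Samples processed per step, i.e. the ratio of the two reported rates.
        "train_samples_per_step": m["train_samples_per_second"] / m["train_steps_per_second"],
        "eval_samples_per_step": m["eval_samples_per_second"] / m["eval_steps_per_second"],
        # Approximate evaluation set size: runtime times throughput.
        "eval_samples": m["eval_runtime"] * m["eval_samples_per_second"],
        # Approximate training set size, assuming `epoch` full passes.
        "train_samples_per_epoch": train_samples_total / m["epoch"],
        "error_rate": 1 - m["eval_accuracy"],
    }


stats = derived_stats(metrics)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the figures work out to roughly 68 minutes of training, about 16 samples per training step and 8 per evaluation step, and an evaluation set of roughly 9,100 images.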

Environmental Impact

Hardware Type: Nvidia V100
Minutes used: 68
Cloud Provider: Google Cloud
Compute Region: us-central

Compute Infrastructure

Google Cloud Workbench Instance

Hardware

GCP Workbench n1-highmem-8 instance with Nvidia V100 GPU

Software

Python 3.9, PyTorch 2.0.1+cu117