vision image-classification

Google didn't publish vit-tiny and vit-small checkpoints on Hugging Face, so I converted the weights from the timm repository. This model is used in the same way as ViT-base.
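Since the model is used like ViT-base, a minimal sketch with the `transformers` ViT classes might look like the following. The function name `classify_image` and the `model_id` argument are illustrative and not part of this repository. The sketch also assumes the checkpoint ships a preprocessor config and an `id2label` mapping.

```python
def classify_image(image, model_id):
    """Return the predicted class label for a PIL image.

    `model_id` is the Hugging Face repository id of this checkpoint
    (hypothetical parameter; substitute the id of the model you are using).
    """
    # Imported inside the function so that defining it does not
    # require torch/transformers to be installed.
    import torch
    from transformers import ViTForImageClassification, ViTImageProcessor

    # Assumption: the repository provides a preprocessor config.
    processor = ViTImageProcessor.from_pretrained(model_id)
    model = ViTForImageClassification.from_pretrained(model_id)
    model.eval()

    # Resize/normalize the image and convert it to a PyTorch tensor batch.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring class index to its label name.
    predicted_index = logits.argmax(-1).item()
    return model.config.id2label[predicted_index]
```

To try it, open an image with `PIL.Image.open(...)` and pass it together with the repository id of this model.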

Note that the safetensors model requires a torch 2.0 environment.
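If you need to support older environments, one option is to check the installed torch version before choosing which weights to load. This is only a sketch: the helper name `can_use_safetensors` is hypothetical, and falling back with `use_safetensors=False` assumes the repository also contains PyTorch `.bin` weights.

```python
def can_use_safetensors(torch_version: str) -> bool:
    """Return True if the torch version string is 2.0 or newer.

    Accepts strings such as "2.1.0+cu118" or "1.13.1", as reported
    by `torch.__version__`.
    """
    major = int(torch_version.split(".")[0])
    return major >= 2


# Example usage (requires torch and transformers; assumes the repository
# also provides .bin weights for the fallback case):
#
#   import torch
#   from transformers import ViTForImageClassification
#   model = ViTForImageClassification.from_pretrained(
#       model_id,  # repository id of this checkpoint
#       use_safetensors=can_use_safetensors(torch.__version__),
#   )
```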