KoLLaVA : Korean Large Language and Vision Assistant (feat. LLaVA)
This model is a large multimodal model (LMM) that combines the KoVicuna LLM with the visual encoder of CLIP (ViT-L/14), trained on a Korean visual-instruction dataset.
Detailed code is available in the KoLLaVA GitHub repository.
Training hyperparameters
- learning_rate: 2e-5
- train_batch_size: 16
- distributed_type: multi-GPU (A100 80G)
- num_devices: 4
- gradient_accumulation_steps: 1
- total_train_batch_size: 64
- total_eval_batch_size: 16
- lr_scheduler_type: cosine
- num_epochs: 1
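The hyperparameters above can be expressed as a plain config for reference. This is a hypothetical sketch (not the authors' actual training script); the key names are illustrative, and it shows how the total train batch size of 64 follows from the per-device batch size, device count, and gradient accumulation steps.

```python
# Hypothetical config mirroring the listed hyperparameters
# (illustrative key names, not taken from the KoLLaVA codebase).
train_config = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_devices": 4,                  # multi-GPU, A100 80G
    "gradient_accumulation_steps": 1,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 1,
}

# Effective (total) train batch size:
# per-device batch size x number of devices x accumulation steps.
total_train_batch_size = (
    train_config["per_device_train_batch_size"]
    * train_config["num_devices"]
    * train_config["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # -> 64
```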
Model License: Apache License 2.0