Thumbs Up Gesture & Facial Concept Training for Stable Diffusion
In this exercise, the Stable Diffusion 1.5 model was fine-tuned using the DreamBooth + LoRA technique. Code was developed in Jupyter notebooks on a desktop computer with an RTX 3070 GPU. A working model demo was created on Google Colab.
Overall approach
- Image data for the Thumbs Up gesture, the Facial concept, and Regularization was retrieved using clip-retrieval from the LAION5B-H-14 index and from Google image search.
- Images were pre-processed and filtered by dimension (> 512px × 512px) and by visual content (face visibility, hand gestures, and hand-elbow consistency). Four pre-trained TFLite models from MediaPipe were used for this purpose.
- Two separate LoRA models, one for the Thumbs Up gesture and one for the Facial concept, were trained with regularization images using Kohya's GUI for Stable Diffusion training.
- The two LoRA models were initially evaluated separately. The Thumbs Up gesture concept was evaluated with a script, while Facial concept learning was assessed by visual inspection.
- The prompts used to generate images were 'man thumbs up' and 'tcr' for the gesture and facial concepts respectively.
- Script-based evaluation computes the following metrics on a dataset of 100 images generated randomly by the model:
  - % of images with face detected
  - % of images with wrists matching body
  - % of images without excess hands
  - % of images passing all three above checks
- The two LoRA models and the base model were fused into a final model for ease of deployment. A better approach would have been to deploy the LoRA models and the base model individually and modify the prompts, but this could not be attempted due to time constraints. In the fusion, the Gesture model was given 2% weight and the Facial concept model 98% weight.
- The weights were determined through visual examination. After the model was fused, 100 randomly generated images for the prompt 'tcr thumbs up' were assessed using the image metrics.
- The models were pushed to the Hugging Face Hub and a project demo was created using a Colab notebook.
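
The weighted fusion of the two LoRA models into the base model can be pictured with a small sketch. It assumes that each LoRA contributes a low-rank update (an "up" matrix times a "down" matrix) and that the 2% and 98% weights scale those updates linearly before they are added to the base weights. The function names and the toy matrices are hypothetical, and any rank or alpha scaling applied by the actual merge tool is left out.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora(base: Matrix, loras: Sequence[Tuple[float, Matrix, Matrix]]) -> Matrix:
    """Return base + sum(weight * up @ down) without modifying `base`.

    Each entry of `loras` is (weight, up, down). The linear weighting is an
    assumption about how the merge weights are applied.
    """
    fused = [row[:] for row in base]
    for weight, up, down in loras:
        delta = matmul(up, down)  # low-rank update for this LoRA
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                fused[i][j] += weight * value
    return fused


# Toy 2x2 base layer and two rank-1 LoRA updates (made-up numbers).
base = [[1.0, 0.0], [0.0, 1.0]]
gesture_up, gesture_down = [[1.0], [0.0]], [[1.0, 0.0]]
face_up, face_down = [[0.0], [1.0]], [[0.0, 1.0]]

fused = fuse_lora(
    base,
    [
        (0.02, gesture_up, gesture_down),  # Gesture model: 2% weight
        (0.98, face_up, face_down),        # Facial concept model: 98% weight
    ],
)
```

Fusing once yields a single checkpoint to deploy, at the cost of fixing the weights; keeping the LoRA models separate would allow them to be adjusted at inference time.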
Notes:
- To reduce the size of the repo, only the evaluation images have been retained
- 'images/eval_images/'
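
The script-based metrics described under the overall approach could be aggregated over such evaluation images roughly as follows. This is a minimal sketch, not the project's evaluation script: `ImageChecks` and `summarize` are hypothetical names, and the per-image booleans are assumed to come from detectors such as the MediaPipe models, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ImageChecks:
    """Per-image outcomes of the three checks (hypothetical container)."""
    face_detected: bool
    wrists_match_body: bool
    no_excess_hands: bool

    @property
    def passes_all(self) -> bool:
        return self.face_detected and self.wrists_match_body and self.no_excess_hands


def summarize(results: List[ImageChecks]) -> Dict[str, float]:
    """Return each metric as a percentage of the evaluated images."""
    if not results:
        raise ValueError("no images to summarize")
    total = len(results)

    def pct(count: int) -> float:
        return 100.0 * count / total

    return {
        "face_detected": pct(sum(r.face_detected for r in results)),
        "wrists_match_body": pct(sum(r.wrists_match_body for r in results)),
        "no_excess_hands": pct(sum(r.no_excess_hands for r in results)),
        "passes_all": pct(sum(r.passes_all for r in results)),
    }


# Toy input standing in for detector output on four generated images.
sample = [
    ImageChecks(True, True, True),
    ImageChecks(True, False, True),
    ImageChecks(False, True, True),
    ImageChecks(True, True, False),
]
report = summarize(sample)
```

With 100 generated images per prompt, as in the evaluation described above, each percentage equals the count of passing images.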