Thumbs-up Gesture & Facial Concept Training for Stable Diffusion

In this exercise, the Stable Diffusion 1.5 model was fine-tuned using the DreamBooth + LoRA technique. Code was developed in Jupyter notebooks on a desktop computer with an RTX 3070 GPU. A working model demo was created on Google Colab.

Overall approach

Image data for the thumbs-up gesture, the facial concept, and regularization were retrieved using clip-retrieval from the LAION5B-H-14 index, and also from Google image search.
Images were pre-processed and filtered by dimension (larger than 512px * 512px) and visual content (face visibility, hand gestures, and hand-elbow consistency). Four pre-trained TFLite models from MediaPipe were used for this purpose.
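The dimension filter above can be sketched in a few lines. This is a minimal illustration, not the project's actual script: the function name `keep_large_enough`, the `(name, width, height)` record layout, and the strict "greater than" reading of the 512px threshold are all assumptions. The visual-content checks done with MediaPipe are not shown.

```python
from typing import Iterable, List, Tuple

# Assumed threshold in pixels, taken from the "> 512px * 512px" rule in the text.
MIN_SIDE = 512


def keep_large_enough(
    images: Iterable[Tuple[str, int, int]], min_side: int = MIN_SIDE
) -> List[str]:
    """Return the names of images whose width AND height both exceed min_side.

    Each record is a hypothetical (name, width, height) tuple. In practice the
    dimensions could come from an image library, e.g. Pillow's Image.open(path).size.
    """
    return [
        name
        for name, width, height in images
        if width > min_side and height > min_side
    ]


if __name__ == "__main__":
    candidates = [
        ("a.jpg", 1024, 768),   # kept: both sides exceed 512
        ("b.jpg", 512, 512),    # dropped: not strictly larger than 512
        ("c.jpg", 2000, 400),   # dropped: height too small
    ]
    print(keep_large_enough(candidates))
```

Whether images exactly 512px on a side should pass is a design choice; switching `>` to `>=` would admit them.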
Two separate LoRA models, one for the thumbs-up gesture and one for the facial concept, were trained using Kohya's GUI for Stable Diffusion training.
The two LoRA models were first evaluated separately. The thumbs-up gesture concept was evaluated with a script, while facial concept learning was assessed by visual inspection.
The prompts used to generate images were 'man thumbs up' and 'tcr' for the gesture and facial concepts respectively.
The script-based evaluation computes the following metrics on a dataset of 100 images generated randomly by the model:
- % of images with face detected
- % of images with wrists matching body
- % of images without excess hands
- % of images passing all three above checks
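To make the four metrics concrete, here is a small sketch of how they could be aggregated once each generated image has been checked. The `ImageChecks` record and the `summarize` function are hypothetical names introduced for illustration; the per-image detection itself (done with MediaPipe models in this project) is assumed to have already produced the three booleans.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ImageChecks:
    """Per-image outcome of the three checks (hypothetical record type)."""
    face_detected: bool
    wrists_match_body: bool
    no_excess_hands: bool


def summarize(checks: Iterable[ImageChecks]) -> Dict[str, float]:
    """Return the percentage of images passing each check, and all three."""
    rows = list(checks)
    if not rows:
        raise ValueError("no images to summarize")
    total = len(rows)

    def pct(count: int) -> float:
        return 100.0 * count / total

    return {
        "face_detected": pct(sum(r.face_detected for r in rows)),
        "wrists_match_body": pct(sum(r.wrists_match_body for r in rows)),
        "no_excess_hands": pct(sum(r.no_excess_hands for r in rows)),
        # An image counts here only if it passes every individual check.
        "all_checks": pct(
            sum(
                r.face_detected and r.wrists_match_body and r.no_excess_hands
                for r in rows
            )
        ),
    }


if __name__ == "__main__":
    sample = [
        ImageChecks(True, True, True),
        ImageChecks(True, False, True),
        ImageChecks(False, True, True),
        ImageChecks(True, True, False),
    ]
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.1f}%")
```

The "all checks" figure can be much lower than any single metric, since each image may fail a different check.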
The two LoRA models and the base model were fused into a single final model for ease of deployment. A better approach would have been to deploy the LoRA models and the base model individually and modify the prompts, but this could not be attempted due to time constraints. In the fusion, the gesture model was given 2% weight and the facial concept model 98% weight.
The weights were determined through visual examination. After the models were fused, 100 randomly generated images for the prompt 'tcr thumbs up' were assessed for the image metrics.
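As a rough illustration of what a weighted fusion means for a single layer, the sketch below adds two scaled low-rank updates to a base weight matrix: W_fused = W_base + 0.02 * (up_gesture @ down_gesture) + 0.98 * (up_face @ down_face). This is an assumption about the general idea only, not the merge procedure Kohya's tooling actually performs; the function names, the tiny matrices, and the omission of any alpha/rank scaling are all illustrative choices.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("shape mismatch in matmul")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def fuse(base: Matrix, adapters: Sequence[Tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return base + sum(weight * up @ down) over all adapters.

    Each adapter is a hypothetical (up, down, weight) triple; base is not mutated.
    """
    fused = [row[:] for row in base]
    for up, down, weight in adapters:
        delta = matmul(up, down)
        if len(delta) != len(fused) or len(delta[0]) != len(fused[0]):
            raise ValueError("adapter update does not match base shape")
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                fused[i][j] += weight * value
    return fused


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    gesture = ([[1.0], [0.0]], [[10.0, 0.0]], 0.02)   # rank-1 update, 2% weight
    face = ([[0.0], [1.0]], [[0.0, 100.0]], 0.98)     # rank-1 update, 98% weight
    print(fuse(base, [gesture, face]))
```

Once fused, the result is an ordinary weight matrix, which is why the merged model can be deployed without loading the adapters separately; the trade-off is that the weights can no longer be adjusted at inference time.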
The models were pushed to the Hugging Face Hub, and a project demo was created as a Colab notebook.

Notes:

To reduce the size of the repo, only the evaluation images have been retained, under 'images/eval_images/'.
