Tags: stable-diffusion-xl, stable-diffusion-xl-diffusers, text-to-image, diffusers, lora

These are LoRA adaptation weights for stabilityai/stable-diffusion-xl-base-1.0, fine-tuned on the lambdalabs/pokemon-blip-captions dataset.
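
LoRA (low-rank adaptation) keeps each base weight W frozen and learns a low-rank update, so the adapted weight is W + (alpha / r) · B A, where r is the rank. This is why the published adaptation weights are small compared with the full model. Below is a toy, pure-Python sketch of that update. The helpers matmul, lora_merged_weight and lora_param_count are hypothetical and exist only for illustration. The real diffusers/PEFT code applies the update to attention projections on GPU tensors, not to Python lists.

```python
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    inner = len(y)
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_merged_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), with r the LoRA rank.

    w: frozen base weight, d_out x d_in
    b: trainable "up" matrix, d_out x r
    a: trainable "down" matrix, r x d_in
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds for one d_out x d_in weight."""
    return d_out * r + r * d_in


random.seed(0)
d_out, d_in, rank = 4, 6, 2
W = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # B is commonly zero-initialized

# With B = 0 the update vanishes, so training starts exactly from the base model.
print(lora_merged_weight(W, A, B, alpha=rank) == W)  # True

# Hypothetical 1280 x 1280 projection at rank 4: LoRA trains far fewer values.
print(lora_param_count(1280, 1280, 4), 1280 * 1280)  # 10240 1638400
```

Because B starts at zero, the untrained adapter leaves the base model unchanged. Training then moves only the r · (d_out + d_in) values in A and B.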

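The VAE used during training was madebyollin/sdxl-vae-fp16-fix, a version of the SDXL VAE that runs in float16 without producing NaNs. Load the same VAE at inference time.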
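
According to its model card, the fp16 fix is needed because some of the stock SDXL VAE's internal activations exceed the float16 range. In half-precision GPU arithmetic such values turn into inf and then NaN. The sketch below illustrates only that range limit. It uses Python's struct "e" (IEEE 754 binary16) format and does not touch the VAE itself. The helper names are hypothetical.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be stored as a finite float16 value."""
    try:
        struct.pack("<e", x)  # 'e' = IEEE 754 binary16
    except OverflowError:  # struct refuses values beyond the float16 range
        return False
    return True


def round_trip_fp16(x: float) -> float:
    """Round x to the nearest float16 and convert it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(fits_in_fp16(FP16_MAX))   # True
print(fits_in_fp16(100_000.0))  # False
print(round_trip_fp16(0.1))     # 0.0999755859375
```

Python raises an error on overflow, whereas tensor casts to float16 silently produce inf. That is why the problem surfaces as NaN images rather than as an exception.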


🧨 Diffusers Usage

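import torch
from diffusers import DiffusionPipeline, AutoencoderKL

# fp16-safe VAE, the same one used during training
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
# SDXL base pipeline in half precision, with the fixed VAE swapped in
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
# Load the Pokémon LoRA weights from the Hub and apply them to the pipeline
pipe.load_lora_weights("sshh12/sdxl-lora-pokemon")
pipe.to("cuda")

prompt = "..."  # replace with your own prompt

image = pipe(prompt=prompt).images[0]
image  # displays the PIL image in a notebook; in a script, use image.save("output.png")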

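Training

The weights were trained with the train_text_to_image_lora_sdxl.py script from the diffusers text-to-image examples. The command below is a Jupyter notebook cell. The leading ! runs the command in a shell, and IPython substitutes $MODEL_NAME and $DATASET_NAME from the Python variables defined just before it. In a regular shell, drop the ! and define the variables with export. Note that --use_8bit_adam requires bitsandbytes and --enable_xformers_memory_efficient_attention requires xformers to be installed.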

MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
DATASET_NAME="lambdalabs/pokemon-blip-captions"

!accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="$DATASET_NAME" \
  --caption_column="text" \
  --resolution=1024 \
  --random_flip \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --train_batch_size=1 \
  --gradient_accumulation_steps=8 \
  --num_train_epochs=200 \
  --checkpointing_steps=500 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=0 \
  --validation_prompt="cute dragon creature" \
  --enable_xformers_memory_efficient_attention \
  --report_to="wandb"
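
These flags determine the optimizer schedule. With --train_batch_size=1 and --gradient_accumulation_steps=8, each update averages gradients over 8 images per GPU. The following is a rough sketch of the resulting step counts. The dataset size of 833 examples and the single-GPU setup are assumptions, and the training_schedule helper is hypothetical, not part of the training script.

```python
import math


def training_schedule(num_examples, train_batch_size, grad_accum_steps,
                      num_epochs, checkpointing_steps, num_processes=1):
    """Rough optimizer-step arithmetic for a gradient-accumulation training loop.

    Assumes each process sees num_examples / num_processes examples per epoch
    and that a final partial accumulation window still triggers an update.
    """
    per_process = math.ceil(num_examples / num_processes)
    batches_per_epoch = math.ceil(per_process / train_batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    total_updates = updates_per_epoch * num_epochs
    return {
        "effective_batch_size": train_batch_size * grad_accum_steps * num_processes,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "num_checkpoints": total_updates // checkpointing_steps,
    }


# ASSUMPTION: about 833 training examples and a single GPU.
plan = training_schedule(num_examples=833, train_batch_size=1, grad_accum_steps=8,
                         num_epochs=200, checkpointing_steps=500)
print(plan)
# {'effective_batch_size': 8, 'updates_per_epoch': 105, 'total_updates': 21000, 'num_checkpoints': 42}
```

Under these assumptions, 200 epochs come to roughly 21,000 optimizer steps, and --checkpointing_steps=500 writes a checkpoint about every 4.8 epochs.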