
Small SDXL-controlnet: Canny

These are small controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. This checkpoint is 5x smaller than the original XL controlnet checkpoint. Some example images, with the prompts used to generate them, are shown below.

prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
[image: images_0]

prompt: a woman, close up, detailed, beautiful, street photography, photorealistic, detailed, Kodak ektar 100, natural, candid shot
[image: images_1]

prompt: megatron in an apocalyptic world ground, runied city in the background, photorealistic
[image: images_2]

prompt: a couple watching sunset, 4k photo
[image: images_3]

Usage

Make sure to first install the libraries:

pip install accelerate transformers safetensors opencv-python diffusers
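Before running the example, it can help to confirm that the install actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the list simply mirrors the packages in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = ["accelerate", "transformers", "safetensors", "opencv-python", "diffusers"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Note that the lookup is by distribution name, so an environment that provides OpenCV through a different distribution (for example opencv-python-headless) would be reported as missing opencv-python even though `import cv2` works.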

And then we're ready to go:

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-mid",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

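# Build the canny edge map that serves as the conditioning image.
image = np.array(image)
image = cv2.Canny(image, 100, 200)  # lower and upper hysteresis thresholds
image = image[:, :, None]  # add a channel axis: (H, W) -> (H, W, 1)
image = np.concatenate([image, image, image], axis=2)  # replicate to 3 channels
image = Image.fromarray(image)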

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images

images[0].save("hug_lab.png")

[image: hug_lab_grid]
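The preprocessing step in the example (edge detection, then replicating the single-channel edge map into three channels) can be wrapped into a reusable helper. This is a sketch under assumptions: the function names `stack_edges_to_rgb` and `canny_control_image` are hypothetical, and the default thresholds are just the 100 and 200 used above, not tuned values.

```python
import numpy as np


def stack_edges_to_rgb(edges: np.ndarray) -> np.ndarray:
    """Replicate a single-channel (H, W) edge map into an (H, W, 3) uint8 array."""
    if edges.ndim != 2:
        raise ValueError(f"expected a 2-D edge map, got shape {edges.shape}")
    return np.repeat(edges[:, :, None], 3, axis=2).astype(np.uint8)


def canny_control_image(image, low_threshold=100, high_threshold=200):
    """Convert a PIL image into a 3-channel canny edge image for the pipeline."""
    # Imported lazily so the pure-NumPy helper above works without OpenCV or Pillow.
    import cv2
    from PIL import Image

    rgb = np.array(image.convert("RGB"))
    edges = cv2.Canny(rgb, low_threshold, high_threshold)  # (H, W), values 0 or 255
    return Image.fromarray(stack_edges_to_rgb(edges))
```

With this in place, the five preprocessing lines in the example could be replaced by `image = canny_control_image(image)`. Lower thresholds keep more (and noisier) edges, higher thresholds keep fewer.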

For more details, check out the official documentation of StableDiffusionXLControlNetPipeline.
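The example uses controlnet_conditioning_scale = 0.5. To see how strongly the edge map constrains the output for a given prompt, one option is to generate the same prompt at several scales. The sketch below assumes `pipe` behaves like the pipeline built above; the helper name `sweep_conditioning_scale` and the default scale values are hypothetical choices for illustration.

```python
def sweep_conditioning_scale(pipe, prompt, control_image,
                             scales=(0.25, 0.5, 0.75, 1.0), seed=None, **pipe_kwargs):
    """Run `pipe` once per scale and return {scale: first generated image}."""
    results = {}
    for scale in scales:
        call_kwargs = dict(pipe_kwargs)
        if seed is not None:
            # Lazy import: torch is only needed when a seed is requested.
            import torch

            # A fresh generator per call, so every scale starts from the same noise.
            call_kwargs["generator"] = torch.Generator("cpu").manual_seed(seed)
        output = pipe(
            prompt,
            image=control_image,
            controlnet_conditioning_scale=scale,
            **call_kwargs,
        )
        results[scale] = output.images[0]
    return results
```

A possible call, reusing the variables from the example: `results = sweep_conditioning_scale(pipe, prompt, image, negative_prompt=negative_prompt, seed=0)`, followed by saving each entry of `results` to compare them side by side.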

🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨

Training

Our training script was built on top of the official training script that we provide here. You can refer to this script for full disclosure.

Training data

The model was trained on 3M images from the LAION aesthetic 6 plus subset, with a batch size of 256 for 50k steps at a constant learning rate of 3e-5.
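As a rough reading of these numbers, the arithmetic below works out how many samples the run processed and how many passes over the data that corresponds to. The per-GPU figure assumes the batch of 256 is the global batch, split evenly over the 8 GPUs listed under Compute with no gradient accumulation; the source does not state this, so treat it as an assumption.

```python
# Figures quoted in this model card.
dataset_size = 3_000_000   # "3M images"
global_batch_size = 256
train_steps = 50_000       # "50k steps"
num_gpus = 8               # "One 8xA100 machine"

samples_seen = global_batch_size * train_steps
approx_epochs = samples_seen / dataset_size

# Assumption: even split across GPUs and no gradient accumulation.
per_gpu_batch = global_batch_size // num_gpus

print(f"samples seen: {samples_seen:,}")                      # 12,800,000
print(f"approx. passes over the data: {approx_epochs:.2f}")   # 4.27
print(f"per-GPU batch under the stated assumption: {per_gpu_batch}")  # 32
```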

Compute

One 8xA100 machine

Mixed precision

FP16