This unofficial repository hosts a diffusers-compatible float16 checkpoint of the WDXL base UNet.

For convenience (i.e. for use in a StableDiffusionXLPipeline) we include mirrors of other models (please adhere to their terms of usage):

- SDXL 0.9
  - tokenizers
  - text encoders
  - scheduler config
- madebyollin's fp16 VAE

Usage (diffusers)

StableDiffusionXLPipeline

Diffusers' StableDiffusionXLPipeline convention handles text encoders + UNet + VAE for you:

from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  subfolder='scheduler',
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  scheduler=scheduler,
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
  prompt=prompt,
  negative_prompt=negative_prompt,
  num_inference_steps=25,
  guidance_scale=12.,
  original_size=(4096, 4096),
  target_size=(1024, 1024),
  height=1024,
  width=1024,
  generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')

You should get a picture like this:
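
The `original_size` and `target_size` arguments passed to the pipeline are SDXL's size conditioning. As far as I understand the diffusers implementation, the two tuples are concatenated with a crop offset (defaulting to `(0, 0)`) into six integers that are fed to the UNet alongside the text embeddings. The following is a minimal sketch of that assembly step only; `build_add_time_ids` is a hypothetical helper name, not part of the diffusers API.

```python
from typing import List, Tuple

Size = Tuple[int, int]

def build_add_time_ids(
  original_size: Size,
  target_size: Size,
  crops_coords_top_left: Size = (0, 0),
) -> List[int]:
  """Hypothetical helper: joins the size tuples in the order
  original size, crop offset, target size."""
  return list(original_size + crops_coords_top_left + target_size)

# Same values as the pipeline call above.
add_time_ids = build_add_time_ids(
  original_size=(4096, 4096),
  target_size=(1024, 1024),
)
print(add_time_ids)  # [4096, 4096, 0, 0, 1024, 1024]
```

Declaring an `original_size` larger than the `target_size` signals to the model that the image came from a high-resolution source.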

UNet2DConditionModel

If you just want the UNet, you can load it like so:

import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
  subfolder='unet',
).eval().to(torch.device('cuda'))

How it was converted

I used Kohya's converter script to convert the official (hakurei/waifu-diffusion-xl) wdxl-aesthetic-0.9.safetensors. See this commit.

I forked Kohya's converter script to make a version for SDXL.

I invoked it like so:

python scripts/convert_diffusers20_original_sdxl.py \
--fp16 \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-xl-base-0.9 \
in/wdxl-aesthetic-0.9.safetensors \
out/wdxl-diffusers
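
After a conversion like the one above, it can be worth checking that the output directory has the layout `StableDiffusionXLPipeline.from_pretrained` expects. The sketch below assumes the converter writes the standard diffusers SDXL layout into `out/wdxl-diffusers`; `missing_pipeline_entries` is a hypothetical helper, and a missing entry is not necessarily an error (the VAE here, for instance, is mirrored from elsewhere).

```python
from pathlib import Path
from typing import List

# Entries of a standard diffusers SDXL pipeline directory.
EXPECTED_ENTRIES = (
  'model_index.json',
  'scheduler',
  'text_encoder',
  'text_encoder_2',
  'tokenizer',
  'tokenizer_2',
  'unet',
  'vae',
)

def missing_pipeline_entries(root: Path) -> List[str]:
  """Hypothetical helper: lists the expected entries absent from `root`."""
  return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]

# Path assumed from the converter invocation above.
missing = missing_pipeline_entries(Path('out/wdxl-diffusers'))
print('missing entries:', missing or 'none')
```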

NOTE: This is a work in progress! Nothing in this repository is final.

waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

<sub>masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck</sub>

Model Description(s)

wdxl-aesthetic-0.9 is a checkpoint that has been fine-tuned on our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. This model also used Stability AI's SDXL 0.9 checkpoint as the base model for fine-tuning.

License

This model has been released under the SDXL 0.9 RESEARCH LICENSE AGREEMENT because the repository contains the SDXL 0.9 weights ahead of an official release. We have been given permission to release this model.

Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and Novel AI.

To reach us, you can join our Discord server.