
sdxl-wrong-lora

A LoRA for SDXL 1.0 Base that improves output image quality when it is loaded and "wrong" is used as the negative prompt during inference. You can demo image generation using this LoRA in this Colab Notebook.

The LoRA is also available in safetensors format for other UIs such as A1111; however, this LoRA was created using diffusers, and I cannot guarantee its efficacy outside of it.

Benefits of using this LoRA:

Usage

The LoRA can be loaded using load_lora_weights like any other LoRA in diffusers:

import torch
from diffusers import DiffusionPipeline, AutoencoderKL

# Load the fp16-fix SDXL VAE in half precision.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
)
# Load the SDXL 1.0 base pipeline with that VAE.
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)

# Attach the LoRA weights to the base pipeline.
base.load_lora_weights("minimaxir/sdxl-wrong-lora")

# Move the pipeline to the GPU.
_ = base.to("cuda")

During image generation, use "wrong" as the negative prompt. That's it!
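As a minimal sketch of that step, the helper below calls an already-loaded pipeline with "wrong" as the negative prompt. The function name generate_with_wrong is hypothetical, and the default guidance scale of 13 is simply taken from the cfg values in the examples further down; prompt, negative_prompt, guidance_scale, and generator are standard diffusers pipeline arguments.

```python
def generate_with_wrong(pipe, prompt, guidance_scale=13.0, generator=None):
    """Run a loaded diffusers pipeline with "wrong" as the negative prompt.

    `pipe` is assumed to be a pipeline that already has the LoRA loaded,
    such as `base` from the loading snippet. `generator` is an optional
    torch.Generator for reproducible seeding.
    """
    result = pipe(
        prompt=prompt,
        negative_prompt="wrong",  # the trigger this LoRA was trained on
        guidance_scale=guidance_scale,
        generator=generator,
    )
    # diffusers pipelines return an object whose `.images` is a list.
    return result.images[0]


# Example usage (assumes `base` and `torch` from the loading snippet):
# generator = torch.Generator(device="cuda").manual_seed(75789081)
# image = generate_with_wrong(
#     base,
#     "pepperoni pizza in the shape of a heart",
#     generator=generator,
# )
# image.save("pizza.png")
```

Passing a seeded generator keeps a run repeatable on the same hardware and software stack; it does not guarantee identical images across different GPUs.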

Examples

The left image is the base model output (no LoRA) + refiner; the right image is base (w/ LoRA) + refiner + "wrong" as the negative prompt. Both generations use the same seed.

I have also released a Colab Notebook to generate these kinds of side-by-side comparison images, although the seeds listed will not give the same results, since the images here were generated on a different GPU and CUDA version than the Colab Notebook uses.
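For readers who want to reproduce this kind of comparison outside the notebook, here is a rough sketch of one way to do it. It is not the notebook's code. It assumes the base and refiner are chained via the denoising_end / denoising_start arguments of the diffusers SDXL pipelines with an arbitrary 0.8 split, that the negative prompt is passed to both stages, and that the refiner was loaded separately (for example from stabilityai/stable-diffusion-xl-refiner-1.0). The names base_plus_refiner, compare, and make_generator are hypothetical.

```python
def base_plus_refiner(base, refiner, prompt, negative_prompt=None,
                      guidance_scale=13.0, generator=None,
                      high_noise_frac=0.8):
    """Run the base for the first part of denoising, then the refiner."""
    # The base stops early and hands over latents instead of a decoded image.
    latents = base(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        generator=generator,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images
    # The refiner picks up from the same point in the noise schedule.
    return refiner(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        generator=generator,
        denoising_start=high_noise_frac,
        image=latents,
    ).images[0]


def compare(base, refiner, prompt, seed, make_generator,
            lora_id="minimaxir/sdxl-wrong-lora"):
    """Return (left, right): no LoRA vs. LoRA + "wrong", same seed."""
    # Left: base without the LoRA and without a negative prompt.
    base.unload_lora_weights()
    left = base_plus_refiner(base, refiner, prompt,
                             generator=make_generator(seed))
    # Right: base with the LoRA and "wrong" as the negative prompt.
    base.load_lora_weights(lora_id)
    right = base_plus_refiner(base, refiner, prompt,
                              negative_prompt="wrong",
                              generator=make_generator(seed))
    return left, right


# Example usage (assumes `base`, `refiner`, and `torch` already exist):
# make_gen = lambda s: torch.Generator(device="cuda").manual_seed(s)
# left, right = compare(base, refiner, "a photo of a cat", 56583700, make_gen)
```

A fresh generator is created for each side so that both runs start from the same initial noise.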

realistic human Shrek blogging at a computer workstation, hyperrealistic award-winning photo for vanity fair (cfg = 13, seed = 56583700)

pepperoni pizza in the shape of a heart, hyperrealistic award-winning professional food photography (cfg = 13, seed = 75789081)

presidential painting of realistic human Spongebob Squarepants wearing a suit, (oil on canvas)+++++ (cfg = 13, seed = 85588026)

San Francisco panorama attacked by (one massive kitten)++++, hyperrealistic award-winning photo by the Associated Press (cfg = 13, seed = 45454868)

hyperrealistic death metal album cover featuring edgy moody realistic (human Super Mario)++, edgy and moody (cfg = 13, seed = 30416580)

Methodology

The methodology and motivation for creating this LoRA are similar to those of my wrong SD 2.0 textual inversion embedding: it is trained on a balanced variety of undesirable outputs, except that it is trained as a LoRA, since textual inversion with SDXL is complicated. The base images were generated from SDXL itself, with some prompt weighting to emphasize undesirable attributes for test images.

You can see the code to generate the wrong images in this Jupyter Notebook.

Notes