TMElyralab/lyraSD - AI Model Zoo

Model Card for lyraSD

We consider Diffusers the more extensible framework for the Stable Diffusion (SD) ecosystem. We have therefore pivoted to Diffusers, which led to a complete update of lyraSD.

lyraSD is currently the fastest available Stable Diffusion implementation whose outputs align 100% with those of Diffusers. It generates a 512x512 image in about 0.52 seconds, up to 80% faster than the original version.

Among its main features are:

ControlNet Hot Swap: hot-swaps the weights of a ControlNet model within 0.6s (0s if cached)
LoRA Hot Swap: hot-swaps a LoRA within 0.5s (0.1s if cached)
100% likeness to Diffusers output
4 commonly used pipelines:
- Text2Img
- Img2Img
- ControlNetText2Img
- ControlNetImg2Img
Supported Devices: Any NVIDIA GPU with SM version >= 75, for example the Turing architecture (T4), the Ampere architecture (A2, A10, A16, A30, A40, A100), RTX 4090, RTX 3080, etc.
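
The SM requirement above can be checked before installing anything else. The following is a minimal sketch, not part of lyraSD: the helper names `sm_version`, `is_supported`, and `current_gpu_supported` are hypothetical, and the only library call used is PyTorch's `torch.cuda.get_device_capability`, which returns the compute capability as a `(major, minor)` tuple.

```python
MIN_SM = 75  # minimum SM version stated in the feature list


def sm_version(major: int, minor: int) -> int:
    """Combine a compute capability such as (7, 5) into an SM number such as 75."""
    return major * 10 + minor


def is_supported(major: int, minor: int, min_sm: int = MIN_SM) -> bool:
    """Return True if the given compute capability meets the minimum SM version."""
    return sm_version(major, minor) >= min_sm


def current_gpu_supported():
    """Check GPU 0; return None if PyTorch or a CUDA device is not available."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return is_supported(major, minor)


if __name__ == "__main__":
    print(is_supported(7, 5))  # T4 (Turing, SM 75)
    print(is_supported(8, 0))  # A100 (Ampere, SM 80)
    print(is_supported(7, 0))  # V100 (Volta, SM 70), below the minimum
    print(current_gpu_supported())
```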

Speed

test environment

device: Nvidia A100 40G
img size: 512x512
precision: fp16
steps: 20
sampler: EulerA
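
For readers who want to collect latency numbers of their own under settings like these, here is a minimal timing sketch. It is an assumption about a reasonable measurement procedure (warm-up runs, then an average over several timed runs), not a description of how the figures in this card were measured. The helper `measure_latency_ms` and its parameters are hypothetical; for GPU workloads, passing `torch.cuda.synchronize` as `sync` is one way to make sure queued kernels have finished before the clock is read.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=3, runs=10, sync=None):
    """Time fn() and return (mean latency in ms, list of per-run latencies in ms).

    sync is an optional callable run after each call, e.g. torch.cuda.synchronize
    for GPU work, so that asynchronous execution is included in the measurement.
    """
    # Warm-up runs are not timed; they absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), samples


if __name__ == "__main__":
    # Stand-in workload: sleep for 10 ms instead of running a real pipeline.
    mean_ms, samples = measure_latency_ms(lambda: time.sleep(0.01), warmup=1, runs=5)
    print(f"mean latency: {mean_ms:.1f} ms over {len(samples)} runs")
```

With a real pipeline, `fn` would wrap the generation call, for example `lambda: model(prompt, height, width, steps, guidance_scale, negative_prompt, num_images)`.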

Text2Img

model	time cost(ms)
torch2.0.1 + diffusers	~667ms
lyraSD	~528ms

ControlNet-Text2Img

model	time cost(ms)
torch2.0.1 + diffusers	~930ms
lyraSD	~745ms
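
To make the two tables easier to compare, the short calculation below turns the reported times into a speedup factor and a percentage of time saved relative to the torch2.0.1 + diffusers baseline. It uses only the numbers from the tables; the function names are illustrative.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


def time_saved_pct(baseline_ms: float, optimized_ms: float) -> float:
    """Percentage of the baseline time that the optimized run saves."""
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms


# (torch2.0.1 + diffusers, lyraSD) times in ms, taken from the tables
results = {
    "Text2Img": (667, 528),
    "ControlNet-Text2Img": (930, 745),
}

for name, (baseline, lyra) in results.items():
    print(f"{name}: {speedup(baseline, lyra):.2f}x, "
          f"{time_saved_pct(baseline, lyra):.1f}% less time")
# Text2Img: 1.26x, 20.8% less time
# ControlNet-Text2Img: 1.25x, 19.9% less time
```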

Model Sources

Checkpoint: https://civitai.com/models/7371/rev-animated
ControlNet: https://huggingface.co/lllyasviel/sd-controlnet-canny
LoRA: https://civitai.com/models/18323?modelVersionId=46846

Text2Img Usage

import os
import time

import torch

from lyrasd_model import LyraSdTxt2ImgPipeline

# Path to the compiled LyraSD C++ shared library, which contains the C++/CUDA kernels
lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu11_sm80.so"

# Directory holding the model files. It should contain:
#   1. the CLIP model
#   2. the converted, optimized UNet model, placed in the unet_bins folder
#   3. the VAE model
#   4. the scheduler config
model_path = "./models/lyrasd_rev_animated"
lora_path = "./models/lyrasd_xiaorenshu_lora"

# Build the Txt2Img pipeline
model = LyraSdTxt2ImgPipeline(model_path, lib_path)

# Load a LoRA
# Arguments: LoRA directory, name, LoRA strength, LoRA model precision
model.load_lora(lora_path, "xiaorenshu", 0.4, "fp32")

# Prepare the inputs and hyperparameters
prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
height, width = 512, 512
steps = 30
guidance_scale = 7
generator = torch.Generator().manual_seed(123)
num_images = 1

start = time.perf_counter()
# Run inference
images = model(prompt, height, width, steps,
               guidance_scale, negative_prompt, num_images,
               generator=generator)
print("image gen cost: ", time.perf_counter() - start)

# Save the generated images (create the output directory if it does not exist)
os.makedirs("outputs", exist_ok=True)
for i, image in enumerate(images):
    image.save(f"outputs/res_txt2img_lora_{i}.png")

# Unload the LoRA. Arguments: LoRA name, whether to clear the LoRA cache
# model.unload_lora("xiaorenshu", True)
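
Because the example above loads a LoRA before generation and unloads it afterwards, the pairing can be wrapped in a context manager so the unload always runs. This is a sketch under stated assumptions: it relies only on `load_lora` and `unload_lora` being called with the positional arguments shown in the example, while `applied_lora` and the `FakePipeline` stand-in are hypothetical names introduced here for illustration and are not part of lyraSD.

```python
from contextlib import contextmanager


@contextmanager
def applied_lora(model, lora_path, name, strength, precision="fp32", clear_cache=False):
    """Load a LoRA for the duration of a with-block, then unload it.

    clear_cache=False keeps the LoRA cached so that a later reload can be faster.
    """
    model.load_lora(lora_path, name, strength, precision)
    try:
        yield model
    finally:
        # Runs even if generation inside the with-block raises.
        model.unload_lora(name, clear_cache)


class FakePipeline:
    """Stand-in that only records calls, so the sketch runs without a GPU."""

    def __init__(self):
        self.calls = []

    def load_lora(self, path, name, strength, precision):
        self.calls.append(("load", name, strength, precision))

    def unload_lora(self, name, clear_cache):
        self.calls.append(("unload", name, clear_cache))


if __name__ == "__main__":
    fake = FakePipeline()
    with applied_lora(fake, "./models/lyrasd_xiaorenshu_lora", "xiaorenshu", 0.4):
        pass  # with a real pipeline, call model(...) here
    print(fake.calls)
```

With a real pipeline, `fake` would be replaced by the `LyraSdTxt2ImgPipeline` instance and the generation call would go inside the `with` block.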

Demo output

Text2Img

[Image: Text2Img without LoRA]

[Image: Text2Img with LoRA]

Img2Img

[Image: Img2Img input]

[Image: Img2Img output]

ControlNet Text2Img

[Image: control image]

[Image: ControlNet Text2Img output]

Docker Environment Recommendation

For CUDA 11.x, we recommend nvcr.io/nvidia/pytorch:22.12-py3
For CUDA 12.0, we recommend nvcr.io/nvidia/pytorch:23.02-py3

docker pull nvcr.io/nvidia/pytorch:23.02-py3
docker run --rm -it --gpus all -v "$(pwd)":/lyraSD -w /lyraSD nvcr.io/nvidia/pytorch:23.02-py3

pip install -r requirements.txt
python txt2img_demo.py

Citation

@Misc{lyraSD_2023,
  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Bin Wu},
  title =        {lyraSD: Accelerating Stable Diffusion with best flexibility},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraSD}},
  year =         {2023}
}

Report bugs

To report a bug, start a discussion at https://huggingface.co/TMElyralab/lyraSD/discussions
Include a [bug] mark in the title.