Introduction
This demo application ("demoDiffusion") showcases the acceleration of the Stable Diffusion pipeline using TensorRT plugins.
Setup
Clone the TensorRT OSS repository
git clone git@github.com:NVIDIA/TensorRT.git -b release/8.5 --single-branch
cd TensorRT
git submodule update --init --recursive
Launch TensorRT NGC container
Install nvidia-docker using these instructions.
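On an Ubuntu host, one possible installation sequence is sketched below (adapted from NVIDIA's nvidia-docker documentation; adjust the repository setup for other distributions):

# Set up the NVIDIA Docker repository and install nvidia-docker2 (Ubuntu example)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker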
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/tensorrt:22.10-py3 /bin/bash
(Optional) Install latest TensorRT release
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
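As a quick sanity check (not part of the original instructions), confirm which TensorRT version is visible to Python:

python3 -c "import tensorrt; print(tensorrt.__version__)"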
NOTE: Alternatively, you can download and install TensorRT packages from NVIDIA TensorRT Developer Zone.
Build TensorRT plugins library
Build the TensorRT plugins library using the TensorRT OSS build instructions.
export TRT_OSSPATH=/workspace
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_OUT_DIR=$PWD/out
cd plugin
make -j$(nproc)
export PLUGIN_LIBS="$TRT_OSSPATH/build/out/libnvinfer_plugin.so"
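As an optional sanity check, verify that the freshly built plugin library can be loaded:

python3 -c "import ctypes, os; ctypes.CDLL(os.environ['PLUGIN_LIBS']); print('plugin library loaded')"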
Install required packages
cd $TRT_OSSPATH/demo/Diffusion
pip3 install -r requirements.txt
# Create output directories
mkdir -p onnx engine output
NOTE: demoDiffusion has been tested on systems with NVIDIA A100, RTX3090, and RTX4090 GPUs, and the following software configuration.
cuda-python 11.8.1
diffusers 0.7.2
onnx 1.12.0
onnx-graphsurgeon 0.3.25
onnxruntime 1.13.1
polygraphy 0.43.1
tensorrt 8.5.1.7
tokenizers 0.13.2
torch 1.12.0+cu116
transformers 4.24.0
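To compare the packages installed in your environment against the versions listed above, one simple check is:

pip3 list | grep -E "cuda-python|diffusers|onnx|polygraphy|tensorrt|tokenizers|torch|transformers"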
NOTE: Optionally, install the HuggingFace accelerate package for faster and less memory-intensive model loading.
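For example:

pip3 install accelerate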
Running demoDiffusion
Review usage instructions
python3 demo-diffusion.py --help
HuggingFace user access token
To download the model checkpoints for the Stable Diffusion pipeline, you will need a read access token. See instructions.
export HF_TOKEN=<your access token>
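To verify the token before running the demo (an optional check; the whoami-v2 endpoint of the HuggingFace API is assumed here), you can run:

curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2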
Generate an image guided by a single text prompt
LD_PRELOAD=${PLUGIN_LIBS} python3 demo-diffusion.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v
Restrictions
- Up to 16 simultaneous prompts (maximum batch size) per inference.
- For generating images of dynamic shapes without rebuilding the engines, use --force-dynamic-shape (see the example after this list).
- Supports image sizes between 256x256 and 1024x1024.
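For example, a hypothetical invocation for a non-default resolution with dynamic shapes might look like the following (assuming the script exposes --height and --width options; run python3 demo-diffusion.py --help to confirm the exact flag names):

LD_PRELOAD=${PLUGIN_LIBS} python3 demo-diffusion.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --force-dynamic-shape --height 768 --width 768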