controlnet

Controls image generation by edge maps generated with Edge Drawing. Note that Edge Drawing comes in different flavors: original (ed), parameter free (edpf), color (edcolor).

Edge Drawing Parameter Free

[image: edge map produced with Edge Drawing Parameter Free]

Clear and pristine! Wooow!

Example

sampler=UniPC steps=20 cfg=7.5 seed=0 batch=9 model: v1-5-pruned-emaonly.safetensors cherry-picked: 1/9

prompt: a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather

[image: example generation for the prompt above]

Canny Edge for comparison (default in Automatic1111)

[image: edge map produced with Canny Edge]

Noise, artifacts and missing edges. Yuck! Ugh!

Image dataset

Training

accelerate launch train_controlnet.py ^
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
  --output_dir="control-edgedrawing-[version]-fp16/" ^
  --dataset_name="mydataset" ^
  --mixed_precision="fp16" ^
  --resolution=512 ^
  --learning_rate=1e-5 ^
  --train_batch_size=1 ^
  --gradient_accumulation_steps=4 ^
  --gradient_checkpointing ^
  --use_8bit_adam ^
  --enable_xformers_memory_efficient_attention ^
  --set_grads_to_none ^
  --seed=0

Evaluation

To evaluate the model, it makes sense to compare it with the original Canny model. Original evaluations and comparisons are available in the ControlNet 1.0 repo, the ControlNet 1.1 repo, the ControlNet paper (v1 and v2) and the Diffusers implementation. Some points to keep in mind when comparing canny with edpf, so as not to compare apples with oranges:

Versions

Experiment 1 - 2023-09-19 - control-edgedrawing-default-drop50-fp16-checkpoint-40000

Images converted with https://github.com/shaojunluo/EDLinePython (based on original (non-parameter free) edge drawing). Default settings are:

smoothed=False

{ 'ksize'            :  5
, 'sigma'            :  1.0
, 'gradientThreshold': 36
, 'anchorThreshold'  :  8
, 'scanIntervals'    :  1
}

additional arguments: --proportion_empty_prompts=0.5.

Trained for 40000 steps with default settings => results are not good. dropping 50% of the prompts was probably too much. retry with no drops and different algorithm parameters.
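For context, a minimal sketch of what prompt dropping amounts to: each caption is replaced by an empty string with probability `proportion_empty_prompts`. This only illustrates the idea; the helper name `drop_prompts` and the sample captions are hypothetical and are not taken from the training script.

```python
import random


def drop_prompts(captions, proportion_empty_prompts, rng):
    """Replace each caption by "" with the given probability (hypothetical helper)."""
    return [
        "" if rng.random() < proportion_empty_prompts else caption
        for caption in captions
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
captions = [f"caption {i}" for i in range(10_000)]  # made-up captions

dropped = drop_prompts(captions, 0.5, rng)
empty_fraction = sum(caption == "" for caption in dropped) / len(dropped)
print(f"empty prompts: {empty_fraction:.1%}")
```

With 0.5 the model sees the conditioning image without any text for roughly half of the samples, which is the setting used in this experiment.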

Update 2023-09-22: a bug in the algorithm produces overly sparse images with the default settings, see https://github.com/shaojunluo/EDLinePython/issues/4

Experiment 2 - 2023-09-20 - control-edgedrawing-default-noisy-drop0-fp16-checkpoint-40000

Same as experiment 1 with smoothed=True and --proportion_empty_prompts=0.

Trained for 40000 steps with default settings => results are not good. conditioning images look too noisy. investigate algorithm.

Experiment 3.0 - 2023-09-22 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-45000

Conditioning images generated with edpf.py using opencv-contrib-python::ximgproc::EdgeDrawing.

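import cv2  # EdgeDrawing lives in cv2.ximgproc, which ships with opencv-contrib-python

# image: the input image, loaded beforehand (e.g. with cv2.imread)
ed     = cv2.ximgproc.createEdgeDrawing()
params = cv2.ximgproc.EdgeDrawing.Params()
params.PFmode = True  # enable parameter-free mode (edpf)
ed.setParams(params)
edges    = ed.detectEdges(image)
edge_map = ed.getEdgeImage(edges)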

45000 steps => looks good. released as version 0.1 on civitai.

resuming with left-right flipped images.

Experiment 3.1 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-90000

90000 steps (45000 steps on original, 45000 steps with left-right flipped images) => quality became better, might release as 0.2 on civitai.
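A minimal sketch of the left-right flip, assuming the images are available as NumPy arrays and that the target image and its edge map are mirrored together so they stay aligned. The helper name `flip_pair` is hypothetical. Whether the edge maps were flipped or re-detected on the flipped images is not stated here; the two can differ slightly.

```python
import numpy as np


def flip_pair(image, edge_map):
    """Mirror an image and its conditioning edge map left-right (hypothetical helper)."""
    if image.shape[:2] != edge_map.shape[:2]:
        raise ValueError("image and edge map must share height and width")
    # np.fliplr returns a view; copy so the results own contiguous memory
    return np.fliplr(image).copy(), np.fliplr(edge_map).copy()


# Tiny synthetic example: a 2x3 RGB image and a matching single-channel edge map
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
edge_map = np.array([[255, 0, 0], [0, 0, 255]], dtype=np.uint8)

flipped_image, flipped_edges = flip_pair(image, edge_map)
```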

Experiment 3.2 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0+50-fp16-checkpoint-118000

resumed with epoch 2 from 90000 using --proportion_empty_prompts=0.5 => results became worse, CN didn't pick up on no-prompts (I also tried intermediate checkpoint-104000). restarting with 50% drop.

Experiment 4.0 - 2023-09-25 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-45000

see experiment 3.0. restarted from 0 with --proportion_empty_prompts=0.5 => results are not good, 50% is probably too much for 45k steps. guessmode still doesn't work and tends to produce humans. resuming until 90k with left-right flipped images in the hope it will get better with more images.

Experiment 4.1 - 2023-09-26 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-90000

resumed from 45000 steps with left-right flipped images until 90000 steps => results are still not good, 50% is probably also too much for 90k steps. guessmode still doesn't work and tends to produce humans. aborting.

Experiment 5.0 - 2023-09-28 - control-edgedrawing-cv480edpf-fastdup-fp16-checkpoint-45000

see experiment 3. cleaned original images following the fastdup introduction resulting in:

180210 images in total
 67854 duplicates
   644 outliers
    26 too dark
   321 too bright
    57 blurry
 68621 unique removed (that's 38%!)
------
111589 unique images (x2 left-right flip)
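The numbers above can be double-checked with a few lines of arithmetic. The sketch also shows why the line says "unique removed": the per-category counts add up to more than the number of removed images, so some images must fall into more than one category. Variable names are illustrative.

```python
total = 180210
flagged = {
    "duplicates": 67854,
    "outliers": 644,
    "too dark": 26,
    "too bright": 321,
    "blurry": 57,
}
unique_removed = 68621

remaining = total - unique_removed
percent_removed = round(100 * unique_removed / total)
# Images counted in more than one category
overlap = sum(flagged.values()) - unique_removed

print(remaining, percent_removed, overlap)
```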

restarted from 0 with left-right flipped images and --mixed_precision="no" to create a master release and convert to fp16 afterwards.

Experiment 6.0 - 2023-10-02 - control-edgedrawing-cv480edpf-rect-fp16-checkpoint-45000|90000|135000

see experiment 5.0.

183410 images in total
 75686 duplicates
   381 outliers
    50 too dark
   436 too bright
    31 blurry
 76288 unique removed (that's 42%!)
------
107122 unique images (x2 left-right flip)

1 epoch = 107122 * 2 / 4 = 53561 steps per epoch

restarted from 0 with --mixed_precision="fp16".

TODO: Why did I end up with fewer images after I added more images? fastdup suddenly finds even more duplicates. Is the fastdup default threshold=0.9 too aggressive?
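To put numbers on the TODO, a small sketch that diffs the counts reported for experiments 5.0 and 6.0. It only restates the figures listed above and does not explain the cause.

```python
exp5 = {"total": 180210, "duplicates": 67854, "unique": 111589}
exp6 = {"total": 183410, "duplicates": 75686, "unique": 107122}

# Change from experiment 5.0 to 6.0 for each count
delta = {key: exp6[key] - exp5[key] for key in exp5}
print(delta)
```

So 3200 images were added, but the number of flagged duplicates grew by 7832, leaving 4467 fewer unique images.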

Experiment 6.1 - control-edgedrawing-cv480edpf-rect-fp16-batch32-checkpoint-6696

see experiment 6.0. restarted from 0 with --train_batch_size=2 --gradient_accumulation_steps=16. 1 epoch = 107122 * 2 / 32 = 6696 steps per epoch => released as version 0.2 on civitai.
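The steps-per-epoch figures for experiments 6.0 and 6.1 follow from the effective batch size (train_batch_size * gradient_accumulation_steps). A minimal sketch, assuming a single training process and rounding a partial last batch up; the function name is hypothetical.

```python
import math


def steps_per_epoch(num_images, train_batch_size, gradient_accumulation_steps, flip_factor=2):
    """Optimizer steps per epoch, assuming one process (hypothetical helper)."""
    samples = num_images * flip_factor  # originals plus left-right flipped copies
    effective_batch = train_batch_size * gradient_accumulation_steps
    return math.ceil(samples / effective_batch)


exp_6_0 = steps_per_epoch(107122, train_batch_size=1, gradient_accumulation_steps=4)
exp_6_1 = steps_per_epoch(107122, train_batch_size=2, gradient_accumulation_steps=16)
print(exp_6_0, exp_6_1)
```

For 6.1 the division is not exact (214244 / 32 = 6695.125), which is why the checkpoint sits at 6696.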

Experiment 6.2 - control-edgedrawing-cv480edpf-rect-fp16-batch32-drop50-checkpoint-6696

see experiment 6.1. restarted from 0 with --proportion_empty_prompts=0.5.

Ideas

Questions and answers

Q: What's the point of another edge control net anyway?

A: 🤷