Text-to-Video

model example

zeroscope_v2 567w

A watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and a smooth video output. This model was trained using 9,923 clips and 29,769 tagged frames at 24 frames, 576x320 resolution.<br /> zeroscope_v2_567w is specifically designed for upscaling with zeroscope_v2_XL using vid2vid in the 1111 text2video extension by kabachuha. Leveraging this model as a preliminary step allows for superior overall compositions at higher resolutions in zeroscope_v2_XL, permitting faster exploration in 576x320 before transitioning to a high-resolution render. See some example outputs that have been upscaled to 1024x576 using zeroscope_v2_XL. (courtesy of dotsimulate)<br />

zeroscope_v2_576w uses 7.9gb of vram when rendering 30 frames at 576x320

Using it with the 1111 text2video extension

  1. Download files in the zs2_576w folder.
  2. Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.

Upscaling recommendations

For upscaling, it's recommended to use zeroscope_v2_XL via vid2vid in the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Remember to use the same prompt that was used to generate the original clip. <br />

Known issues

Lower resolutions or fewer frames could lead to suboptimal output. <br />

Thanks to camenduru, kabachuha, ExponentialML, dotsimulate, VANYA, polyware, tin2tin<br />