text-to-video stable-diffusion


<font size="32">Try Hotshot-XL yourself here: https://www.hotshot.co</font>

Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.

Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:

  1. You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
  2. If you'd like to make GIFs of personalized subjects, you can load your own SDXL-based LoRAs without having to fine-tune Hotshot-XL. This is awesome because it’s usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone's existing LoRA usage/workflows :) See more here.

Hotshot-XL is compatible with SDXL ControlNet to make GIFs in the composition/layout you’d like. See here for more info.

Hotshot-XL was trained to generate 1-second GIFs at 8 FPS, i.e. 8 frames per GIF.
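To make the timing concrete, here is a minimal sketch of the arithmetic behind a 1-second clip at 8 FPS. The helper name `gif_timing` and its parameters are hypothetical and are not part of the Hotshot-XL codebase; it only shows how the frame count and per-frame delay follow from the duration and frame rate.

```python
def gif_timing(duration_s: float = 1.0, fps: int = 8) -> tuple[int, float]:
    """Return (frame_count, per_frame_delay_ms) for a clip.

    Hypothetical helper for illustration only; the defaults mirror the
    1-second, 8 FPS setting that Hotshot-XL was trained on.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    frame_count = round(duration_s * fps)  # 1.0 s * 8 FPS -> 8 frames
    delay_ms = 1000.0 / fps                # 1000 ms / 8 -> 125.0 ms per frame
    return frame_count, delay_ms


frames, delay_ms = gif_timing()
print(frames, delay_ms)  # 8 125.0
```

If you write the frames out yourself, the per-frame delay is the value a GIF writer typically expects. For example, Pillow's GIF `save` takes a `duration` argument in milliseconds.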

Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. You can find an SDXL model we fine-tuned for 512x512 resolutions here.


Source code is available at https://github.com/hotshotco/Hotshot-XL.

Model Description

Limitations and Bias

Limitations

Bias

While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.