
Riffusion

Riffusion is an app for real-time music generation with Stable Diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This repository contains the model files.

Riffusion v1 Model

Riffusion is a latent text-to-image diffusion model capable of generating spectrogram images given any text input. These spectrograms can be converted into audio clips.
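The step from a spectrogram image back to an audio clip can be illustrated with a short sketch. A spectrogram image stores magnitudes only, so the phase has to be estimated; the code below does this with the Griffin-Lim algorithm, written with NumPy alone. It is an illustration under stated assumptions, not the project's actual pipeline: it assumes a grayscale image with linearly spaced frequency bins, low frequencies in the bottom row, and pixel brightness proportional to magnitude. The function names, `max_magnitude`, `NORM_FLOOR`, and the window settings are all hypothetical.

```python
import numpy as np

NORM_FLOOR = 0.1  # hypothetical floor that keeps the clip edges from blowing up


def image_to_magnitude(image, max_magnitude=50.0):
    """Map an 8-bit grayscale image to a magnitude spectrogram (assumed encoding)."""
    pixels = np.asarray(image, dtype=np.float64) / 255.0
    # Flip vertically so row 0 holds the lowest frequency bin.
    return np.flipud(pixels) * max_magnitude


def stft(signal, n_fft, hop):
    """Short-time Fourier transform, returned with shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def istft(spec, n_fft, hop):
    """Inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, NORM_FLOOR)


def griffin_lim(magnitude, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    n_fft = 2 * (magnitude.shape[0] - 1)
    hop = n_fft // 4
    rng = np.random.default_rng(seed)
    # Start from random phase and refine it by alternating projections.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        phase = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(magnitude * phase, n_fft, hop)
```

With these helpers, `griffin_lim(image_to_magnitude(image))` returns a floating-point waveform that can be scaled and written to a WAV file. The first and last few hundred samples are attenuated by the window and are best trimmed or faded.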

The model was created by Seth Forsgren and Hayk Martiros as a hobby project.

You can use the Riffusion model directly, or try the Riffusion web app.

The Riffusion model was created by fine-tuning the Stable-Diffusion-v1-5 checkpoint. Read about Stable Diffusion in 🤗's Stable Diffusion blog.
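Because the model is a fine-tune of a Stable Diffusion checkpoint, it can presumably be loaded with the standard 🤗 Diffusers `StableDiffusionPipeline`. The sketch below shows one way to generate a spectrogram image from a prompt. The Hub id in `MODEL_ID`, the function name `generate_spectrogram`, and the default step count are assumptions for illustration, not values taken from this card. The imports are placed inside the function so that the module can be imported without `torch` or `diffusers` installed.

```python
MODEL_ID = "riffusion/riffusion-model-v1"  # assumed Hub id, not stated in this card


def generate_spectrogram(prompt, steps=50, seed=0, device="cuda"):
    """Return a 512x512 PIL image of a spectrogram for `prompt` (sketch)."""
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is only a sensible default on GPU.
    dtype = torch.float16 if device.startswith("cuda") else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator makes the output reproducible for a given prompt.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,
        generator=generator,
        width=512,
        height=512,
    )
    return result.images[0]
```

A call such as `generate_spectrogram("funky synth solo").save("spectrogram.png")` would write the image to disk; the first call downloads the model weights.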

Model Details

Direct Use

The model is intended for research purposes only.

Citation

If you build on this work, please cite it as follows:

@software{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}