Musika Audio Autoencoder
Pretrained universal autoencoder for Musika, a system for fast, infinite waveform music generation. Introduced in this paper.
Model description
The Musika autoencoder consists of two hierarchical stages that are trained separately. It is trained to encode and reconstruct general 44.1 kHz waveform music, and the two stages together achieve a final time compression ratio of 4096x. For example, roughly 23 seconds of 44.1 kHz audio (256 × 4096 = 1,048,576 samples, about 23.8 s) are encoded into a sequence of 256 vectors, each of dimension 64.
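The relationship between waveform length and latent sequence length follows directly from the 4096x time compression ratio. The sketch below is illustrative shape arithmetic only, not part of the Musika code. It assumes the input length is an exact multiple of 4096 samples, and it leaves aside any padding, cropping, or channel handling the real system may perform. The names `latent_shape` and `seconds_for_vectors` are hypothetical helpers.

```python
SAMPLE_RATE = 44_100     # Hz, the audio rate the autoencoder is trained on
TIME_COMPRESSION = 4096  # waveform samples per latent vector (both stages combined)
LATENT_DIM = 64          # dimension of each latent vector


def latent_shape(num_samples: int) -> tuple[int, int]:
    """Return (num_vectors, LATENT_DIM) for a waveform of num_samples samples.

    Hypothetical helper: assumes num_samples is a multiple of TIME_COMPRESSION.
    """
    if num_samples % TIME_COMPRESSION != 0:
        raise ValueError("num_samples must be a multiple of TIME_COMPRESSION")
    return num_samples // TIME_COMPRESSION, LATENT_DIM


def seconds_for_vectors(num_vectors: int) -> float:
    """Duration in seconds of audio covered by num_vectors latent vectors."""
    return num_vectors * TIME_COMPRESSION / SAMPLE_RATE


num_samples = 256 * TIME_COMPRESSION       # 1,048,576 samples
print(latent_shape(num_samples))           # (256, 64)
print(round(seconds_for_vectors(256), 2))  # 23.78
```

This is why the example above speaks of "roughly 23 seconds": 256 latent vectors cover just under 23.8 seconds of audio at 44.1 kHz.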
How to use
The autoencoder is downloaded automatically and used the first time the system is run. Try Musika here!
Training data
The autoencoder was trained on both the SXSW dataset (a diverse music dataset) and the VCTK dataset (a speech dataset) to produce general-purpose representations.