Stable Diffusion (SD) model trained on 5,000 images from the WebVid-10M dataset for 50 epochs. Useful for video model training.
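
To give a rough sense of the training budget these figures imply, the sketch below works out the total number of samples seen and the number of optimizer steps for a few batch sizes. The batch size is not stated here, so the values tried are assumptions for illustration only, as is the helper `training_steps` and the choice to keep the last partial batch of each epoch.

```python
import math

NUM_IMAGES = 5000  # from the description above
EPOCHS = 50        # from the description above


def training_steps(num_images: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps, assuming the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


# Total image presentations over the whole run, independent of batch size.
samples_seen = NUM_IMAGES * EPOCHS

print(f"samples seen: {samples_seen}")

# Batch sizes below are hypothetical; the real value is not documented.
for batch_size in (1, 4, 16):
    steps = training_steps(NUM_IMAGES, EPOCHS, batch_size)
    print(f"batch size {batch_size}: {steps} steps")
```

With gradient accumulation the step count would shrink further by the accumulation factor, which is also not stated.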