Core ML Converted SDXL Model:

This model was converted to Core ML for use on Apple Silicon devices. Conversion instructions can be found here.
Provide the model to an app such as Mochi Diffusion (GitHub / Discord) to generate images.
The original version is only compatible with the CPU & GPU option.
Resolution and bit size are as noted in the individual file names.
This model requires macOS 14.0 or later to run properly.
This model was converted with a vae-encoder for use with image2image.
Descriptions are posted as-is from original model source.
Not all features and/or results may be available in Core ML format.
This model does not have the unet split into chunks.
This model does not include a safety checker (for NSFW content).
This model cannot be used with ControlNet.

SDXL-v10-Base+Refiner:

Source(s): CivitAI

SDXL v1.0 Base + Refiner

The official SDXL base and refiner models converted to Core ML, and packaged as a single combined model.

The base model components will be used for the first 80% of the specified steps. Then the refiner model components will complete the remaining 20% of the steps. This is managed by Mochi Diffusion when it finds this type of combined SDXL model.
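As a rough illustration of the 80% / 20% split described above, the sketch below computes how many steps each stage would run for a given total. The function name split_steps and the round-down behavior are assumptions made for this example; Mochi Diffusion's actual rounding may differ.

```python
def split_steps(total_steps: int, base_percent: int = 80) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a combined base + refiner run.

    Assumption: the base share is rounded down to a whole step and the
    refiner takes whatever remains. The app's real rounding may differ.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0 <= base_percent <= 100:
        raise ValueError("base_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises such as 0.8 * 30.
    base_steps = (total_steps * base_percent) // 100
    refiner_steps = total_steps - base_steps
    return base_steps, refiner_steps


if __name__ == "__main__":
    for steps in (10, 20, 30):
        base, refiner = split_steps(steps)
        print(f"{steps} steps: base={base}, refiner={refiner}")
```

With 30 steps, for example, this gives 24 base steps followed by 6 refiner steps.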

The individual model files in this repo have different bit depths and resolutions which are noted in the file names. The nominal model is 16 bit and 1024x1024. Other versions are reduced bits and/or lower resolution.

These are large models and are zipped into smaller parts. Be sure to download all of the parts for a particular model, and combine the pieces into a single folder on your end. The 3 part full size model zip files may not unzip correctly with Apple's native archive tool. They were made with "BetterZip". The 2 part 8 bit model files will unzip with Apple's archive tool.
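If you prefer to script the "combine into a single folder" step, the following is a minimal sketch. It assumes each downloaded part is a standalone zip archive holding a subset of the model files, which may not match how the parts in this repo were packaged. If the parts are segments of one split archive, this will not work and a tool such as BetterZip is needed instead. The function name extract_parts and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_parts(part_paths, dest_dir):
    """Extract several zip parts into one destination folder.

    Assumption: every part is an independent, complete zip file.
    Returns the sorted relative paths of all files now in dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Process parts in name order so the result is predictable.
    for part in sorted(Path(p) for p in part_paths):
        with zipfile.ZipFile(part) as archive:
            archive.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

For example, extract_parts(["model_part1.zip", "model_part2.zip"], "combined_model") would place the contents of both hypothetical archives under combined_model. Check the resulting folder against the file list on the model page before loading it in an app.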

Model Description

SDXL consists of a two-step pipeline for latent diffusion: First, we use a base model to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img") to the latents generated in the first step, using the same prompt.

Developed by: Stability AI
Model type: Diffusion-based text-to-image generative model
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).

Model Sources

Repository: https://github.com/Stability-AI/generative-models
Demo: https://clipdrop.co/stable-diffusion

Uses

The model is intended for research purposes only. Possible research areas and tasks include:
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.