
DALL·E Mini Model Card

This model card focuses on the model associated with the DALL·E mini space on Hugging Face, available here. The app is called “dalle-mini”, but incorporates “DALL·E Mini” and “DALL·E Mega” models (further details on this distinction forthcoming).

The DALL·E Mega model is the largest version of DALL·E Mini. For more information specific to DALL·E Mega, see the DALL·E Mega model card.

Model Details

Cite as:

@misc{Dayma_DALL·E_Mini_2021,
      author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
      doi = {10.5281/zenodo.5146400},
      month = {7},
      title = {DALL·E Mini},
      url = {https://github.com/borisdayma/dalle-mini},
      year = {2021}
}

Uses

Direct Use

The model is intended to be used to generate images based on text prompts for research and personal consumption. Intended uses include supporting creativity, creating humorous content, and providing generations for people curious about the model’s behavior. Intended uses exclude those described in the Misuse and Out-of-Scope Use section.

Downstream Use

The model could also be used for downstream use cases, including:

- Research efforts, such as probing and better understanding the limitations and biases of generative models
- Development of educational or creative tools
- Generation of artwork and use in design and artistic processes
- Other uses that are newly discovered by users

Downstream uses exclude the uses described in Misuse and Out-of-Scope Use.

Misuse, Malicious Use, and Out-of-Scope Use

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out of scope for the abilities of this model.

Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people, or of their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes
- Impersonating individuals without their consent
- Sexual content without the consent of the people who might see it
- Mis- and disinformation
- Representations of egregious violence and gore
- Sharing copyrighted or licensed material, or alterations of it, in violation of its terms of use

Limitations and Bias

Limitations

The model developers discuss the limitations of the model further in the DALL·E Mini technical report:

Bias

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.

The model was trained on unfiltered data from the Internet, limited to pictures with English descriptions. Text and images from communities and cultures using other languages were not utilized. This affects all output of the model, with white and Western culture asserted as a default; the model's outputs for non-English prompts are also observably lower quality than those for English prompts.

While the capabilities of image generation models are impressive, they may also reinforce or exacerbate societal biases. The extent and nature of the biases of DALL·E Mini and DALL·E Mega models have yet to be fully documented, but initial testing demonstrates that they may generate images that contain negative stereotypes against minoritized groups. Work to analyze the nature and extent of the models’ biases and limitations is ongoing.

Our current analyses demonstrate that:

The technical report discusses these issues in more detail, and also highlights potential sources of bias in the model development process.

Limitations and Bias Recommendations

Training

Training Data

The model developers used 3 datasets for the model:

- Conceptual Captions, which contains 3 million image and caption pairs
- Conceptual 12M, which contains 12 million image and caption pairs
- The OpenAI subset of YFCC100M, sub-sampled to 2 million images due to storage limitations; both the image title and description were used as the caption

For fine-tuning the image encoder, a subset of 2 million images was used. All images (about 15 million) were used for training the Seq2Seq model.

Training Procedure

As described further in the technical report for DALL·E Mini, during training, images and descriptions are both available and pass through the system as follows:

- Images are encoded through a VQGAN encoder, which turns images into a sequence of tokens.
- Descriptions are encoded through a BART encoder.
- The output of the BART encoder and the encoded images are fed through the BART decoder, an auto-regressive model whose goal is to predict the next image token.
- The loss is the softmax cross-entropy between the model's prediction logits and the actual image encodings from the VQGAN.
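
To make that objective concrete, the sketch below shows a minimal, self-contained JAX version of this seq2seq training step. The model here is a toy stand-in for the BART encoder-decoder (a mean-pooled text embedding plus a projection onto the image codebook, with no attention layers), and all sizes and helper names (init_params, model_apply, loss_fn) are illustrative assumptions, not the dalle-mini implementation.

import jax
import optax

# Toy sizes: the real model uses a 16,384-token VQGAN codebook, a much larger
# text vocabulary, and longer sequences. These values are assumptions.
TEXT_VOCAB, IMAGE_VOCAB = 1000, 1024
TEXT_LEN, IMAGE_LEN, DIM = 16, 64, 128

def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "text_embed": 0.02 * jax.random.normal(k1, (TEXT_VOCAB, DIM)),
        "image_embed": 0.02 * jax.random.normal(k2, (IMAGE_VOCAB, DIM)),
        "out_proj": 0.02 * jax.random.normal(k3, (DIM, IMAGE_VOCAB)),
    }

def model_apply(params, text_tokens, image_tokens_in):
    # Stand-in for the BART encoder-decoder: the "encoder" is a mean-pooled
    # text embedding, the "decoder" an image-token embedding plus a projection
    # back onto the image codebook. Real attention layers are omitted.
    text_ctx = params["text_embed"][text_tokens].mean(axis=1, keepdims=True)
    decoded = params["image_embed"][image_tokens_in] + text_ctx
    return decoded @ params["out_proj"]  # (batch, IMAGE_LEN - 1, IMAGE_VOCAB)

def loss_fn(params, text_tokens, image_tokens):
    # Teacher forcing: predict image token t+1 from tokens up to t and the text,
    # with softmax cross-entropy against the actual VQGAN token ids.
    logits = model_apply(params, text_tokens, image_tokens[:, :-1])
    targets = image_tokens[:, 1:]
    return optax.softmax_cross_entropy_with_integer_labels(logits, targets).mean()

key = jax.random.PRNGKey(0)
params = init_params(key)
text = jax.random.randint(key, (2, TEXT_LEN), 0, TEXT_VOCAB)     # tokenized captions
image = jax.random.randint(key, (2, IMAGE_LEN), 0, IMAGE_VOCAB)  # VQGAN token ids
loss, grads = jax.value_and_grad(loss_fn)(params, text, image)
print(float(loss))

In the actual system, the VQGAN and BART components are full networks; at inference time, image tokens are sampled autoregressively and the VQGAN decoder turns them back into pixels.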

The simplified training procedure for DALL·E Mega is as follows:

There is more information about the full procedure and technical material in the DALL·E Mega training journal.

Evaluation Results

The model developers discuss their results extensively in their technical report for DALL·E Mini, which compares DALL·E Mini's results with those of DALL·E-pytorch, OpenAI's DALL·E, and models consisting of a generator coupled with the CLIP neural network model.

For evaluation results related to DALL·E Mega, see this technical report.

Environmental Impact

DALL·E Mini Estimated Emissions

The model is 27 times smaller than the original DALL·E and was trained on a single TPU v3-8 for only 3 days.

Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.
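
As a rough illustration of the kind of estimate the calculator produces, the sketch below multiplies power draw, runtime, a datacenter PUE factor, and the grid's carbon intensity. The helper name and every numeric input are placeholder assumptions for illustration, not figures reported by the model developers or by the calculator.

def estimate_co2_kg(power_kw, hours, pue, kg_co2_per_kwh):
    # Energy drawn (kWh), scaled by datacenter overhead (PUE), times the
    # carbon intensity of the local grid (kg CO2eq per kWh).
    return power_kw * hours * pue * kg_co2_per_kwh

# DALL·E Mini: a single TPU v3-8 for about 3 days. Power draw, PUE, and grid
# intensity below are assumed values, not the developers' inputs.
mini_kg = estimate_co2_kg(power_kw=0.45, hours=3 * 24, pue=1.1, kg_co2_per_kwh=0.4)
print(f"DALL·E Mini training: ~{mini_kg:.1f} kg CO2eq")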

DALL·E Mega Estimated Emissions

DALL·E Mega is still training. So far, as of June 9, 2022, the model developers report that DALL·E Mega has been training for about 40-45 days on a TPU v3-256. Using those numbers, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.
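
Reusing the hypothetical estimate_co2_kg helper sketched above, the longer runtime and larger pod scale the estimate accordingly; again, every input other than the hardware and runtime reported in this section is an assumption.

# DALL·E Mega: a TPU v3-256 pod (32 v3-8 boards) for roughly 40-45 days so far.
# Per-board power draw, PUE, and grid intensity remain assumed values.
mega_kg = estimate_co2_kg(power_kw=32 * 0.45, hours=42 * 24, pue=1.1, kg_co2_per_kwh=0.4)
print(f"DALL·E Mega training so far: ~{mega_kg:.0f} kg CO2eq")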

Citation

@misc{Dayma_DALL·E_Mini_2021,
      author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
      doi = {10.5281/zenodo.5146400},
      month = {7},
      title = {DALL·E Mini},
      url = {https://github.com/borisdayma/dalle-mini},
      year = {2021}
}

This model card was written by: Boris Dayma, Margaret Mitchell, Ezi Ozoani, Marissa Gerchick, Irene Solaiman, Clémentine Fourrier, Sasha Luccioni, Emily Witko, Nazneen Rajani, and Julian Herrera.