DALL·E mini - Generate images from a text prompt

DALL·E Mini

Join us on Discord

Generate images from a text prompt

Our logo was generated with DALL·E mini using the prompt "logo of an armchair in the shape of an avocado".

You can create your own pictures with the demo.

How does it work?

Refer to our report.

Development

Dependencies Installation

For inference only, use pip install git+https://github.com/borisdayma/dalle-mini.git.

For development, clone the repo and use pip install -e ".[dev]". Check style with make style.
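After either install, it can help to confirm that the package is visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `dalle-mini` is assumed from the published package name rather than taken from any project API.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "dalle-mini") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed.

    `dist_name` defaults to "dalle-mini", which is an assumption about the
    distribution name; adjust it if your environment reports a different one.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("dalle-mini is not installed in this environment")
    else:
        print(f"dalle-mini {version} is installed")
```

The lookup reads installed package metadata without importing the package itself, so it works even when heavy dependencies are slow to load.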

Training of VQGAN

The VQGAN was trained using taming-transformers.

We recommend using the latest version available.

Conversion of VQGAN to JAX

Use patil-suraj/vqgan-jax.

Training of Seq2Seq

Use tools/train/train.py.

You can also adjust the sweep configuration file if you need to perform a hyperparameter search.

Inference Pipeline

To generate sample predictions and understand the inference pipeline step by step, refer to tools/inference/inference_pipeline.ipynb.

FAQ

Where to find the latest models?

Trained models are on 🤗 Model Hub:

Where does the logo come from?

The "armchair in the shape of an avocado" prompt was used by OpenAI when releasing DALL·E to illustrate the model's capabilities. Producing successful predictions for this prompt is a big milestone for us.

Acknowledgements

Authors & Contributors

DALL·E mini was initially developed by:

Many thanks to the people who helped make it better:

Contributing

Join the community on the DALLE-Pytorch Discord. Any contribution is welcome, from reporting issues to proposing fixes/improvements or testing the model with cool prompts!

Citing DALL·E mini

If you find DALL·E mini useful in your research or wish to refer to it, please use the following BibTeX entry.

@misc{Dayma_DALL·E_Mini_2021,
author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
doi = {10.5281/zenodo.5146400},
month = {7},
title = {DALL·E Mini},
url = {https://github.com/borisdayma/dalle-mini},
year = {2021}
}

References

@misc{ramesh2021zeroshot,
      title={Zero-Shot Text-to-Image Generation}, 
      author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
      year={2021},
      eprint={2102.12092},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{esser2021taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2021},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{lewis2019bart,
      title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension}, 
      author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
      year={2019},
      eprint={1910.13461},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{radford2021learning,
      title={Learning Transferable Visual Models From Natural Language Supervision}, 
      author={Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
      year={2021},
      eprint={2103.00020},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{anil2021scalable,
      title={Scalable Second Order Optimization for Deep Learning},
      author={Rohan Anil and Vineet Gupta and Tomer Koren and Kevin Regan and Yoram Singer},
      year={2021},
      eprint={2002.09018},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
