AI imagined images. Pythonic generation of stable diffusion images.
ImaginAIry 🤖🧠
"Just works" on Linux and macOS (M1), and maybe on Windows.
Examples
# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"
Console Output
🤖🧠 received 4 prompt(s) and will repeat them 1 times to create 4 images.
Loading model onto mps backend...
Generating 🖼 : "a scenic landscape" 512x512px seed:557988237 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|██████████████████████████████████████████████████| 40/40 [00:29<00:00, 1.36it/s]
🖼 saved to: ./outputs/000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg
Generating 🖼 : "a photo of a dog" 512x512px seed:277230171 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|██████████████████████████████████████████████████| 40/40 [00:28<00:00, 1.41it/s]
🖼 saved to: ./outputs/000002_277230171_PLMS40_PS7.5_a_photo_of_a_dog.jpg
Generating 🖼 : "photo of a fruit bowl" 512x512px seed:639753980 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|██████████████████████████████████████████████████| 40/40 [00:28<00:00, 1.40it/s]
🖼 saved to: ./outputs/000003_639753980_PLMS40_PS7.5_photo_of_a_fruit_bowl.jpg
Generating 🖼 : "portrait photo of a freckled woman" 512x512px seed:500686645 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|██████████████████████████████████████████████████| 40/40 [00:29<00:00, 1.37it/s]
🖼 saved to: ./outputs/000004_500686645_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg
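The saved filenames in the console output appear to encode a sequence number, the seed, the sampler with its step count, the prompt strength, and a slug of the prompt. Assuming that pattern holds (it is inferred from the four sample names only, not from a documented format), a minimal sketch for recovering those fields could look like this. The names `OutputName` and `parse_output_name` are hypothetical and not part of imaginairy.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputName:
    sequence: int
    seed: int
    sampler: str
    steps: int
    prompt_strength: float
    prompt_slug: str


# Assumed layout: <sequence>_<seed>_<SAMPLER><steps>_PS<strength>_<slug>.jpg
# Sampler names containing digits or underscores are not handled here.
_NAME_RE = re.compile(
    r"^(?P<sequence>\d+)_(?P<seed>\d+)_(?P<sampler>[A-Za-z]+)(?P<steps>\d+)"
    r"_PS(?P<ps>\d+(?:\.\d+)?)_(?P<slug>.+)\.jpg$"
)


def parse_output_name(filename: str) -> Optional[OutputName]:
    """Split a generated filename into its parts, or return None if it does not match."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return OutputName(
        sequence=int(match["sequence"]),
        seed=int(match["seed"]),
        sampler=match["sampler"],
        steps=int(match["steps"]),
        prompt_strength=float(match["ps"]),
        prompt_slug=match["slug"],
    )


info = parse_output_name("000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg")
print(info)
```

Recovering the seed this way can help when re-running a prompt that produced a result worth keeping.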
Automated Replacement (txt2mask) by clipseg
>> imagine --init-image pearl_earring.jpg --mask-prompt face --mask-mode keep --init-image-strength .4 "a female doctor" "an elegant woman"
>> imagine --init-image fruit-bowl.jpg --mask-prompt fruit --mask-mode replace --init-image-strength .1 "a bowl of pears" "a bowl of gold" "a bowl of popcorn" "a bowl of spaghetti"
Face Enhancement by CodeFormer
>> imagine "a couple smiling" --steps 40 --seed 1 --fix-faces
Upscaling by RealESRGAN
>> imagine "colorful smoke" --steps 40 --upscale
Tiled Images
>> imagine "gold coins" "a lush forest" "piles of old books" leaves --tile
Image-to-Image
>> imagine "portrait of a smiling lady. oil painting" --init-image girl_with_a_pearl_earring.jpg
Generate image captions
>> aimg describe assets/mask_examples/bowl001.jpg
a bowl full of gold bars sitting on a table
Features
- It makes images from text descriptions! 🎉
- Generate images either in code or from command line.
- It just works. Proper requirements are installed, and model weights are downloaded automatically. No Hugging Face account is needed (if you have the right hardware... and aren't on Windows).
- No more distorted faces!
- Noisy logs are gone (which was surprisingly hard to accomplish)
- WeightedPrompts let you smash together separate prompts (cat-dog)
- Tile Mode creates tileable images
- Prompt metadata saved into image file metadata
- Edit images by describing the part you want edited (see example above)
- Have AI generate captions for images: aimg describe <filename-or-url>
How To
For full command line instructions run aimg --help
from imaginairy import imagine, imagine_image_files, ImaginePrompt, WeightedPrompt, LazyLoadingImage

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg/540px-Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg"
prompts = [
    ImaginePrompt("a scenic landscape", seed=1),
    ImaginePrompt("a bowl of fruit"),
    # combine two prompts with equal weight
    ImaginePrompt([
        WeightedPrompt("cat", weight=1),
        WeightedPrompt("dog", weight=1),
    ]),
    # start from an image loaded from a URL
    ImaginePrompt(
        "a spacious building",
        init_image=LazyLoadingImage(url=url)
    ),
    # edit a local image: mask "fruit" or "stems" and replace that region
    ImaginePrompt(
        "a bowl of strawberries",
        init_image=LazyLoadingImage(filepath="mypath/to/bowl_of_fruit.jpg"),
        mask_prompt="fruit|stems",
        mask_mode="replace",
        mask_expansion=3
    )
]
for result in imagine(prompts):
    # do something
    # note: this fixed filename is overwritten on every iteration
    result.save("my_image.jpg")
# or
imagine_image_files(prompts, outdir="./my-art")
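The mask_prompt value "fruit|stems" joins several targets with the | symbol, which the changelog describes as multiple additive targets for masking. When the targets come from a list, a small helper can assemble that string before it is passed to ImaginePrompt. This is only a sketch: `build_mask_prompt` is a hypothetical helper, not part of imaginairy, and it only builds the string. How the library interprets the string internally is not shown here.

```python
from typing import Iterable, List


def build_mask_prompt(targets: Iterable[str]) -> str:
    """Join mask targets with '|', skipping blanks and duplicates."""
    cleaned: List[str] = []
    for target in targets:
        text = target.strip()
        if not text:
            continue  # ignore empty entries
        if "|" in text:
            raise ValueError(f"mask target must not contain '|': {text!r}")
        if text not in cleaned:
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one mask target is required")
    return "|".join(cleaned)


mask_prompt = build_mask_prompt(["fruit", "stem", "fruit stem"])
print(mask_prompt)
```

The resulting string can then be used as the mask_prompt argument in the same way as the literal in the example above.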
Requirements
- ~10 GB of disk space for the model downloads
- A decent computer with either a CUDA-supported graphics card or an M1 processor.
- Python installed. Preferably Python 3.10.
- For macOS, Rust must be installed to compile the tokenizer library. It can be installed via: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
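Some of these requirements can be checked before installing. The sketch below uses only the standard library and covers the Python version preference, free disk space, and the presence of rustc on macOS. It does not check for a CUDA card or an M1 processor, since that would need extra packages. The function name, the messages, and the choice to treat ~10 GB as 10e9 bytes are all assumptions made for illustration.

```python
import platform
import shutil
import sys
from typing import List


def check_requirements(path: str = ".", min_free_gb: float = 10.0) -> List[str]:
    """Return a list of warnings; an empty list means every check passed."""
    warnings: List[str] = []

    # Python 3.10 is preferred, not strictly required.
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"Python 3.10 is preferred, found {found}")

    # Roughly 10 GB is needed for the model downloads.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        warnings.append(f"only {free_gb:.1f} GB free, about {min_free_gb} GB needed")

    # On macOS, Rust is needed to compile the tokenizer library.
    if platform.system() == "Darwin" and shutil.which("rustc") is None:
        warnings.append("rustc not found on PATH (needed on macOS)")

    return warnings


for warning in check_requirements():
    print("warning:", warning)
```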
Running in Docker
See the example Dockerfile (works on machines where you can pass the GPU into the container)
docker build . -t imaginairy
# you really want to map the cache or you end up wasting a lot of time and space redownloading the model weights
docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -v $HOME/.cache/torch:/root/.cache/torch -v `pwd`/outputs:/outputs imaginairy /bin/bash
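The docker run line above relies on shell expansion of $HOME and `pwd`. When launching the container from a script, the same invocation can be assembled as an argument list. The sketch below mirrors the three volume mounts from that command. The function name and its parameters are hypothetical, and the command is printed rather than executed.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def docker_run_command(
    image: str = "imaginairy",
    home: Optional[Path] = None,
    outputs: Optional[Path] = None,
) -> List[str]:
    """Build the docker run argument list with the cache and output mounts."""
    home = home or Path.home()
    outputs = (outputs or Path("outputs")).absolute()
    mounts = [
        # Reuse the host caches so model weights are not downloaded again.
        (home / ".cache" / "huggingface", "/root/.cache/huggingface"),
        (home / ".cache" / "torch", "/root/.cache/torch"),
        # Generated images end up on the host.
        (outputs, "/outputs"),
    ]
    cmd = ["docker", "run", "-it", "--gpus", "all"]
    for host_path, container_path in mounts:
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd += [image, "/bin/bash"]
    return cmd


# Print the command; pass the list to subprocess.run to actually start it.
print(shlex.join(docker_run_command()))
```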
ChangeLog
1.5.0
- img2img now supported with PLMS (instead of just DDIM)
- added image captioning feature: aimg describe dog.jpg => a brown dog sitting on grass
- added new commandline tool aimg for additional image manipulation functionality
1.4.0
- support multiple additive targets for masking with the | symbol. Example: "fruit|stem|fruit stem"
1.3.0
- added prompt based image editing. Example: "fruit => gold coins"
- test coverage improved
1.2.0
- allow urls as init-images
** previous **
- img2img actually does # of steps you specify
- performance optimizations
- numerous other changes
Models Used
- CLIP - https://openai.com/blog/clip/
- LDM - Latent Diffusion
- Stable Diffusion
Not Supported
- a web interface; this is a Python library
- training
Todo
- performance optimizations
- ✅ https://github.com/huggingface/diffusers/blob/main/docs/source/optimization/fp16.mdx
- ✅ https://github.com/CompVis/stable-diffusion/compare/main...Doggettx:stable-diffusion:autocast-improvements#
- ✅ https://www.reddit.com/r/StableDiffusion/comments/xalaws/test_update_for_less_memory_usage_and_higher/
- https://github.com/neonsecret/stable-diffusion https://github.com/CompVis/stable-diffusion/pull/177
- https://github.com/huggingface/diffusers/pull/532/files
- ✅ deploy to pypi
- find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
- Development Environment
- ✅ add tests
- set up ci (test/lint/format)
- add docs
- remove yaml config
- delete more unused code
- Interface improvements
- ✅ init-image at command line
- prompt expansion
- Image Generation Features
- ✅ add k-diffusion sampling methods
- why is k-diffusion so slow compared to plms? 2 it/s vs 8 it/s
- negative prompting
- some syntax to allow it in a text string
- upscaling
- ✅ realesrgan
- ldm
- https://github.com/lowfuel/progrock-stable
- stable super-res?
- todo: try with 1-0-0-0 mask at full image resolution (re-encoding entire image+predicted image at every step)
- todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
- ✅ face enhancers
- ✅ gfpgan - https://github.com/TencentARC/GFPGAN
- ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature
- outpainting
- ✅ inpainting
- https://github.com/andreas128/RePaint
- img2img but keeps img stable
- https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
- https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
- https://github.com/pesser/stable-diffusion/commit/bbb52981460707963e2a62160890d7ecbce00e79
- CPU support
- ✅ img2img for plms
- img2img for kdiff functions
- images as actual prompts instead of just init images
- requires model fine-tuning since SD1.4 expects 77x768 text encoding input
- https://twitter.com/Buntworthy/status/1566744186153484288
- https://github.com/justinpinkney/stable-diffusion
- https://github.com/LambdaLabsML/lambda-diffusers
- https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
- cross-attention control:
- guided generation
- ✅ tiling
- output show-work videos
- image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
- textual inversion
- https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
- fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
- https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
Notable Stable Diffusion Implementations
- https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion
- https://github.com/lstein/stable-diffusion
- https://github.com/AUTOMATIC1111/stable-diffusion-webui