Text-to-image generation using Huggingface stable diffusion ControlNet conditioning and AWS Translate's prompt translation function package

AIsketcher

  • Stable Diffusion model : Lykon/DreamShaper[1]
  • Text-to-Image Generation with ControlNet Conditioning : uses Canny edge detection [2][3]
  • Prompt translator (Korean to English) : Amazon Translate [4]

Project Description

This function takes two inputs, an image and a text prompt, and builds on multi-modal models. The project uses Stable Diffusion, which expects prompts written in English. For users who mainly write in other languages, describing the desired details in English can be difficult. The function therefore accepts the prompt in the user's own language and machine-translates it to English with Amazon Translate before feeding it to the model.

Prerequisite: Load the ControlNetModel and the Stable Diffusion model into a StableDiffusionControlNetPipeline and set its scheduler to PNDMScheduler.

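import torch
from diffusers import ControlNetModel, PNDMScheduler, StableDiffusionControlNetPipeline

controlnet_model = "lllyasviel/sd-controlnet-canny"
sd_model = "Lykon/DreamShaper"

# ControlNet weights trained on Canny edge maps
controlnet = ControlNetModel.from_pretrained(
    controlnet_model,
    torch_dtype=torch.float16
)

# Stable Diffusion pipeline that takes the ControlNet as extra conditioning
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    sd_model,
    controlnet=controlnet,
    torch_dtype=torch.float16
)

# Use the PNDM scheduler and offload idle submodules to the CPU to save GPU memory
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()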

Function Workflow

  1. Resize the input image to 800x800.
  2. Extract the edges, which are the key features of the input image, using the Canny function.
  3. If a trans_info dictionary (the Amazon Translate settings) is passed, translate the prompt to English.
  4. Feed the prompt (translated if necessary) and the extracted edge image into the StableDiffusionControlNetPipeline to generate a new image.
  5. Resize the output image back to the original size of the input image and display it.

This workflow allows for the generation of new images based on input images and prompts, with the option of translating the prompts to English for non-English input sentences.

Usage

pip install AIsketcher

case1. English Prompt

import AIsketcher
from PIL import Image
import numpy as np
file_name = 'hello.jpg'

input_text = 'Cute, (hungry), plump, sitting at a table by the beach, warm feeling, beautiful shining eyes, seascape'

num_steps = 50
guidance_scale = 17
seed = 6764547109648557242
low = 140
high = 160

# pipe is the pipeline prepared in the prerequisite step
image, canny_image, out_image = AIsketcher.img2img(file_name, input_text, num_steps, guidance_scale, seed, low, high, pipe)
Image.fromarray(np.concatenate([image.resize(out_image.size), out_image], axis=1))

case2. Korean Prompt without IAM AccessRole

import AIsketcher
from PIL import Image
import numpy as np
file_name = 'hello.jpg'
input_text = '귀여운, (배가고픈), 포동포동한, 해변가 식탁에 앉은, 따뜻한 느낌, 아름답고 빛나는 눈, 바다풍경'

trans_info = {
    'region_name': 'us-east-1',  # user region
    'aws_access_key_id': '{{YOUR_ACCESS_KEY}}',
    'aws_secret_access_key': '{{YOUR_SECRET_KEY}}',
    'SourceLanguageCode': 'ko',
    'TargetLanguageCode': 'en',
    'iam_access': False
}

num_steps = 50
guidance_scale = 17
seed = 6764547109648557242
low = 140
high = 160

image, canny_image, out_image = AIsketcher.img2img(file_name, input_text, num_steps, guidance_scale, seed, low, high, pipe, trans_info)

case3. Korean Prompt with IAM AccessRole between SageMaker and Translate

import AIsketcher
from PIL import Image
import numpy as np
file_name = 'hello.jpg'
input_text = '귀여운, (배가고픈), 포동포동한, 해변가 식탁에 앉은, 따뜻한 느낌, 아름답고 빛나는 눈, 바다풍경'

trans_info = {
    'region_name': 'us-east-1',  # user region
    'SourceLanguageCode': 'ko',
    'TargetLanguageCode': 'en',
    'iam_access': True
}

num_steps = 50
guidance_scale = 17
seed = 6764547109648557242
low = 140
high = 160

image, canny_image, out_image = AIsketcher.img2img(file_name, input_text, num_steps, guidance_scale, seed, low, high, pipe, trans_info)
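
The two trans_info variants differ only in how credentials are supplied. The sketch below shows one way such a dictionary could be turned into an Amazon Translate call with boto3. It is an assumption about the general approach, not AIsketcher's actual code: make_translate_client and translate_prompt are hypothetical names, while boto3.client("translate") and translate_text are the standard boto3 calls.

```python
def make_translate_client(trans_info):
    """Create a boto3 Amazon Translate client from a trans_info dictionary."""
    import boto3  # imported lazily so translate_prompt can be tried without AWS

    kwargs = {"region_name": trans_info["region_name"]}
    if not trans_info.get("iam_access", False):
        # No IAM role available: pass the access keys explicitly
        kwargs["aws_access_key_id"] = trans_info["aws_access_key_id"]
        kwargs["aws_secret_access_key"] = trans_info["aws_secret_access_key"]
    # With iam_access set to True, boto3 resolves credentials from the role
    return boto3.client("translate", **kwargs)


def translate_prompt(text, trans_info=None, client=None):
    """Translate text as described by trans_info; return it unchanged if None."""
    if trans_info is None:
        return text  # English prompt, nothing to translate
    if client is None:
        client = make_translate_client(trans_info)
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=trans_info["SourceLanguageCode"],
        TargetLanguageCode=trans_info["TargetLanguageCode"],
    )
    return response["TranslatedText"]
```

Accepting an optional client argument keeps the translation step easy to exercise with a stand-in object when no AWS account is at hand.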

Default Parameters Used

default_prompt

(8k, best quality, masterpiece:1.2), (realistic, photo-realistic:1.37), ultra-detailed,

negative_prompt

NSFW, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))

Variables Description

  • num_steps : number of denoising steps the diffusion process runs
  • guidance_scale : how closely the generated image follows the text prompt; higher values follow the prompt more strictly, lower values leave the model more freedom
  • seed : integer used to initialize the random number generator of the Stable Diffusion model
  • low : lower hysteresis threshold for Canny edge detection
  • high : upper hysteresis threshold for Canny edge detection
  • pipe : the StableDiffusionControlNetPipeline prepared in the prerequisite step (with PNDMScheduler)
  • trans_info : Amazon Translate parameters (optional; omit for English prompts)
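
To make the role of these variables more concrete, the sketch below maps them onto the keyword arguments that a StableDiffusionControlNetPipeline call accepts. The function build_pipeline_kwargs is hypothetical, and prepending the default prompt to the user prompt is an assumption about how the package combines them.

```python
DEFAULT_PROMPT = (
    "(8k, best quality, masterpiece:1.2), "
    "(realistic, photo-realistic:1.37), ultra-detailed,"
)


def build_pipeline_kwargs(prompt, canny_image, num_steps, guidance_scale,
                          negative_prompt, generator=None):
    """Map the variables described above onto pipeline call arguments."""
    return {
        # Assumption: the default prompt is simply prepended to the user prompt
        "prompt": f"{DEFAULT_PROMPT} {prompt}",
        "negative_prompt": negative_prompt,
        "image": canny_image,               # Canny edge map used as conditioning
        "num_inference_steps": num_steps,   # num_steps
        "guidance_scale": guidance_scale,
        "generator": generator,             # seeded torch.Generator, or None
    }
```

With the pipe from the prerequisite step, a seeded run could then look like generator = torch.Generator(device="cpu").manual_seed(seed) followed by out_image = pipe(**kwargs).images[0], where kwargs is the dictionary returned above.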

References

Built Distribution

AIsketcher-0.0.0.3-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file AIsketcher-0.0.0.3-py3-none-any.whl.

File metadata

  • Download URL: AIsketcher-0.0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.11

File hashes

Hashes for AIsketcher-0.0.0.3-py3-none-any.whl

  • SHA256 : 462d3a65ddae805159053b98fc016183b907a2bbb26cd0b42f5445c2a176e63d
  • MD5 : 9bea93e044dacddefdb3c05b5c37661e
  • BLAKE2b-256 : 0c9407c2a5b86ebaa0f54054efabf8ba3f387b69764b97cddcc0088b74585dcc
