Integrated Python package for converting images to speech

Project description

License: MIT

✅DESCRIPTION

ITTTS (Image-to-Text-to-Speech) is a Python package for integrated conversion of textual images and PDF documents to human speech. The library aims to simplify internal image preprocessing and the conversion of extracted text to speech across multiple languages.

✅QUICK START

Dependencies

Install the dependencies this pipeline requires by running:

pip install -r requirements.txt



✅FUNCTIONS

  • image_to_sound(path_to_image, lang, pre_process="NO")

This is the main function. It converts a textual input image to human speech in the language of the text. It returns both the intermediate extracted text and the generated speech, which can then be saved to an .mp3 file.

Parameters:

path_to_image : Path to the input image (.png or .jpg). PDF files and public image URLs are also supported.

lang : Language of the text in the image. The currently supported languages are:

["ENGLISH", "HINDI", "TELUGU", "KANNADA"]

pre_process (optional) : Set to "YES" to run the package's internal pre-processing pipeline on the image for better results. Defaults to "NO".

For example:

image_to_sound("images/image1.png","ENGLISH",pre_process="YES")

OR

image_to_sound("files/text.pdf","HINDI",pre_process="YES")

OR

image_to_sound("https://www.techsmith.com/blog/wp-content/uploads/2020/11/TechSmith-Blog-ExtractText.png","ENGLISH",pre_process="YES")
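
The three calls above pass a local image, a PDF, and a public URL through the same argument. The sketch below shows one way a caller might tell these input kinds apart before calling image_to_sound. Note that classify_source is a hypothetical helper, not part of img2speech, and the package's own detection logic may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Image extensions named in this README; anything else is rejected here.
IMAGE_SUFFIXES = {".png", ".jpg"}


def classify_source(source: str) -> str:
    """Return "url", "pdf", or "image" for an input string (hypothetical helper)."""
    # Public URLs are recognised by their scheme.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Local files are recognised by their extension, case-insensitively.
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"unsupported input: {source!r}")
```

Checking the input kind up front lets a script fail early on unsupported files instead of failing inside the OCR step.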


  • preprocess(path_to_image)

This function exposes the internal pre-processing pipeline used by the package. Use it to retrieve the intermediate preprocessed image before it is converted to text and speech.

For example:

preprocess("images/image1.jpg")
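
This README does not say which steps the internal pipeline performs. Purely as an illustration, the sketch below shows one step that is common in OCR preprocessing: Otsu binarization of grayscale pixel values. The function names are hypothetical and this is not a description of what preprocess actually does.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Pick the 0-255 threshold that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]            # pixels at or below t (background)
        sum_bg += t * hist[t]
        w_fg = total - w_bg        # pixels above t (foreground)
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels: List[int]) -> List[int]:
    """Map each grayscale pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

Binarizing separates ink from background, which tends to help OCR on scans with uneven contrast.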


MODELS USED FOR OCR

Once the input image is of sufficiently high quality, we use the OCR tool EasyOCR to convert the textual image to human-readable text. We preferred EasyOCR over other tools such as Tesseract because it provides pre-trained models for many languages and performs well on noisy or low-quality images. It is also designed to be fast and can process multiple images in parallel, which makes it suitable for this pipeline.
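
A minimal sketch of the OCR step, assuming the easyocr package is installed. The mapping from this README's language names to EasyOCR language codes and the helper names are assumptions made for illustration; img2speech may wire this up differently. EasyOCR downloads model files the first time a Reader is created for a language.

```python
from typing import List

# README language names mapped to EasyOCR language codes (assumed mapping).
EASYOCR_CODES = {"ENGLISH": "en", "HINDI": "hi", "TELUGU": "te", "KANNADA": "kn"}


def easyocr_languages(lang: str) -> List[str]:
    """Translate a README language name into an EasyOCR language list."""
    key = lang.strip().upper()
    if key not in EASYOCR_CODES:
        raise ValueError(f"unsupported language: {lang!r}")
    return [EASYOCR_CODES[key]]


def extract_text(image_path: str, lang: str) -> str:
    """Run EasyOCR on one image and join the recognised lines."""
    import easyocr  # imported lazily so the helpers above work without it

    reader = easyocr.Reader(easyocr_languages(lang))
    # detail=0 returns only the recognised strings, without boxes or scores.
    lines = reader.readtext(image_path, detail=0)
    return " ".join(lines)
```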

TEXT TO SPEECH CONVERSION

We use gTTS (Google Text-to-Speech) for this task. gTTS is a popular Python library that interfaces with Google Translate's text-to-speech service to synthesize natural-sounding speech from text input. Its Python API is small: the caller can choose the language, a regional accent (through the top-level domain), and a slower speaking rate, but the library does not expose voice, pitch, or volume controls.
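
A minimal sketch of the text-to-speech step, assuming the gtts package is installed and network access is available. The helper name synthesize and its defaults are illustrative. Only the gTTS constructor arguments (text, lang, slow) and the save method come from the gTTS library.

```python
def synthesize(text: str, lang_code: str = "en", slow: bool = False,
               out_path: str = "output.mp3") -> str:
    """Convert text to an MP3 file with gTTS and return the file path."""
    # Reject empty input before any network call is attempted.
    if not text.strip():
        raise ValueError("no text to speak")

    from gtts import gTTS  # imported lazily; requires `pip install gTTS`

    tts = gTTS(text=text, lang=lang_code, slow=slow)
    tts.save(out_path)  # sends the text to Google's service and writes the MP3
    return out_path
```

For Hindi text, for example, the call would be synthesize(text, lang_code="hi").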

INSTALLATION

Install using pip

For the latest stable release:

pip install img2speech

USAGE

import img2speech

text, speech = img2speech.image_to_sound("images/image1.png", "ENGLISH", "YES")
print(text)  # prints the text extracted by the OCR model
speech.save("output.mp3")  # saves the speech output as an mp3 file
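
When converting several files, it can help to name each audio file after its input. The sketch below wraps the documented call. The helpers output_path_for and convert_and_save, and the "audio" output directory, are hypothetical names chosen for illustration. Only image_to_sound and the save method on its second return value come from this README.

```python
from pathlib import Path


def output_path_for(source: str, out_dir: str = "audio") -> Path:
    """Derive an .mp3 path from the input's file name (hypothetical helper)."""
    stem = Path(source).stem or "output"
    return Path(out_dir) / f"{stem}.mp3"


def convert_and_save(source: str, lang: str, pre_process: str = "NO",
                     out_dir: str = "audio"):
    """Run image_to_sound and save the speech next to a derived file name."""
    import img2speech  # imported lazily; requires `pip install img2speech`

    text, speech = img2speech.image_to_sound(source, lang, pre_process)
    target = output_path_for(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    speech.save(str(target))
    return text, target
```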

Project details


Download files

Source Distribution

img2speech-1.0.13.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

img2speech-1.0.13-py3-none-any.whl (5.0 kB)

Uploaded Python 3

File details

Details for the file img2speech-1.0.13.tar.gz.

File metadata

  • Download URL: img2speech-1.0.13.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for img2speech-1.0.13.tar.gz
Algorithm    Hash digest
SHA256       1c0133254c577cb058c72d43adff1c57a2d93389478f11036b37467aabfe6a49
MD5          698dbb589179fe82ba70651f683863ab
BLAKE2b-256  732e95651c3ef9122d4c5c2c28fd71ff3d0a66a13886c6ff215d8560ec5f0b9f

File details

Details for the file img2speech-1.0.13-py3-none-any.whl.

File metadata

  • Download URL: img2speech-1.0.13-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for img2speech-1.0.13-py3-none-any.whl
Algorithm    Hash digest
SHA256       5081be3e77dd52903292b89e0233ac08c786e9fb174063b12e9e21580f6f303a
MD5          bf0368473a3bd16dfcace6b6b8399235
BLAKE2b-256  395b339ee59faa1d95de8ca66193043fcdcec59cbe08f5e5443975e03a86d669
