Integrated Python package for converting images to speech
✅DESCRIPTION
An ITTTS (image-to-text-to-speech) Python package for the integrated conversion of textual images and PDF documents to human speech.
This library aims to simplify image preprocessing and the conversion of extracted text to speech in multiple languages.
✅QUICK START
Dependencies
Install the required dependencies by running:
pip install -r requirements.txt
✅FUNCTIONS
- image_to_sound(path_to_image, lang, pre_process="NO")
This is the main function. It converts the input textual image to human speech in the language of the text and returns both the extracted intermediate text and the generated speech. The speech can then be saved to an .mp3 file.
Parameters:
path_to_image : Path to the input image as a .png or .jpg file. PDF files and public image URLs are also supported.
lang : Language of the text in the image. Currently supported languages:
["ENGLISH", "HINDI", "TELUGU", "KANNADA"]
pre_process (optional) : Set to "YES" to run the package's internal pre-processing pipeline, which can improve results. Defaults to "NO".
For example:
image_to_sound("images/image1.png","ENGLISH",pre_process="YES")
OR
image_to_sound("files/text.pdf","HINDI",pre_process="YES")
OR
image_to_sound("https://www.techsmith.com/blog/wp-content/uploads/2020/11/TechSmith-Blog-ExtractText.png","ENGLISH",pre_process="YES")
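The `lang` and `pre_process` arguments are plain strings. The following is a minimal sketch of how such arguments could be validated before any OCR runs. It assumes the language names map to the two-letter codes that EasyOCR and gTTS accept for these four languages; `LANG_CODES`, `resolve_lang` and `parse_flag` are hypothetical helpers, not part of img2speech.

```python
# Hypothetical argument validation for image_to_sound-style calls.
# Not img2speech's source: the name-to-code mapping is an assumption
# based on the language codes used by EasyOCR and gTTS.
LANG_CODES = {
    "ENGLISH": "en",
    "HINDI": "hi",
    "TELUGU": "te",
    "KANNADA": "kn",
}


def resolve_lang(lang: str) -> str:
    """Map a README-style language name (e.g. "HINDI") to a language code."""
    key = lang.strip().upper()
    if key not in LANG_CODES:
        raise ValueError(f"Unsupported language {lang!r}; expected one of {sorted(LANG_CODES)}")
    return LANG_CODES[key]


def parse_flag(value: str) -> bool:
    """Interpret the "YES"/"NO" string used by pre_process."""
    key = value.strip().upper()
    if key not in ("YES", "NO"):
        raise ValueError(f"Expected 'YES' or 'NO', got {value!r}")
    return key == "YES"


if __name__ == "__main__":
    print(resolve_lang("Hindi"), parse_flag("YES"))  # prints: hi True
```

Failing early with a `ValueError` gives the caller a clear message instead of an error from deep inside the OCR or TTS library.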
- preprocess(path_to_image)
This function runs the package's internal pre-processing pipeline. Use it to retrieve the intermediate preprocessed image before it is converted to text and speech.
For example:
preprocess("images/image1.jpg")
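The internal pipeline's steps are not documented here. Binarization is a common OCR preprocessing step, so the pure-Python sketch below shows one standard choice, Otsu's method, which picks the threshold that maximizes the between-class variance of dark and light pixels. The image representation (a list of rows of 0 to 255 grayscale values) and the function names are illustrative assumptions, not img2speech's implementation; on real images a library call such as OpenCV's `cv2.threshold` with `cv2.THRESH_OTSU` does the same job.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of their intensities
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels in the dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is in the dark class; nothing left to split
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: Gray) -> Gray:
    """Map pixels above the Otsu threshold to 255 (white) and the rest to 0 (black)."""
    t = otsu_threshold(gray)
    return [[255 if p > t else 0 for p in row] for row in gray]
```

For example, `binarize([[10, 10, 200], [10, 200, 200]])` returns `[[0, 0, 255], [0, 255, 255]]`, separating the dark text-like pixels from the light background.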
MODELS USED FOR OCR
After ensuring that the input image is of high quality, we use an efficient OCR tool called "EasyOCR" to extract the text from the image.
We preferred EasyOCR over alternatives such as Tesseract because EasyOCR provides pre-trained models for many languages, and these models also perform well on noisy or low-quality images. It is designed to be fast and can process multiple images in parallel, which makes it suitable for converting documents in bulk.
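A minimal sketch of the OCR step with EasyOCR. It relies on `Reader.readtext`, which by default returns one `(bounding_box, text, confidence)` tuple per detected text region. The 0.3 confidence cut-off and the helper names `join_detections` and `ocr_image` are illustrative assumptions, not values or functions taken from img2speech.

```python
from typing import Iterable, Sequence, Tuple

# One EasyOCR detection: (bounding_box, text, confidence).
Detection = Tuple[Sequence, str, float]


def join_detections(detections: Iterable[Detection], min_confidence: float = 0.3) -> str:
    """Drop low-confidence or blank detections and join the rest with spaces."""
    kept = [text.strip() for _, text, conf in detections
            if conf >= min_confidence and text.strip()]
    return " ".join(kept)


def ocr_image(path_or_url: str, langs=("en",), min_confidence: float = 0.3) -> str:
    """Run EasyOCR on a file path or image URL and return the extracted text."""
    import easyocr  # third-party: pip install easyocr

    reader = easyocr.Reader(list(langs))  # loads (and on first use downloads) the models
    detections = reader.readtext(path_or_url)  # detail=1 by default
    return join_detections(detections, min_confidence)
```

Keeping the filtering in a separate pure function makes it easy to tune the cut-off without re-running the OCR model.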
TEXT TO SPEECH CONVERSION
We use "gTTS" (Google Text-to-Speech) for this task. gTTS is a popular Python library and command-line tool that interfaces with Google Translate's text-to-speech API to synthesize natural-sounding speech from text input. We chose gTTS because it supports many languages, including Hindi, Telugu and Kannada, and allows some customization of the output, such as a slower speaking rate (slow=True) and regional accents through its top-level domain (tld) parameter.
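A minimal sketch of the speech step. It assumes the OCR text is cleaned before it reaches gTTS, because OCR output often contains hard line breaks and stray spaces, and a whitespace-only string has nothing to synthesize. `gTTS(text=..., lang=..., slow=...)` and `.save()` are gTTS's documented interface; `normalize_text` and `speak_to_mp3` are hypothetical helpers.

```python
def normalize_text(text: str) -> str:
    """Collapse OCR line breaks and repeated whitespace into single spaces."""
    return " ".join(text.split())


def speak_to_mp3(text: str, out_path: str, lang_code: str = "en", slow: bool = False) -> str:
    """Synthesize text with gTTS and write it to an .mp3 file (needs network access)."""
    cleaned = normalize_text(text)
    if not cleaned:
        # Fail before importing gTTS or touching the network.
        raise ValueError("No text to speak: the OCR step returned only whitespace")
    from gtts import gTTS  # third-party: pip install gTTS

    gTTS(text=cleaned, lang=lang_code, slow=slow).save(out_path)
    return out_path
```

For example, `speak_to_mp3(text, "output.mp3", lang_code="hi", slow=True)` would produce slower Hindi speech.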
INSTALLATION
Install using pip
For the latest stable release:
pip install img2speech
USAGE
import img2speech
text, speech = img2speech.image_to_sound("images/image1.png", "ENGLISH", pre_process="YES")
print(text) # prints text extracted by OCR model
speech.save("output.mp3") # saves the speech output as mp3 file
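To convert several files, the documented `image_to_sound` call can be wrapped in a loop. This is a sketch that assumes `image_to_sound` behaves as described under FUNCTIONS (returning the extracted text and a speech object with `.save()`); the folder layout and the `mp3_path_for` and `convert_folder` helpers are hypothetical.

```python
from pathlib import Path

# File types the README lists as supported inputs.
INPUT_SUFFIXES = {".png", ".jpg", ".pdf"}


def mp3_path_for(input_path: str, out_dir: str) -> Path:
    """Map e.g. images/page1.png to <out_dir>/page1.mp3."""
    return Path(out_dir) / (Path(input_path).stem + ".mp3")


def convert_folder(in_dir: str, out_dir: str, lang: str = "ENGLISH") -> list:
    """Convert every supported file in in_dir to an .mp3 in out_dir."""
    import img2speech  # pip install img2speech

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).iterdir()):
        if path.suffix.lower() not in INPUT_SUFFIXES:
            continue  # skip files that are not supported inputs
        _text, speech = img2speech.image_to_sound(str(path), lang, pre_process="YES")
        out = mp3_path_for(str(path), out_dir)
        speech.save(str(out))
        written.append(out)
    return written
```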
File details
Details for the file img2speech-1.0.13.tar.gz
File metadata
- Download URL: img2speech-1.0.13.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1c0133254c577cb058c72d43adff1c57a2d93389478f11036b37467aabfe6a49
MD5 | 698dbb589179fe82ba70651f683863ab
BLAKE2b-256 | 732e95651c3ef9122d4c5c2c28fd71ff3d0a66a13886c6ff215d8560ec5f0b9f
Details for the file img2speech-1.0.13-py3-none-any.whl
File metadata
- Download URL: img2speech-1.0.13-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5081be3e77dd52903292b89e0233ac08c786e9fb174063b12e9e21580f6f303a
MD5 | bf0368473a3bd16dfcace6b6b8399235
BLAKE2b-256 | 395b339ee59faa1d95de8ca66193043fcdcec59cbe08f5e5443975e03a86d669