OrangeLab
The Python module developed for the Orange Python Tool Plugin serves a dual purpose: it provides image processing with OpenCV and generates synthetic datasets with the Faker library.
Fake Dataset Generator Documentation
Introduction
The generate_fake_dataset function generates a fake dataset from the provided configuration using the Faker library. This is useful for testing, development, or producing sample data.
Function Signature
generate_fake_dataset(num_rows, path="fake_dataset.csv", none_values=True, table_head=None)
- num_rows: Number of rows to generate in the dataset.
- path: The path where the generated dataset will be saved. Default is "fake_dataset.csv".
- none_values: Whether to include None values in the generated dataset. Default is True.
- table_head: A dictionary specifying the columns and their configurations.
Column Configuration
The table_head dictionary defines the columns of the dataset: each key is a column name, and each value is a dictionary specifying that column's configuration. Supported configuration keys include:
- type: Faker provider type (e.g., "name", "email", "passport_gender").
- dtype: Data type (e.g., "int").
- range: Range of values for integer columns (e.g., "1-100").
- len: Length of integer columns.
Example Usage
import pandas as pd
from faker import Faker
# generate_fake_dataset is provided by the OrangeLab module; import it from
# wherever the module is available in your environment.

# Define the column configurations
table_head = {
    "name": {"type": "name"},
    "email": {"type": "email"},
    "gender": {"type": "passport_gender"},
    "age": {"dtype": "int", "range": "10-100"},
    "score": {"dtype": "int", "range": "10-100"},
    "time_s": {"dtype": "int", "range": "10-100"},
}

# Generate a fake dataset with 500 rows and save it to a CSV file
generate_fake_dataset(500, path="sample.csv", none_values=True, table_head=table_head)
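To make the column configuration concrete, here is a minimal sketch of how a generator could turn table_head into rows. It is not the module's actual implementation: the helper names (parse_range, make_value, generate_fake_rows, write_csv), the none_prob and seed parameters, the way None values are sprinkled in, and the reading of len as a digit count are all hypothetical. The fake argument is any object exposing Faker provider methods, such as faker.Faker().

```python
import csv
import random


def parse_range(spec):
    """Parse a range string such as "10-100" into an inclusive (low, high) pair."""
    low, high = spec.split("-")
    return int(low), int(high)


def make_value(config, fake, rng):
    """Produce one cell value from a single column configuration."""
    if "type" in config:
        # Call the Faker provider by name, e.g. fake.name() or fake.email()
        return getattr(fake, config["type"])()
    if config.get("dtype") == "int":
        if "range" in config:
            low, high = parse_range(config["range"])
            return rng.randint(low, high)  # randint includes both ends
        if "len" in config:
            # ASSUMPTION: "len" is read as the number of digits
            digits = int(config["len"])
            return rng.randint(10 ** (digits - 1), 10 ** digits - 1)
    raise ValueError(f"Unsupported column configuration: {config}")


def generate_fake_rows(num_rows, table_head, fake, none_values=True,
                       none_prob=0.05, seed=None):
    """Build num_rows dicts; optionally replace some cells with None.

    none_prob and seed are hypothetical parameters used for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        row = {}
        for column, config in table_head.items():
            if none_values and rng.random() < none_prob:
                row[column] = None
            else:
                row[column] = make_value(config, fake, rng)
        rows.append(row)
    return rows


def write_csv(rows, path):
    """Write rows to a CSV file, taking the header from the first row."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)  # None is written as an empty field
```

With Faker installed, calling generate_fake_rows(500, table_head, Faker()) and then write_csv(rows, "sample.csv") would roughly mirror the example above. Passing the Faker instance in, rather than creating it inside, keeps the sketch testable with a stub object.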
Supported Faker Providers
The Faker library provides a wide range of providers for generating fake data. Some of the supported types include:
Types = [ "aba", "add_provider", "address", "administrative_unit", "am_pm", "android_platform_token", "ascii_company_email", "ascii_email", "ascii_free_email", "ascii_safe_email", "bank_country", "basic_phone_number", "bban", "binary", "boolean", "bothify", "bs", "building_number", "cache_pattern", "catch_phrase", "century", "chrome", "city", "city_prefix", "city_suffix", "color", "color_hsl", "color_hsv", "color_name", "color_rgb", "color_rgb_float", "company", "company_email", "company_suffix", "coordinate", "country", "country_calling_code", "country_code", "credit_card_expire", "credit_card_full", "credit_card_number", "credit_card_provider", "credit_card_security_code", "cryptocurrency", "cryptocurrency_code", "cryptocurrency_name", "csv", "currency", "currency_code", "currency_name", "currency_symbol", "current_country", "current_country_code", "date", "date_between", "date_between_dates", "date_object", "date_of_birth", "date_this_century", "date_this_decade", "date_this_month", "date_this_year", "date_time", "date_time_ad", "date_time_between", "date_time_between_dates", "date_time_this_century", "date_time_this_decade", "date_time_this_month", "date_time_this_year", "day_of_month", "day_of_week", "del_arguments", "dga", "domain_name", "domain_word", "dsv", "ean", "ean13", "ean8", "ein", "email", "emoji", "enum", "factories", "file_extension", "file_name", "file_path", "firefox", "first_name", "first_name_female", "first_name_male", "first_name_nonbinary", "fixed_width", "format", "free_email", "free_email_domain", "future_date", "future_datetime", "generator_attrs", "get_arguments", "get_formatter", "get_providers", "hex_color", "hexify", "hostname", "http_method", "iana_id", "iban", "image", "image_url", "internet_explorer", "invalid_ssn", "ios_platform_token", "ipv4", "ipv4_network_class", "ipv4_private", "ipv4_public", "ipv6", "isbn10", "isbn13", "iso8601", "items", "itin", "job", "json", "json_bytes", "language_code", "language_name", "last_name", 
"last_name_female", "last_name_male", "last_name_nonbinary", "latitude", "latlng", "lexify", "license_plate", "linux_platform_token", "linux_processor", "local_latlng", "locale", "locales", "localized_ean", "localized_ean13", "localized_ean8", "location_on_land", "longitude", "mac_address", "mac_platform_token", "mac_processor", "md5", "military_apo", "military_dpo", "military_ship", "military_state", "mime_type", "month", "month_name", "msisdn", "name", "name_female", "name_male", "name_nonbinary", "nic_handle", "nic_handles", "null_boolean", "numerify", "opera", "optional", "paragraph", "paragraphs", "parse", "passport_dates", "passport_dob", "passport_full", "passport_gender", "passport_number", "passport_owner", "password", "past_date", "past_datetime", "phone_number", "port_number", "postalcode", "postalcode_in_state", "postalcode_plus4", "postcode", "postcode_in_state", "prefix", "prefix_female", "prefix_male", "prefix_nonbinary", "pricetag", "profile", "provider", "providers", "psv", "pybool", "pydecimal", "pydict", "pyfloat", "pyint", "pyiterable", "pylist", "pyobject", "pyset", "pystr", "pystr_format", "pystruct", "pytimezone", "pytuple", "random", "random_choices", "random_digit", "random_digit_above_two", "random_digit_not_null", "random_digit_not_null_or_empty", "random_digit_or_empty", "random_element", "random_elements", "random_int", "random_letter", "random_letters", "random_lowercase_letter", "random_number", "random_sample", "random_uppercase_letter", "randomize_nb_elements", "rgb_color", "rgb_css_color", "ripe_id", "safari", "safe_color_name", "safe_domain_name", "safe_email", "safe_hex_color", "sbn9", "secondary_address", "seed", "seed_instance", "seed_locale", "sentence", "sentences", "set_arguments", "set_formatter", "sha1", "sha256", "simple_profile", "slug", "ssn", "state", "state_abbr", "street_address", "street_name", "street_suffix", "suffix", "suffix_female", "suffix_male", "suffix_nonbinary", "swift", "swift11", "swift8", "tar", "text", 
"texts", "time", "time_delta", "time_object", "time_series", "timezone", "tld", "tsv", "unique", "unix_device", "unix_partition", "unix_time", "upc_a", "upc_e", "uri", "uri_extension", "uri_page", "uri_path", "url", "user_agent", "user_name", "uuid4", "vin", "weights", "windows_platform_token", "word", "words", "xml", "year", "zip", "zipcode", "zipcode_in_state", "zipcode_plus4"]
You can also call the supported_types() function to list the supported types.
For a complete list of providers, refer to the Faker documentation or explore the faker.providers module.
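How supported_types() builds its list is not documented here. One plausible approach, shown as a hypothetical sketch, is to introspect an object and keep its public callable attributes; the helper name list_public_callables is illustrative. Applied to a Faker instance, the result also includes helper methods such as seed_instance and add_provider, which are not data providers.

```python
def list_public_callables(obj):
    """Return the sorted names of public, callable attributes of obj."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        try:
            attr = getattr(obj, name)
        except Exception:
            continue  # some attributes may raise on access; ignore them
        if callable(attr):
            names.append(name)
    return sorted(names)
```

With Faker installed, list_public_callables(Faker()) returns names such as "email" and "name" alongside the helper methods mentioned above.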
OCR Installation and Tesseract Path Finder
Introduction
The provided Python script includes functions to install an OCR (Optical Character Recognition) tool from an installer executable and to find the path to the Tesseract OCR executable on the system.
Functions
1. install_ocr()
This function installs an OCR tool using an installer executable named ocr.exe. It retrieves the current script's path, combines it with the installer filename to build the full installer path, and then attempts to run the installer using subprocess.run().
Example Usage:
# Example: Install OCR
install_ocr()
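The steps described for install_ocr() can be sketched as follows. This is an illustration under assumptions, not the module's code: the installer_name and base_dir parameters, the installer_path helper, and the True/False return value are hypothetical additions that make the behaviour easier to test.

```python
import os
import subprocess


def installer_path(installer_name="ocr.exe", base_dir=None):
    """Build the full installer path, defaulting to this script's directory."""
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, installer_name)


def install_ocr(installer_name="ocr.exe", base_dir=None):
    """Run the OCR installer; return True on success, False otherwise."""
    path = installer_path(installer_name, base_dir)
    try:
        # check=True raises CalledProcessError if the installer exits non-zero
        subprocess.run([path], check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"OCR installation failed: {exc}")
        return False
    return True
```

Catching OSError covers both a missing installer (FileNotFoundError) and one that cannot be executed (PermissionError).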
2. find_tesseract()
This function attempts to find the path to the Tesseract OCR executable. It first checks whether Tesseract is installed at a specific path (C:\Program Files\Tesseract-OCR\tesseract.exe). If it is not found there, it tries to locate Tesseract on the system's PATH using shutil.which("tesseract").
Example Usage:
# Example: Find Tesseract Path
tesseract_path = find_tesseract()
if tesseract_path:
    print(f"Tesseract found at: {tesseract_path}")
else:
    print("Tesseract not found.")
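The lookup order described above (fixed Windows path first, then the PATH search) can be sketched like this. The default_path parameter is a hypothetical addition so the function can be pointed at other locations; the module's own function may take no arguments.

```python
import os
import shutil

# Default Windows install location named in the documentation above
DEFAULT_TESSERACT_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(default_path=DEFAULT_TESSERACT_PATH):
    """Return the Tesseract executable path, or None if it cannot be found."""
    if os.path.isfile(default_path):
        return default_path
    # Fall back to searching the directories listed in the PATH variable;
    # shutil.which returns None when nothing matches.
    return shutil.which("tesseract")
```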
Important Note
- Ensure that the installer file (ocr.exe) is in the same directory as the script, or provide the correct path to the installer in the install_ocr() function.
- If find_tesseract() does not find Tesseract at the specified path or on the system's PATH, it returns None.
Recommendations
- Before using install_ocr(), verify that the installer file is compatible with your system.
- Make sure you have the necessary permissions to install software on the system.
- If you are installing from a source other than PyPI, ensure that it is trusted or has been verified by someone who knows what they are doing.
Image Processing and OCR Functions Documentation
The provided Python script includes various functions for image processing using OpenCV and image-to-text extraction using Tesseract OCR. Below is the documentation for each function along with example usages.
Image Processing Functions
1. read_image(image_path)
Read an image from the specified path using OpenCV.
Example Usage:
image = read_image('path/to/image.jpg')
2. display_image(img, title='Image')
Display the image using Matplotlib.
Example Usage:
display_image(image, title='Original Image')
3. convert_to_grayscale(img)
Convert the image to grayscale.
Example Usage:
gray_image = convert_to_grayscale(image)
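OpenCV's BGR-to-grayscale conversion (cv2.cvtColor with cv2.COLOR_BGR2GRAY) uses the weights Y = 0.299 R + 0.587 G + 0.114 B. Assuming convert_to_grayscale wraps that call, the NumPy sketch below reproduces the arithmetic. The name to_gray_numpy is hypothetical, and OpenCV's fixed-point rounding can differ from this result by one gray level.

```python
import numpy as np


def to_gray_numpy(img_bgr):
    """Convert a uint8 BGR image (H x W x 3) to grayscale with BT.601 weights."""
    # OpenCV stores channels in B, G, R order
    b = img_bgr[..., 0].astype(np.float64)
    g = img_bgr[..., 1].astype(np.float64)
    r = img_bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```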
4. apply_blur(img, kernel_size=(5, 5))
Apply Gaussian blur to the image.
Example Usage:
blurred_image = apply_blur(image, kernel_size=(9, 9))
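If apply_blur calls cv2.GaussianBlur(img, kernel_size, 0), each kernel dimension must be a positive odd number, and OpenCV derives sigma from the size as 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8. The sketch below builds the corresponding normalized 1-D kernel in NumPy; the function name is hypothetical, and for sizes up to 7 OpenCV substitutes small fixed tables, so its values can differ slightly there.

```python
import numpy as np


def gaussian_kernel_1d(ksize, sigma=0.0):
    """Return a normalized 1-D Gaussian kernel of odd length ksize."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("ksize must be a positive odd integer")
    if sigma <= 0:
        # Rule OpenCV documents for deriving sigma from the kernel size
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2  # offsets centred on zero
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()  # weights sum to 1, preserving brightness
```

A 2-D Gaussian blur is separable, so applying this kernel along rows and then columns is equivalent to convolving with the full 2-D kernel.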
5. edge_detection(img, low_threshold=50, high_threshold=150)
Apply Canny edge detection to the image.
Example Usage:
edges = edge_detection(gray_image, low_threshold=30, high_threshold=100)
6. resize_image(img, new_size=(300, 300))
Resize the image to the specified dimensions.
Example Usage:
resized_image = resize_image(image, new_size=(500, 500))
7. adjust_brightness_contrast(img, alpha=1.5, beta=30)
Adjust the brightness and contrast of the image.
Example Usage:
adjusted_image = adjust_brightness_contrast(image, alpha=2.0, beta=50)
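Brightness and contrast adjustment is usually the linear map new = alpha * pixel + beta, where alpha scales contrast and beta shifts brightness. Assuming the function uses cv2.convertScaleAbs, which computes the saturated absolute value of that expression, this hypothetical NumPy sketch shows the same arithmetic.

```python
import numpy as np


def brightness_contrast_numpy(img, alpha=1.5, beta=30):
    """Compute saturate(|alpha * img + beta|) for a uint8 image."""
    out = np.abs(alpha * img.astype(np.float64) + beta)
    # Round to nearest and clip into the valid 0-255 range
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

With alpha=2.0 and beta=50, a pixel value of 200 maps to 450 and saturates at 255, which is why large alpha values quickly wash out bright regions.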
8. apply_threshold(img, threshold_value=128, max_value=255, threshold_type=cv2.THRESH_BINARY)
Apply a binary threshold to the image.
Example Usage:
thresholded_image = apply_threshold(gray_image, threshold_value=100, max_value=255, threshold_type=cv2.THRESH_BINARY)
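With cv2.THRESH_BINARY, OpenCV sets a pixel to max_value when it is strictly greater than threshold_value and to 0 otherwise. The hypothetical NumPy sketch below mirrors that rule; note that cv2.threshold itself returns a (threshold, image) tuple, and whether apply_threshold unpacks it is not stated here.

```python
import numpy as np


def binary_threshold_numpy(gray, threshold_value=128, max_value=255):
    """Mimic cv2.THRESH_BINARY: max_value where gray > threshold_value, else 0."""
    return np.where(gray > threshold_value, max_value, 0).astype(gray.dtype)
```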
9. apply_dilation(img, kernel_size=(5, 5))
Apply dilation to the image.
Example Usage:
dilated_image = apply_dilation(thresholded_image, kernel_size=(3, 3))
10. change_image_color(img, channel, value)
Change the intensity of a specific color channel.
Example Usage:
modified_image = change_image_color(image, channel=2, value=50)
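The exact behaviour of change_image_color is not specified. A common approach, sketched here under that assumption, adds value to one channel and clips the result; in OpenCV's BGR order, channel 0 is blue, 1 is green and 2 is red. Widening the dtype first matters because plain uint8 arithmetic wraps around (230 + 50 would become 24). The name shift_channel is hypothetical.

```python
import numpy as np


def shift_channel(img, channel, value):
    """Add value to one channel of an H x W x 3 uint8 image, clipping to 0-255."""
    out = img.astype(np.int32)  # widen to avoid uint8 wrap-around; also copies
    out[..., channel] += value
    return np.clip(out, 0, 255).astype(np.uint8)
```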
11. find_and_draw_contours(img)
Find contours in the image and draw them.
Example Usage:
contour_image = find_and_draw_contours(thresholded_image)
12. most_used_color(img)
Find the most used color in the image.
Example Usage:
most_used_color_value = most_used_color(image)
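One way to find the most used color, shown as a hypothetical sketch rather than the module's method, is to flatten the image into a list of pixels and count unique rows with NumPy. Ties are resolved in favour of the color that sorts first.

```python
import numpy as np


def most_common_color(img):
    """Return the most frequent pixel value of an image as a tuple."""
    channels = 1 if img.ndim == 2 else img.shape[-1]
    pixels = img.reshape(-1, channels)  # one row per pixel
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return tuple(int(v) for v in colors[counts.argmax()])
```

For an OpenCV image the tuple is in BGR order.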
13. image_details(img)
Get details about the image, including dimensions and pixel values.
Example Usage:
details = image_details(image)
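OpenCV images are NumPy arrays shaped (height, width) or (height, width, channels), so most image details can be read directly from the array. The dictionary keys below are hypothetical; the real image_details may report different fields.

```python
import numpy as np


def describe_image(img):
    """Summarize dimensions and pixel statistics of an image array."""
    height, width = img.shape[:2]
    channels = 1 if img.ndim == 2 else img.shape[2]
    return {
        "height": height,
        "width": width,
        "channels": channels,
        "dtype": str(img.dtype),
        "min": img.min().item(),  # .item() converts to a plain Python number
        "max": img.max().item(),
        "mean": float(img.mean()),
    }
```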
14. image_to_ascii(img, scale_factor=0.1)
Convert the image to ASCII art.
Example Usage:
ascii_art = image_to_ascii(image, scale_factor=0.05)
print(ascii_art)
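ASCII-art conversion typically downsamples a grayscale image by scale_factor and maps each pixel's intensity onto a ramp of characters from dark to light. The sketch below assumes a 2-D grayscale input and uses an arbitrary ramp and nearest-neighbour sampling; the function name and ramp are hypothetical.

```python
import numpy as np

# Characters ordered from dark to light (an arbitrary, illustrative ramp)
ASCII_RAMP = "@%#*+=-:. "


def gray_to_ascii(gray, scale_factor=0.1, ramp=ASCII_RAMP):
    """Render a 2-D uint8 grayscale array as ASCII art."""
    h, w = gray.shape
    new_h = max(1, int(h * scale_factor))
    new_w = max(1, int(w * scale_factor))
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    small = gray[np.ix_(rows, cols)]
    # Map intensities 0..255 onto ramp indices 0..len(ramp) - 1
    idx = (small.astype(np.int64) * (len(ramp) - 1)) // 255
    return "\n".join("".join(ramp[i] for i in row) for row in idx)
```

Terminal characters are roughly twice as tall as they are wide, so many implementations also halve the number of output rows to keep proportions.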
OCR Function
15. extract_text_from_image(image_path, tesseract_path)
Extract text from an image using Tesseract OCR.
Example Usage:
text = extract_text_from_image('path/to/image.png', 'path/to/tesseract.exe')
print(text)
Note: Ensure that Tesseract is installed and pass the correct path to the Tesseract executable to extract_text_from_image. You can use install_ocr() to install the OCR tool and find_tesseract() to locate the executable.
The following are additional image processing functions that enhance the capabilities of the provided script. These functions can be used in combination with the existing ones to perform a wider range of image manipulations.
16. rotate_image(img, angle)
Rotate the image by a specified angle.
Function Signature:
rotate_image(img, angle)
Parameters:
- img: The input image.
- angle: The angle by which to rotate the image.
Example Usage:
rotated_image = rotate_image(image, angle=45)
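Rotation in OpenCV is usually done with cv2.getRotationMatrix2D followed by cv2.warpAffine. Assuming rotate_image follows that pattern around the image centre, the angle is in degrees and positive values rotate counter-clockwise. The sketch below computes the 2x3 matrix using the formula OpenCV documents for getRotationMatrix2D; the function name is hypothetical.

```python
import math

import numpy as np


def rotation_matrix_2d(center, angle, scale=1.0):
    """Return the 2x3 affine matrix OpenCV documents for getRotationMatrix2D."""
    cx, cy = center
    theta = math.radians(angle)  # the angle is given in degrees
    a = scale * math.cos(theta)
    b = scale * math.sin(theta)
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])
```

The translation column keeps the rotation centre fixed, so the image pivots around that point rather than around the top-left corner.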
17. apply_mask(img, mask)
Apply a binary mask to the image.
Function Signature:
apply_mask(img, mask)
Parameters:
- img: The input image.
- mask: Binary mask with the same dimensions as the image.
Example Usage:
# Assuming 'mask' is a binary mask with the same dimensions as the image
masked_image = apply_mask(image, mask)
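Masking is commonly implemented as cv2.bitwise_and(img, img, mask=mask), which keeps pixels where the mask is non-zero and zeroes the rest. For a colour image the mask is a single-channel array with the image's height and width. The NumPy sketch below shows the equivalent operation, assuming that behaviour; the name apply_mask_numpy is hypothetical.

```python
import numpy as np


def apply_mask_numpy(img, mask):
    """Keep pixels where mask is non-zero and set all others to zero."""
    if mask.shape != img.shape[:2]:
        raise ValueError("mask must match the image height and width")
    keep = mask != 0
    if img.ndim == 3:
        keep = keep[..., np.newaxis]  # broadcast across colour channels
    return np.where(keep, img, 0).astype(img.dtype)
```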
18. invert_image(img)
Invert the colors of the image.
Function Signature:
invert_image(img)
Parameters:
- img: The input image.
Example Usage:
inverted_image = invert_image(image)
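For 8-bit images, inverting colours means replacing each value v with 255 - v, which is exactly what a bitwise NOT does on uint8 data (cv2.bitwise_not in OpenCV). The sketch below, with the hypothetical name invert_numpy, restricts itself to uint8 input because other dtypes have different value ranges.

```python
import numpy as np


def invert_numpy(img):
    """Invert a uint8 image: each value v becomes 255 - v."""
    if img.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    return 255 - img
```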
19. add_noise(img, intensity=50)
Add random noise to the image.
Function Signature:
add_noise(img, intensity=50)
Parameters:
- img: The input image.
- intensity: Intensity of the noise.
Example Usage:
noisy_image = add_noise(image, intensity=30)
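The documentation does not say which noise distribution add_noise uses. The sketch below assumes uniform integer noise in the range [-intensity, intensity], added in a wider dtype and clipped back to 0-255; the name add_uniform_noise and the seed parameter are hypothetical.

```python
import numpy as np


def add_uniform_noise(img, intensity=50, seed=None):
    """Add uniform integer noise in [-intensity, intensity] and clip to 0-255."""
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, hence intensity + 1
    noise = rng.integers(-intensity, intensity + 1, size=img.shape)
    noisy = img.astype(np.int32) + noise  # avoid uint8 wrap-around
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Gaussian noise (rng.normal) is a common alternative; the clipping step is needed either way.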
20. morphological_operations(img, operation='dilate', kernel_size=(5, 5))
Apply morphological operations (dilation or erosion) to the image.
Function Signature:
morphological_operations(img, operation='dilate', kernel_size=(5, 5))
Parameters:
- img: The input image.
- operation: Operation to perform ('dilate' or 'erode').
- kernel_size: Size of the kernel for the operation.
Example Usage:
# Dilate the image
dilated_image = morphological_operations(image, operation='dilate', kernel_size=(3, 3))
# Erode the image
eroded_image = morphological_operations(image, operation='erode', kernel_size=(3, 3))
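With a rectangular kernel of ones, dilation replaces each pixel by the maximum of its neighbourhood and erosion by the minimum. Assuming morphological_operations wraps cv2.dilate and cv2.erode with such a kernel, this hypothetical NumPy sketch (2-D grayscale only) shows what they compute. Padding with 0 for dilation and 255 for erosion means out-of-bounds pixels never change the result, matching OpenCV's default border handling for uint8 images.

```python
import numpy as np


def morph_numpy(img, operation="dilate", kernel_size=(5, 5)):
    """Dilate (local max) or erode (local min) a 2-D uint8 image."""
    kh, kw = kernel_size
    if operation == "dilate":
        reduce_fn, pad_value = np.maximum, 0
    elif operation == "erode":
        reduce_fn, pad_value = np.minimum, 255
    else:
        raise ValueError("operation must be 'dilate' or 'erode'")
    top, left = kh // 2, kw // 2  # anchor the kernel at its centre
    padded = np.pad(img, ((top, kh - 1 - top), (left, kw - 1 - left)),
                    constant_values=pad_value)
    h, w = img.shape
    out = padded[0:h, 0:w].copy()
    # Combine every shifted window covered by the kernel
    for dy in range(kh):
        for dx in range(kw):
            out = reduce_fn(out, padded[dy:dy + h, dx:dx + w])
    return out
```

Dilating and then eroding with the same kernel (a morphological closing) restores the single bright pixel in the test case, while filling small dark gaps in larger shapes.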
Change Log
0.0.0.1 (17/11/2023)
- First release
Download files
File details: OrangeLab-0.0.1.tar.gz
File metadata
- Download URL: OrangeLab-0.0.1.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4e7a3a7a8692bd425ef6ae513335782f424115fb9dbf1e706c76b896cb20c4c7
MD5 | fddf413f679154702a1b139f2ec3770a
BLAKE2b-256 | 9450046bc2b66d0f5ea763bfdc2da75e1042aedb67d905a3a74d618d387c0627
File details: OrangeLab-0.0.1-py3-none-any.whl
File metadata
- Download URL: OrangeLab-0.0.1-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4c17f30275275cb3b305605c0084d77af48b8ad4dc9553a83491c1c65ba8180
MD5 | 30e0deaf5f0accbe9654ad64d9fe89f7
BLAKE2b-256 | 36e71c81266578bb5593034cbdba697f8600631337a388f701db139c49e10148