DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python, with added support for image augmentation.

DupliPy 0.2.5

An open-source Python library for text formatting, augmentation, and similarity calculation tasks in NLP. The package now also includes methods for image augmentation.

Changes in DupliPy 0.2.5

  • Added new text augmentation functions: swap_random_words and random_word_deletion.
  • Added new text similarity metrics: sorensen_dice_coefficient and cosine_similarity_score.
  • Added new image similarity metrics: mean_squared_error and psnr.
  • Added new text analysis function: named_entity_recognition.
  • Improved progress bars for augmentation functions.

Changes in DupliPy 0.2.4

  • Created new functions in duplipy.replication for image augmentation: random_flip, random_color_jitter, and noise_overlay.
  • Created a new function (post_format_text) for post-formatting after DupliPy processing or augmentation tasks that cleans up extra whitespace and normalizes punctuation spacing.
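
The kind of cleanup described for post_format_text can be sketched with a few regular expressions. This is only an illustration under stated assumptions: tidy_text is a hypothetical helper, not DupliPy's implementation, and the exact rules the library applies may differ.

```python
import re


def tidy_text(text: str) -> str:
    """Hypothetical stand-in for a post-formatting step (not DupliPy's code)."""
    # Collapse runs of whitespace into a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove spaces that appear directly before punctuation marks.
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    # Add one space after punctuation when a letter follows immediately.
    text = re.sub(r"([.,!?;:])(?=[A-Za-z])", r"\1 ", text)
    return text


print(tidy_text("Hello ,world !This   is  some text ."))
# Output: Hello, world! This is some text.
```

The lookahead only matches letters, so numbers such as 3.14 are left untouched.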

Changes in DupliPy 0.2.3

DupliPy now uses ValX, another of our Python packages, which provides quick methods for cleaning and formatting text data during preprocessing, before training.

Installation

You can install DupliPy using pip:

pip install duplipy

Supported Python Versions

DupliPy supports the following Python versions:

  • Python 3.6
  • Python 3.7
  • Python 3.8
  • Python 3.9
  • Python 3.10
  • Python 3.11
  • Python 3.12 or later

Please ensure that you have one of these Python versions installed before using DupliPy. DupliPy may not work as expected on Python versions older than those listed.
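
If you want to check the interpreter from a script before importing DupliPy, a small guard like the following may help. It is a sketch: MINIMUM_PYTHON and python_is_supported are illustrative names, and the threshold is taken from the lowest version in the list above.

```python
import sys

# Lowest version in the supported list above (an assumption taken from that list).
MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major/minor version is at least MINIMUM_PYTHON."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if not python_is_supported():
    raise RuntimeError("DupliPy expects Python 3.6 or later")
print("Python version OK:", sys.version.split()[0])
```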

Features

  • Text Formatting: Remove special characters, standardize text formatting.
  • Text Replication: Generate replicated instances of text for data augmentation.
  • Sentiment Analysis: Classify the sentiment expressed in a sentence.
  • Similarity Calculation: Calculate text and image similarity using various metrics.
  • BLEU Score Calculation: Score generated text against a reference to evaluate a text-based NLP model.
  • Named Entity Recognition: Identify and categorize key information in text.
  • Image Augmentation: Flip, rotate, resize, crop, and add noise or color jitter to images.
  • Content Filtering: Remove profanity, hate speech, offensive speech, and sensitive information.

For the full API reference, see DupliPy's official documentation.

Usage

Text Formatting

from duplipy.formatting import remove_special_characters, standardize_text

text = "Hello! This is some text."

# Remove special characters
formatted_text = remove_special_characters(text)
print(formatted_text)  # Output: Hello This is some text

# Standardize text formatting
standardized_text = standardize_text(text)
print(standardized_text)  # Output: hello! this is some text

Text Replication

from duplipy.replication import replace_word_with_synonym, augment_text_with_synonyms, swap_random_words, random_word_deletion

text = "Hello! This is some text."

# Replace words with synonyms
augmented_text = augment_text_with_synonyms(text, augmentation_factor=3, probability=0.5)
print(augmented_text)

# Swap random words
swapped_text = swap_random_words(text)
print(swapped_text)

# Delete random words
deleted_text = random_word_deletion(text, num_deletions=1)
print(deleted_text)
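
To make the idea behind word swapping and word deletion concrete, here is a minimal standard-library sketch. The helpers swap_two_words and delete_words are hypothetical and are not DupliPy's implementations; the library may tokenize and sample differently.

```python
import random


def swap_two_words(text: str, rng: random.Random) -> str:
    """Swap two randomly chosen, distinct word positions."""
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def delete_words(text: str, num_deletions: int, rng: random.Random) -> str:
    """Drop up to num_deletions random words, always keeping at least one."""
    words = text.split()
    num_deletions = max(0, min(num_deletions, len(words) - 1))
    drop = set(rng.sample(range(len(words)), num_deletions))
    return " ".join(w for k, w in enumerate(words) if k not in drop)


rng = random.Random(42)  # seeded so repeated runs give the same result
text = "Hello! This is some text."
print(swap_two_words(text, rng))
print(delete_words(text, 1, rng))
```

Passing a seeded random.Random instance keeps augmented datasets reproducible, which is useful when comparing training runs.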

Sentiment Analysis

from duplipy.text_analysis import analyze_sentiment

text = "I love this product! It's amazing!"

# Analyze sentiment
sentiment = analyze_sentiment(text)
print(sentiment)  # Output: Positive

Similarity Calculation

from duplipy.similarity import edit_distance_score, sorensen_dice_coefficient, cosine_similarity_score

text1 = "Hello! How are you?"
text2 = "Hi! How are you doing?"

# Calculate edit distance
edit_distance = edit_distance_score(text1, text2)
print(edit_distance)  # Output: 4

# Calculate Sorensen-Dice coefficient
dice_coefficient = sorensen_dice_coefficient(text1, text2)
print(dice_coefficient)

# Calculate cosine similarity
cosine_sim = cosine_similarity_score(text1, text2)
print(cosine_sim)
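
For intuition about what the last two metrics measure, the sketch below computes them over lowercase word tokens. This is an assumption made for illustration: the helpers tokenize, dice, and cosine are hypothetical, and DupliPy's functions may tokenize or weight the text differently, so their values need not match.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def dice(text1: str, text2: str) -> float:
    """Sorensen-Dice: 2 * |A & B| / (|A| + |B|) over the sets of tokens."""
    a, b = set(tokenize(text1)), set(tokenize(text2))
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def cosine(text1: str, text2: str) -> float:
    """Cosine of the angle between the two token-count vectors."""
    v1, v2 = Counter(tokenize(text1)), Counter(tokenize(text2))
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


text1 = "Hello! How are you?"
text2 = "Hi! How are you doing?"
print(round(dice(text1, text2), 3))    # 0.667
print(round(cosine(text1, text2), 3))  # 0.671
```

Both scores range from 0 (no shared tokens) to 1 (identical token sets or count vectors).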

BLEU Score Calculation

from duplipy.similarity import bleu_score

text1 = "Hello! How are you?"
text2 = "Hi! How are you doing?"

# Calculate BLEU score
bleu_value = bleu_score(text1, text2)
print(bleu_value)  # Output: 0.434
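
As background on what a BLEU-style score computes, the following sketch combines clipped n-gram precisions with a brevity penalty. It is a simplified illustration, not DupliPy's implementation: simple_bleu is a hypothetical helper, it uses only unigrams and bigrams (standard BLEU goes up to 4-grams), and it applies no smoothing, so its values will differ from the library's.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(reference: list, candidate: list, n: int) -> float:
    """Share of candidate n-grams found in the reference, clipped by reference counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def simple_bleu(reference: list, candidate: list, max_n: int = 2) -> float:
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [clipped_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates that are shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_mean)


reference = "how are you doing".split()
candidate = "how are you".split()
print(round(simple_bleu(reference, candidate), 3))  # 0.717
```

In this example every candidate n-gram appears in the reference, so the score is determined entirely by the brevity penalty exp(1 - 4/3).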

Image Augmentation

from PIL import Image
from duplipy.replication import flip_horizontal, flip_vertical, rotate, random_rotation, resize, crop, random_crop

# Load an image for testing
image_path = "path/to/image.jpg"
image = Image.open(image_path)

# Flip the image horizontally
flipped_horizontal_image = flip_horizontal(image)

# Flip the image vertically
flipped_vertical_image = flip_vertical(image)

# Rotate the image by a specific angle (e.g., 45 degrees)
rotated_image = rotate(image, 45)

# Apply a random rotation to the image within a specified range of angles (e.g., -30 to 30 degrees)
randomly_rotated_image = random_rotation(image, max_angle=30)

# Resize the image to a specific target size (e.g., 224x224 pixels)
resized_image = resize(image, target_size=(224, 224))

# Crop a random region from the image (e.g., 150x150 pixels)
randomly_cropped_image = random_crop(image, crop_size=(150, 150))

# Save the augmented images (optional, if you want to view the results)
flipped_horizontal_image.save("path/to/flipped_horizontal.jpg")
flipped_vertical_image.save("path/to/flipped_vertical.jpg")
rotated_image.save("path/to/rotated.jpg")
randomly_rotated_image.save("path/to/randomly_rotated.jpg")
resized_image.save("path/to/resized.jpg")
randomly_cropped_image.save("path/to/randomly_cropped.jpg")
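
To show what flips and random crops do at the pixel level, the sketch below applies them to a small grid of values stored as nested lists, so it runs without Pillow. The helpers flip_h, flip_v, and rand_crop are hypothetical illustrations and are not the DupliPy functions used above.

```python
import random


def flip_h(grid: list) -> list:
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in grid]


def flip_v(grid: list) -> list:
    """Mirror the image top to bottom by reversing the row order."""
    return [row[:] for row in grid[::-1]]


def rand_crop(grid: list, crop_h: int, crop_w: int, rng: random.Random) -> list:
    """Cut a crop_h x crop_w window whose top-left corner is chosen at random."""
    height, width = len(grid), len(grid[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - crop_h)   # inclusive bounds keep the window inside
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in grid[top:top + crop_h]]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(flip_h(image))  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(flip_v(image))  # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
print(rand_crop(image, 2, 2, random.Random(0)))
```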

Image Similarity

from PIL import Image
from duplipy.similarity import mean_squared_error, psnr

# Load two images for testing
image1 = Image.open("path/to/image1.jpg")
image2 = Image.open("path/to/image2.jpg")

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(image1, image2)
print(f"Mean Squared Error: {mse}")

# Calculate Peak Signal-to-Noise Ratio (PSNR)
psnr_value = psnr(image1, image2)
print(f"PSNR: {psnr_value}")
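
For reference, the standard definitions of these two metrics can be written in a few lines. The sketch below works on flat lists of 8-bit pixel values rather than PIL images; mse and psnr_db are hypothetical helpers, and DupliPy's functions may handle image conversion and channels differently.

```python
import math


def mse(pixels1: list, pixels2: list) -> float:
    """Mean of the squared differences between corresponding pixels."""
    if len(pixels1) != len(pixels2) or not pixels1:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(pixels1, pixels2)) / len(pixels1)


def psnr_db(pixels1: list, pixels2: list, max_value: int = 255) -> float:
    """PSNR in decibels: 10 * log10(max_value^2 / MSE)."""
    error = mse(pixels1, pixels2)
    if error == 0:
        return math.inf  # identical images have no noise
    return 10 * math.log10(max_value ** 2 / error)


a = [0, 64, 128, 255]
b = [0, 60, 130, 250]
print(mse(a, b))  # 11.25
print(round(psnr_db(a, b), 2))
```

A lower MSE means the images are closer, while a higher PSNR means the same thing on a logarithmic scale.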

Named Entity Recognition

from duplipy.text_analysis import named_entity_recognition

text = "Apple is looking at buying U.K. startup for $1 billion"

# Perform NER
entities = named_entity_recognition(text)
print(entities)

Hate Speech and Offensive Speech Removal Using AI

from duplipy.formatting import remove_hate_speech_from_text

text = "I hate all of you bad word! Can't you just bad word leave me alone! Hi, I'm Katy."

print(remove_hate_speech_from_text(text))

# Output: Hi, I'm Katy.

Contributing

Contributions are welcome! If you encounter any issues, have suggestions, or want to contribute to DupliPy, please open an issue or submit a pull request on GitHub.

License

DupliPy is released under the terms of the MIT License (Modified). Please see the LICENSE file for the full text.

Modified License Clause

The modified license clause permits users to make derivative works based on the DupliPy software. However, it requires that any substantially changed version be clearly distinguished from the original work and distributed under a different name.

The clause is intended to prevent the unmodified source code from being republished directly, while still allowing derivative works that incorporate the code but are not identical to it.

Please read the full license terms in the LICENSE file for complete details.
