UltraClean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data, specifically designed for AI/ML tasks and data processing.

Features

  • Remove unwanted characters, links, email addresses, phone numbers, underscores, Unicode characters, emojis, numbers, currencies, punctuation, HTML tags, LaTeX commands, and more.
  • Handle multi-dots, extra spaces, and hashtags.
  • Batch processing for efficient text cleaning.
  • Spam detection and filtering using pre-trained models.
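
To make the kinds of cleaning listed above concrete, here is a minimal standard-library sketch. It is not UltraClean's implementation: the patterns and the function name `simple_clean` are assumptions chosen for illustration, and they cover only HTML tags, links, email addresses, multi-dots, and extra spaces.

```python
import re

# Illustrative patterns only; these are assumptions, not UltraClean's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MULTI_DOT_RE = re.compile(r"\.{2,}")
SPACES_RE = re.compile(r"\s+")


def simple_clean(text: str) -> str:
    """Strip HTML tags, links, and emails, then normalize dots and whitespace."""
    text = HTML_TAG_RE.sub(" ", text)    # drop tags such as <b> or </p>
    text = URL_RE.sub(" ", text)         # drop http(s):// and www. links
    text = EMAIL_RE.sub(" ", text)       # drop email addresses
    text = MULTI_DOT_RE.sub(".", text)   # collapse "..." into "."
    return SPACES_RE.sub(" ", text).strip()  # collapse whitespace runs


raw = "<b>Hi</b>  there... mail me@example.com or see https://example.com/x now"
print(simple_clean(raw))
```

Links and email addresses are removed before the multi-dot pass so that the dots inside them never reach it. A real cleaner would need broader patterns than these.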

Installation

You can install UltraClean using pip:

pip install ultraclean

Usage

Text Cleaning

from ultraclean.clean import cleanup

# Clean a single string using cleanup()'s default settings.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_text = cleanup(text)
print(cleaned_text)
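
Batch processing is listed as a feature, but this page does not show its API. As a hedged sketch, one way to clean many texts is to map a single-string cleaner over a list. The helper `clean_batch` and the stand-in cleaner `collapse_spaces` are hypothetical names, not part of UltraClean; in practice you could pass `cleanup` as the `cleaner` argument, assuming it takes one string and returns one string as in the example above.

```python
from typing import Callable, Iterable, List


def clean_batch(texts: Iterable[str], cleaner: Callable[[str], str]) -> List[str]:
    """Apply `cleaner` to each text, skipping entries that end up empty."""
    cleaned = []
    for text in texts:
        result = cleaner(text)
        if result:  # drop texts that are empty after cleaning
            cleaned.append(result)
    return cleaned


def collapse_spaces(text: str) -> str:
    """Stand-in cleaner (hypothetical): collapse whitespace runs and trim the ends."""
    return " ".join(text.split())


docs = ["  Hello   world  ", "", "Free   trip to Hawaii"]
print(clean_batch(docs, collapse_spaces))
```

Taking the cleaner as a parameter keeps the loop independent of any particular library, so the same helper works with whatever cleaning function you choose.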

Spam Detection

from ultraclean.predict import Spam

# Create the spam detector.
spam_detector = Spam()

# Classify a single text; the result is used as a truthy/falsy flag below.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")

# Filter spam out of a longer paragraph and print what remains.
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)
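
This page does not say how `filter` decides what to remove. One plausible reading, sketched below under that assumption, is sentence-level filtering: split the paragraph into sentences, classify each one, and keep those not flagged. The names `filter_sentences` and `keyword_is_spam` are hypothetical, and the keyword check is a toy stand-in for a pre-trained model.

```python
import re
from typing import Callable

# Split after '.', '!' or '?' when followed by whitespace (a rough heuristic).
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def filter_sentences(paragraph: str, is_spam: Callable[[str], bool]) -> str:
    """Return the paragraph with every sentence flagged by `is_spam` removed."""
    sentences = SENTENCE_SPLIT_RE.split(paragraph.strip())
    kept = [s for s in sentences if s and not is_spam(s)]
    return " ".join(kept)


def keyword_is_spam(sentence: str) -> bool:
    """Toy stand-in classifier (hypothetical): flags promotional keywords."""
    keywords = ("free trip", "claim your prize", "you've won")
    lowered = sentence.lower()
    return any(k in lowered for k in keywords)


paragraph = (
    "Congratulations! You've won a free trip to Hawaii. "
    "Click here to claim your prize. This is not a scam."
)
print(filter_sentences(paragraph, keyword_is_spam))
# prints: Congratulations! This is not a scam.
```

The actual output of `spam_detector.filter` may differ, since it depends on the library's model and on how it segments text.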

License

This project is licensed under the MIT License with an attribution requirement.

Author

Ranit Bhowmick - bhowmickranitking@duck.com
