UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.

UltraClean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data, specifically designed for AI/ML tasks and data processing.

Features

  • Remove unwanted characters, links, emails, phone numbers, underscores, Unicode characters, emojis, numbers, currencies, punctuation, HTML tags, LaTeX commands, and more.
  • Handle multi-dots, extra spaces, and hashtags.
  • Batch processing for efficient text cleaning.
  • Spam detection and filtering using pre-trained models.
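
To make the feature list more concrete, here is a minimal standard-library sketch of a few of the steps it names (HTML tags, links, emails, multi-dots, extra spaces). It is not UltraClean's implementation: the patterns and the function name strip_noise are hypothetical, chosen only for illustration, and the library's own rules may differ.

```python
import re

# Hypothetical patterns for illustration only; UltraClean's rules may differ.
TAG_RE = re.compile(r"<[^>]+>")                          # HTML-like tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")            # bare links
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
MULTI_DOT_RE = re.compile(r"\.{2,}")                     # "..." and longer runs
SPACE_RE = re.compile(r"\s+")                            # any whitespace run


def strip_noise(text: str) -> str:
    """Remove tags, links and emails, then normalise dots and spacing."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = MULTI_DOT_RE.sub(".", text)
    return SPACE_RE.sub(" ", text).strip()


sample = "<b>Hello</b>   world... visit https://example.com or mail me@example.com now"
print(strip_noise(sample))  # prints: Hello world. visit or mail now
```

Tags, links, and emails are replaced with a space rather than an empty string so that neighbouring words do not fuse; the final whitespace pass then collapses the gaps.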

Installation

You can install UltraClean using pip:

pip install ultraclean
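
After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the distribution name "ultraclean" is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("ultraclean"))
```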

Usage

Text Cleaning

from ultraclean.clean import cleanup

# Pass the raw string to cleanup() and print the cleaned result.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_text = cleanup(text)
print(cleaned_text)
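
The example above cleans one string. When many strings need cleaning, a small wrapper can apply any string-to-string cleaner lazily. This is a sketch under assumptions: clean_many and collapse_spaces are hypothetical names, and the stand-in cleaner is there only so the code runs without UltraClean installed. Passing cleanup instead should work if it accepts and returns a single string, as the usage above suggests.

```python
from typing import Callable, Iterable, Iterator


def clean_many(texts: Iterable[str], cleaner: Callable[[str], str]) -> Iterator[str]:
    """Lazily apply `cleaner` to each text, skipping results that end up empty."""
    for text in texts:
        cleaned = cleaner(text)
        if cleaned:
            yield cleaned


def collapse_spaces(text: str) -> str:
    """Stand-in cleaner (hypothetical): trims and collapses whitespace."""
    return " ".join(text.split())


rows = ["  Hello   world ", "   ", "Free   trip  to Hawaii"]
result = list(clean_many(rows, collapse_spaces))
print(result)  # prints: ['Hello world', 'Free trip to Hawaii']
```

Because clean_many is a generator, it can process a large file line by line without holding every cleaned string in memory.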

Spam Detection

from ultraclean.predict import Spam

# Create the detector once and reuse it for every call.
spam_detector = Spam()

# Classify a single piece of text; the result is used as a truthy/falsy flag.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")

# Run filter() on a longer paragraph and print what it returns.
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)
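
The README does not describe how filter works internally. One plausible reading is that it drops the sentences judged to be spam and returns the rest. The sketch below illustrates that idea only: filter_sentences, keyword_is_spam, and SPAM_HINTS are hypothetical, and the keyword check is a toy stand-in for a pre-trained model.

```python
import re
from typing import Callable

# Toy spam cues for illustration; a real detector would use a trained model.
SPAM_HINTS = ("free trip", "click here", "claim your prize")


def keyword_is_spam(sentence: str) -> bool:
    """Hypothetical classifier: flags a sentence containing any spam cue."""
    lowered = sentence.lower()
    return any(hint in lowered for hint in SPAM_HINTS)


def filter_sentences(paragraph: str, is_spam: Callable[[str], bool]) -> str:
    """Split on sentence-ending punctuation and keep the non-spam sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    kept = [s for s in sentences if s and not is_spam(s)]
    return " ".join(kept)


paragraph = (
    "Congratulations! You've won a free trip to Hawaii. "
    "Click here to claim your prize. This is not a scam."
)
print(filter_sentences(paragraph, keyword_is_spam))
# prints: Congratulations! This is not a scam.
```

Taking the classifier as a parameter keeps the sentence-splitting logic independent of the model, so a different predicate can be swapped in without changing the filter.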

License

This project is licensed under the MIT License with an attribution requirement.

Author

Ranit Bhowmick - bhowmickranitking@duck.com

Download files

Source Distribution

ultraclean-0.2.2.tar.gz (6.0 kB view details)

Uploaded Source

Built Distribution

ultraclean-0.2.2-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file ultraclean-0.2.2.tar.gz.

File metadata

  • Download URL: ultraclean-0.2.2.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for ultraclean-0.2.2.tar.gz
Algorithm Hash digest
SHA256 a1835b943569aa8f730676d427ce9a62817fafd2656aed77c057d24a023b4665
MD5 acd9eb09af2a7ada4031b7d21e122634
BLAKE2b-256 1df669331e1224049788172374174686638be715517dead4e60167db9ab8833f

File details

Details for the file ultraclean-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: ultraclean-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for ultraclean-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 10a407318e0042b6608477d7fed7b71693b88e89502969533e8506866a7d5826
MD5 86d1e6642771e1e4165c0c2d1e26b7f5
BLAKE2b-256 2f6a4685265e621ff2b351c22ffcaa8ba1ae9159d452a9f1ca461707cb37ad20
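
The SHA256 digests listed for each file can be checked against a downloaded copy. The following is a minimal sketch using hashlib from the standard library; the helper names sha256_of_file and matches_expected are hypothetical, and the local file path is whatever location the archive or wheel was saved to.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, matches_expected("ultraclean-0.2.2.tar.gz", "a1835b943569aa8f730676d427ce9a62817fafd2656aed77c057d24a023b4665") should return True for an unmodified copy of the source distribution, assuming the file sits in the current directory.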
