UltraClean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data, specifically designed for AI/ML tasks and data processing.

Features

  • Remove unwanted characters, links, emails, phone numbers, underscores, Unicode characters, emojis, numbers, currency symbols, punctuation, HTML tags, LaTeX commands, and more.
  • Handle repeated dots ("..."), extra spaces, and hashtags.
  • Batch processing for efficient text cleaning.
  • Spam detection and filtering using pre-trained models.
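
To give a feel for what this kind of cleaning involves, here is a minimal sketch using only Python's standard `re` module. It is not UltraClean's implementation: the patterns and the `basic_clean` helper are assumptions made for illustration, and they cover only a few of the features listed above (links, emails, HTML tags, repeated dots, extra spaces).

```python
import re

# Illustrative patterns only; these are assumptions, not UltraClean's internals.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MULTI_DOT_RE = re.compile(r"\.{2,}")
EXTRA_SPACE_RE = re.compile(r"\s+")


def basic_clean(text: str) -> str:
    """Hypothetical helper: strip tags, links, and emails, then tidy dots and spaces."""
    text = HTML_TAG_RE.sub(" ", text)    # drop HTML tags first so they do not stick to URLs
    text = URL_RE.sub(" ", text)         # drop links
    text = EMAIL_RE.sub(" ", text)       # drop email addresses
    text = MULTI_DOT_RE.sub(".", text)   # collapse "..." into a single dot
    return EXTRA_SPACE_RE.sub(" ", text).strip()  # collapse whitespace runs


print(basic_clean("<p>Visit https://example.com now...   or mail me@example.com</p>"))
# prints: Visit now. or mail
```

Replacing each match with a space rather than an empty string keeps neighbouring words from being glued together; the final whitespace pass then removes the extra spaces.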

Installation

You can install UltraClean using pip:

pip install ultraclean
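
If you want to confirm that the install succeeded, one option is to query the installed package metadata from Python. The `installed_version` helper below is a hypothetical convenience wrapper around the standard library's `importlib.metadata`, not part of UltraClean.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is missing.
print(installed_version("ultraclean"))
```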

Usage

Text Cleaning

from ultraclean.clean import cleanup

# Sample input to clean.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."

# cleanup() takes a string and returns the cleaned text.
cleaned_text = cleanup(text)
print(cleaned_text)
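
The feature list mentions batch processing, but this README does not show a batch API. As a minimal sketch, assuming nothing beyond a per-string cleaning function, you can apply a cleaner to a list of texts yourself. The `clean_batch` and `collapse_spaces` names are hypothetical; `collapse_spaces` is only a stand-in so the example runs without UltraClean installed.

```python
from typing import Callable, Iterable, List


def clean_batch(texts: Iterable[str], cleaner: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: apply a per-string cleaner to every text, preserving order."""
    return [cleaner(t) for t in texts]


def collapse_spaces(text: str) -> str:
    """Stand-in cleaner (an assumption): collapse whitespace runs and trim the ends."""
    return " ".join(text.split())


docs = ["  Hello   world ", "free   trip\tto Hawaii"]
print(clean_batch(docs, collapse_spaces))
# prints: ['Hello world', 'free trip to Hawaii']
```

With UltraClean installed, passing `cleanup` as the `cleaner` argument should work the same way, since it is called with a single string in the example above.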

Spam Detection

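from ultraclean.predict import Spam

# Create the spam detector (per the feature list, it relies on a pre-trained model).
spam_detector = Spam()

# predict() is used here as a truthy/falsy spam flag for a single text.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")

# filter() takes a longer paragraph and returns the filtered result.
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)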
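
This README does not describe how `filter` decides what to keep. One plausible reading is sentence-level filtering: split the paragraph into sentences, classify each one, and keep those not flagged as spam. The sketch below illustrates that idea under stated assumptions; `filter_sentences` and the keyword-based `keyword_is_spam` predicate are hypothetical stand-ins, not UltraClean's model or API.

```python
import re
from typing import Callable

# Split after sentence-ending punctuation followed by whitespace (a rough heuristic).
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def filter_sentences(paragraph: str, is_spam: Callable[[str], bool]) -> str:
    """Hypothetical helper: keep only the sentences the predicate does not flag."""
    sentences = SENTENCE_SPLIT_RE.split(paragraph.strip())
    return " ".join(s for s in sentences if s and not is_spam(s))


def keyword_is_spam(sentence: str) -> bool:
    """Toy predicate (an assumption): flag sentences containing common spam phrases."""
    lowered = sentence.lower()
    return any(k in lowered for k in ("you've won", "click here", "claim your prize"))


paragraph = (
    "Congratulations! You've won a free trip to Hawaii. "
    "Click here to claim your prize. This is not a scam."
)
print(filter_sentences(paragraph, keyword_is_spam))
# prints: Congratulations! This is not a scam.
```

Any callable that takes a string and returns a truthy value for spam can be used as the predicate, so a detector's `predict` method could be passed in place of the toy keyword check.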

License

This project is licensed under the MIT License with an attribution requirement.

Author

Ranit Bhowmick - bhowmickranitking@duck.com
