UltraClean
UltraClean is a fast and efficient Python library for cleaning and preprocessing text data, specifically designed for AI/ML tasks and data processing.
Features
- Remove unwanted characters, links, email addresses, phone numbers, underscores, Unicode characters, emojis, numbers, currency symbols, punctuation, HTML tags, LaTeX commands, and more.
- Handle repeated dots (multi-dots), extra spaces, and hashtags.
- Batch processing for efficient text cleaning.
- Spam detection and filtering using pre-trained models.
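To make the kinds of transformations listed above concrete, here is a minimal standard-library sketch of a few of them (HTML tags, links, email addresses, repeated dots, extra spaces). The patterns and the `basic_clean` name are illustrative assumptions; they are not UltraClean's own rules or API.

```python
import re

# Illustrative patterns only; assumptions, not UltraClean's actual rules.
HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MULTI_DOT_RE = re.compile(r"\.{2,}")
SPACE_RE = re.compile(r"\s+")


def basic_clean(text: str) -> str:
    """Strip HTML tags, links, and emails, then normalize dots and spaces."""
    text = HTML_TAG_RE.sub(" ", text)   # drop tags first so URLs end cleanly
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = MULTI_DOT_RE.sub(".", text)  # collapse "..." into a single "."
    return SPACE_RE.sub(" ", text).strip()


sample = "<p>Visit https://example.com   or mail me@example.com today... thanks</p>"
print(basic_clean(sample))  # Visit or mail today. thanks
```

The order of the substitutions matters in this sketch: removing tags before links keeps a closing tag from being swallowed into a URL match.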
Installation
You can install UltraClean using pip:
pip install ultraclean
Usage
Text Cleaning
from ultraclean.clean import cleanup
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_text = cleanup(text)
print(cleaned_text)
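The feature list mentions batch processing, but this page does not show the library's batch API. As a hedged sketch, any single-string cleaner can be applied to a large collection in fixed-size batches. The `clean_in_batches` helper below is hypothetical and takes the cleaner as a parameter, so it does not depend on UltraClean being installed.

```python
from typing import Callable, Iterable, Iterator, List


def clean_in_batches(
    texts: Iterable[str],
    cleaner: Callable[[str], str],
    batch_size: int = 1000,
) -> Iterator[List[str]]:
    """Yield lists of cleaned strings, at most batch_size items per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for text in texts:
        batch.append(cleaner(text))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


# str.strip stands in for a real cleaner here (an assumption for the demo).
docs = ["  first  ", " second", "third  "]
for cleaned in clean_in_batches(docs, str.strip, batch_size=2):
    print(cleaned)
```

With UltraClean installed, passing `cleanup` as the `cleaner` argument should work the same way, assuming it accepts and returns a single string as in the example above.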
Spam Detection
from ultraclean.predict import Spam
spam_detector = Spam()
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)
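This page does not document how `filter` decides what to remove. One plausible reading is sentence-level filtering: split the paragraph into sentences, classify each, and keep only those not flagged. The sketch below illustrates that idea under that assumption, with a toy keyword check (`keyword_is_spam`, hypothetical) standing in for a pre-trained model.

```python
import re
from typing import Callable

# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def filter_sentences(paragraph: str, is_spam: Callable[[str], bool]) -> str:
    """Return the paragraph with every sentence flagged by is_spam removed."""
    sentences = SENTENCE_SPLIT_RE.split(paragraph.strip())
    kept = [s for s in sentences if s and not is_spam(s)]
    return " ".join(kept)


def keyword_is_spam(sentence: str) -> bool:
    """Toy stand-in for a trained classifier: flags a few trigger phrases."""
    triggers = ("you've won", "click here", "claim your prize")
    lowered = sentence.lower()
    return any(trigger in lowered for trigger in triggers)


paragraph = (
    "Congratulations! You've won a free trip to Hawaii. "
    "Click here to claim your prize. This is not a scam."
)
print(filter_sentences(paragraph, keyword_is_spam))
# Congratulations! This is not a scam.
```

A real classifier would replace `keyword_is_spam`; the splitting and rejoining logic stays the same.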
License
This project is licensed under the MIT License with an attribution requirement.
Author
Ranit Bhowmick - bhowmickranitking@duck.com
File details
Details for the file ultraclean-0.2.0.tar.gz.
File metadata
- Download URL: ultraclean-0.2.0.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3d2b549a7a47d06bdbca45e9650e60bfe2fb92848bc24af14d27d99dbd9f081 |
| MD5 | b99e753bb6c660eb46e4487bf6386ad1 |
| BLAKE2b-256 | c82db3bfbf7cb9fef81925cfced459a22009f406e385c6f718596faf87dd78c1 |
File details
Details for the file ultraclean-0.2.0-py3-none-any.whl.
File metadata
- Download URL: ultraclean-0.2.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a243b11946f2378a52b4def6929db31e24c943647db111b55b74c187d8d09a2 |
| MD5 | a9eabd4a2768054ec43e85787997c985 |
| BLAKE2b-256 | 6dff75f198c2b1b9454d462312422ab3e8c3e0d44d29b68acb8211a723c2dcaa |
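The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib`; the `file_digests` helper is hypothetical, and it assumes that "BLAKE2b-256" denotes BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: str, chunk_size: int = 65536) -> dict:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Compare against the SHA256 listed for the source distribution, if present.
archive = Path("ultraclean-0.2.0.tar.gz")
expected_sha256 = "c3d2b549a7a47d06bdbca45e9650e60bfe2fb92848bc24af14d27d99dbd9f081"
if archive.exists():
    print(file_digests(str(archive))["SHA256"] == expected_sha256)
```

The comparison only runs when the archive is in the current directory; a mismatch means the file differs from the one described on this page.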