
A lightweight text preprocessing package for NLP tasks.


🧹 textcleaner

A lightweight and reusable text preprocessing package for NLP tasks. It cleans text by removing HTML tags and emojis, expanding contractions, correcting spelling, and performing lemmatization using spaCy.
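
The package's own cleaning code is not reproduced on this page. To illustrate the first stage (HTML and emoji removal), here is a standard-library sketch. It strips tags with a regular expression, unescapes HTML entities, and removes emojis using a partial, non-exhaustive list of Unicode ranges. The function name `strip_html_and_emoji` and the chosen ranges are assumptions for illustration, not textcleaner's API.

```python
import html
import re

# Anything that looks like an HTML/XML tag, e.g. <p>, </p>, <br/>.
TAG_RE = re.compile(r"<[^>]+>")

# Assumption: a partial list of common emoji code-point ranges (not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def strip_html_and_emoji(text: str) -> str:
    """Hypothetical helper: drop tags, unescape entities, drop emoji, tidy spaces."""
    text = TAG_RE.sub(" ", text)      # replace each tag with a space
    text = html.unescape(text)        # e.g. "&gt;" -> ">"
    text = EMOJI_RE.sub(" ", text)    # replace emoji runs with a space
    return " ".join(text.split())     # collapse repeated whitespace


print(strip_html_and_emoji("I can't believe it's already raining! 😞 <p>Click here</p>"))
```

Tags are replaced with a space rather than an empty string so that words on either side of a tag do not run together.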

✨ Features

•	✅ HTML tag and emoji removal
•	✅ Contraction expansion (e.g., “can’t” → “cannot”)
•	✅ Spelling correction with autocorrect
•	✅ Lemmatization using spaCy (en_core_web_sm)
•	✅ Filters out stopwords, punctuation, and numbers
•	✅ Retains only nouns, verbs, adjectives, and adverbs
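
The last two features amount to a filter over spaCy's token annotations. The following sketch is an illustration, not the package's code. It assumes the filter relies on spaCy's `is_stop`, `is_punct`, `like_num`, `pos_` and `lemma_` token attributes. To make it runnable without a model, it uses a hypothetical `FakeToken` dataclass that exposes those same attribute names, and the example tokens are hand-labelled rather than real model output. With spaCy installed, the same function should accept a `Doc` directly, for example `keep_content_lemmas(nlp(text))`.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Universal POS tags that the feature list says are retained.
KEEP_POS = {"NOUN", "VERB", "ADJ", "ADV"}


@dataclass
class FakeToken:
    """Hypothetical stand-in with the same attribute names as a spaCy Token."""
    text: str
    lemma_: str
    pos_: str
    is_stop: bool = False
    is_punct: bool = False
    like_num: bool = False


def keep_content_lemmas(tokens: Iterable) -> List[str]:
    """Return lemmas of tokens that are not stopwords, punctuation or numbers
    and whose POS tag is a noun, verb, adjective or adverb."""
    return [
        tok.lemma_
        for tok in tokens
        if not (tok.is_stop or tok.is_punct or tok.like_num)
        and tok.pos_ in KEEP_POS
    ]


# Hand-labelled tokens for "The cats were running quickly 3 times!"
tokens = [
    FakeToken("The", "the", "DET", is_stop=True),
    FakeToken("cats", "cat", "NOUN"),
    FakeToken("were", "be", "AUX", is_stop=True),
    FakeToken("running", "run", "VERB"),
    FakeToken("quickly", "quickly", "ADV"),
    FakeToken("3", "3", "NUM", like_num=True),
    FakeToken("times", "time", "NOUN"),
    FakeToken("!", "!", "PUNCT", is_punct=True),
]
print(keep_content_lemmas(tokens))
```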

🚀 Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/partha6369/textcleaner.git
```

🧠 Usage

```python
from textcleaner import preprocess

text = "I can't believe it's already raining! 😞 <p>Click here</p>"
cleaned = preprocess(text)
print(cleaned)
```

📦 Dependencies

•	spacy
•	autocorrect
•	contractions

You can install them manually or via the included requirements.txt:

```bash
pip install -r requirements.txt
```

Then download the required spaCy model:

```bash
python -m spacy download en_core_web_sm
```
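
Before calling `preprocess`, you can confirm that each dependency and the spaCy model can be found. The sketch below uses only the standard library. It assumes that the import names match the dependency names listed above. It also relies on the fact that `python -m spacy download` installs the model as an importable Python package named `en_core_web_sm`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the listed dependencies, plus the spaCy model package.
REQUIRED = ("spacy", "autocorrect", "contractions", "en_core_web_sm")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies and en_core_web_sm are installed.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not load the spaCy pipeline.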

📄 License

MIT License © Partha Majumdar



Download files


Source Distribution

textcleaner_partha-0.1.0.tar.gz (3.5 kB)

Uploaded Source

Built Distribution


textcleaner_partha-0.1.0-py3-none-any.whl (3.9 kB)

Uploaded Python 3

File details

Details for the file textcleaner_partha-0.1.0.tar.gz.

File metadata

  • Download URL: textcleaner_partha-0.1.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for textcleaner_partha-0.1.0.tar.gz
  • SHA256: 4d64f1a5474b528d7d0fd0f68fa51e09152c5a43ff215bb57d049883a310381a
  • MD5: 764d6f80485364d7524148206831b7f9
  • BLAKE2b-256: 207bd32adfe745b653551435a5efa8de397985682a0e86d3946c410dfa260d88


File details

Details for the file textcleaner_partha-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for textcleaner_partha-0.1.0-py3-none-any.whl
  • SHA256: b811715350d0ffedb46b37e5e9225faf750099d34c8b0182f794ac08910a9ad2
  • MD5: 6c73e1e435b892a9b4c7bb18a184519d
  • BLAKE2b-256: 515a6aa3f0ae227b489c7314339a8142354739d558ef035410f35bf32e19b5c1
