A lightweight Python package to clean English text by removing HTML tags, URLs, emojis, digits, and punctuation.

Project description

txtcleanen

txtcleanen is a simple Python package that cleans English text for Natural Language Processing (NLP) tasks by removing HTML tags, URLs, emojis, digits, punctuation, and extra whitespace.


Features

  • Remove HTML tags
  • Remove URLs
  • Remove emojis
  • Remove digits and punctuation
  • Normalize Unicode text
  • Compact multiple spaces into one
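
The steps listed above can be approximated with the standard library alone. The following is a minimal sketch, not txtcleanen's actual implementation: the function name `clean_english_text`, the regular expressions, and the order of the steps are all assumptions made for illustration.

```python
import re
import string
import unicodedata

# Hypothetical patterns for illustration; not taken from txtcleanen's source.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+")
# Translation table that deletes ASCII digits and punctuation.
DIGITS_PUNCT = str.maketrans("", "", string.digits + string.punctuation)


def clean_english_text(text: str) -> str:
    """Apply the cleaning steps from the feature list, in a fixed order."""
    # Replace tags and URLs with a space so neighbouring words stay separate.
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    # Decompose Unicode characters (NFKD), then drop everything non-ASCII.
    # This removes emojis and also strips accents ("é" becomes "e").
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Delete digits and punctuation.
    text = text.translate(DIGITS_PUNCT)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())


print(clean_english_text("Hello <b>World!</b> Visit https://example.com now!"))
```

Dropping all non-ASCII characters is a blunt way to remove emojis. It suits English-only text, but it would also discard any non-Latin script, so a real cleaner may prefer a targeted emoji pattern.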

Installation

pip install txtcleanen
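
To confirm that the install succeeded, the installed version can be queried through the standard library's `importlib.metadata` (Python 3.8+). This is a small sketch; the helper name `installed_version` is hypothetical and not part of txtcleanen.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("txtcleanen"))
```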

Example

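# Assumption: the package exports a function named txtcleanen.
# A bare "import txtcleanen" binds the module, which cannot be called directly.
from txtcleanen import txtcleanen

text = "Hello <b>World!</b> Visit https://example.com now!"
clean_text = txtcleanen(text)
print(clean_text)
# Output: Hello World Visit now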
