Text Data Cleaning

Project description

  • This package cleans text data by removing HTML tags, URLs, NLTK stopwords, numbers, and punctuation, collapsing additional spaces, and converting the text to lower case.

Features

  • Remove URLs
  • Remove HTML Tags
  • Remove NLTK Stopwords
  • Remove Numbers
  • Remove Punctuation
  • Remove Additional Spaces
  • Convert to Lower Case
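
To make the feature list concrete, here is a minimal standard-library sketch of the same kind of pipeline. It is not the package's implementation: the names `clean_one` and `clean_all`, the regular expressions, the order of the steps, and the tiny hard-coded `STOPWORDS` set are all assumptions made for illustration (the package itself relies on the NLTK stopword list).

```python
import re
import string

# Tiny illustrative stopword set (an assumption); the package uses NLTK's list.
STOPWORDS = {"a", "is", "isn", "it", "now", "t", "the"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # simple URL pattern (assumption)
TAG_RE = re.compile(r"<[^>]+>")                # naive HTML tag pattern (assumption)
DIGIT_RE = re.compile(r"\d+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_one(text):
    """Clean a single string with the steps from the feature list."""
    text = text.lower()                  # convert to lower case
    text = URL_RE.sub(" ", text)         # remove URLs
    text = TAG_RE.sub(" ", text)         # remove HTML tags
    text = DIGIT_RE.sub(" ", text)       # remove numbers
    text = text.translate(PUNCT_TABLE)   # remove punctuation
    # split() drops additional spaces; the filter drops stopwords.
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


def clean_all(texts):
    """Clean every string in a list and return a new list."""
    return [clean_one(t) for t in texts]


sample = ["Is the   time 12 Noon now, isn't it?", "It is a python link: https://pypi.org/"]
print(clean_all(sample))  # ['time noon', 'python link']
```

URLs and HTML tags are removed before punctuation in this sketch so that characters such as "/" and "<" are still present when the patterns run.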

Installation

  • In a code notebook (such as an .ipynb file), use the command below:

    !pip install py-text-data-clean
    
  • If installing from the Anaconda Prompt or a CMD terminal, use the command below:

    pip install py-text-data-clean
    
  • Note:

 Check whether the installed package is the latest version. If it is not, upgrade it.

 # To check the installed version, run the command below
 !pip show py-text-data-clean

 # To upgrade the package, run the command below
 !pip install py-text-data-clean -U
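
The installed version can also be read from Python itself, which may be convenient inside a notebook. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("py-text-data-clean"))
```

A result of `None` means the package is not installed in the current environment.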

Usage

Input:

 - A list of text strings. Example: ["Is the   time 12 Noon now, isn't it?", "It is a python link: https://pypi.org/"]

Output:

 - A list of cleaned strings. For the example above: ['time noon', 'python link']

Code to clean text with a single function:

# Import the library
from pytextdataclean import textclean as tc
input_text_list = ["Is the   time 12 Noon now, isn't it?", "It is a python link: https://pypi.org/"]
result = tc.text_clean(data=input_text_list)
print(result)

Code to use each available feature individually:

# Pass the list of text

# Example list:
input_text_list = ["Is the   time 12 Noon now, isn't it?", "It is a python link: https://pypi.org/"]

# Import the library
from pytextdataclean import textclean as tc

# To remove html tags
tc.remove_html_tags(data=input_text_list)

# To remove NLTK stop words
tc.remove_nltk_stopwords(data=input_text_list)

# To remove URLs
tc.remove_url(data=input_text_list)

# To remove punctuation
tc.remove_punctuation(data=input_text_list)

# To remove numerical digits
tc.remove_digits(data=input_text_list)

# To remove foreign languages
tc.remove_foreign_languages(data=input_text_list)

# To remove additional spaces
tc.remove_spaces(data=input_text_list)
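
Each call above operates on the original list, so combining several steps means passing one step's result into the next. The sketch below shows one way to do that. It assumes that every cleaning function accepts a `data=` keyword and returns a list of strings, which the description above suggests but does not confirm. The helper `apply_steps` and the stand-in functions `strip_digits` and `squeeze_spaces` are hypothetical and are used only so the example runs without the package installed.

```python
def apply_steps(data, steps):
    """Run each step on the result of the previous one and return the final list."""
    result = list(data)
    for step in steps:
        result = step(data=result)
    return result


# Stand-in steps with the assumed signature (not the package's functions).
def strip_digits(data):
    return ["".join(ch for ch in s if not ch.isdigit()) for s in data]


def squeeze_spaces(data):
    return [" ".join(s.split()) for s in data]


print(apply_steps(["room  101 is   free"], [strip_digits, squeeze_spaces]))
# ['room is free']

# With the package, under the assumption stated above, the call might look like:
# apply_steps(input_text_list, [tc.remove_url, tc.remove_digits, tc.remove_spaces])
```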

Download files

Source Distribution

py_text_data_clean-0.0.5.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

py_text_data_clean-0.0.5-py3-none-any.whl (4.3 kB)

Uploaded Python 3

File details

Details for the file py_text_data_clean-0.0.5.tar.gz.

File metadata

  • Download URL: py_text_data_clean-0.0.5.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for py_text_data_clean-0.0.5.tar.gz
Algorithm    Hash digest
SHA256       ffc2a3d120c207159f7a244e543bf396e69a2012040f101b67068b1737aecb8e
MD5          81d4cf44accc36adaf933c117030c87f
BLAKE2b-256  d28414fe74d624b7b41e2ce3c1a057a55123043ca890dec5e6970ff8f96aed67

File details

Details for the file py_text_data_clean-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: py_text_data_clean-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for py_text_data_clean-0.0.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       15cdc62e2df019d204524bedbe12d26a041a0e4bc72296aa26be613f2939c58a
MD5          5ba8b5b4926003dd5c0a5f67ec929c3b
BLAKE2b-256  5cf87653547ef6d8c1aa36e821f51646429ab62887ad26245a30b4073f8a067b
