
pre-processing package for text strings

Project description



A text pre-processing package to aid NLP package development in Python 3. It lets you apply text cleaning functions in whatever order you choose, rather than relying on the fixed order imposed by a given NLP package.



pip install preprocessing

You can also download the source distribution from PyPI.

You can then run:

pip install <path_to_tar_file>

on the tar file, or

python setup.py install

from inside the extracted package directory to install preprocessing.


Once the package is installed, using it in Python 3 takes the following form:

import preprocessing.text as ptext
from preprocessing.text import keyword_tokenize, remove_unbound_punct, remove_urls

text_string = "important string at:"

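# Assumed order: remove URLs first, then unbound punctuation, then tokenize keywords
clean_string = ptext.preprocess_text(text_string, [remove_urls, remove_unbound_punct, keyword_tokenize])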
>>> print(clean_string)
"important string"

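If the functions are applied in a different order (e.g. keyword_tokenize -> remove_urls -> remove_unbound_punct), the result changes:

clean_string = ptext.preprocess_text(text_string, [keyword_tokenize, remove_urls, remove_unbound_punct])
>>> print(clean_string)
"important string http"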
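
To see why the order of cleaning functions matters, here is a minimal, self-contained sketch that does not use the preprocessing package at all. The helpers strip_urls, strip_punct and apply_in_order, as well as the sample URL, are hypothetical stand-ins written for illustration; they are not the package's own functions and may behave differently from them.

```python
import re


def strip_urls(text):
    """Hypothetical stand-in: drop anything that looks like an http(s) URL."""
    return re.sub(r"https?://\S+", "", text)


def strip_punct(text):
    """Hypothetical stand-in: replace every non-word, non-space character with a space."""
    return re.sub(r"[^\w\s]", " ", text)


def apply_in_order(text, functions):
    """Apply each cleaning function to the output of the previous one."""
    for function in functions:
        text = function(text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


# The URL below is a made-up example, not taken from the package documentation.
sample = "important string at: http://example.com"

urls_first = apply_in_order(sample, [strip_urls, strip_punct])
punct_first = apply_in_order(sample, [strip_punct, strip_urls])

print(urls_first)   # important string at
print(punct_first)  # important string at http example com
```

When punctuation is stripped first, the "://" that marks a URL is already gone by the time strip_urls runs, so fragments of the URL survive. This is the same kind of effect as the "important string http" result above.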


This package consists of a single module and currently has no subpackages. It depends on NLTK for tokenizers and stopwords; apart from that, it uses only the Python 3 standard library.
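
Since NLTK is the only third-party dependency, a quick way to confirm that both packages are importable is a small standard-library check. This is only a sketch: the helper names is_installed and missing_modules are hypothetical, and the check only tests whether the modules can be found, not whether any NLTK data (tokenizers, stopwords) has been downloaded.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(module_names):
    """Return the subset of module names that cannot be located."""
    return [name for name in module_names if not is_installed(name)]


missing = missing_modules(["preprocessing", "nltk"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("preprocessing and nltk are both importable")
```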


If you feel like contributing:

  • Check for open issues or open a new issue
  • Fork the preprocessing repository to start making your changes
  • Write a test which shows the bug was fixed or that the feature works as expected
  • Send a pull request and remember to add yourself to


This project is licensed under the MIT license (see LICENSE)


Files for preprocessing, version 0.1.13
Filename                                 Size       File type   Python version
preprocessing-0.1.13-py3-none-any.whl    349.6 kB   Wheel       py3
preprocessing-0.1.13.tar.gz              14.8 kB    Source      None
