pre-processing package for text strings

‘preprocessing’

Summary

A text pre-processing package to aid NLP development in Python 3. It lets you apply text-cleaning functions in whatever order you choose, rather than relying on the fixed order imposed by an arbitrary NLP package.

Installation

pip:

pip install preprocessing

PyPI - You can also download the source distribution from:

https://pypi.python.org/pypi/preprocessing/

You can then run:

pip install <path_to_tar_file>

on the tar file, or

python setup.py install

inside the extracted package directory, to install preprocessing.

Example

Once the package is installed, using it from Python 3 takes the following form:

import preprocessing.text as ptext
from preprocessing.text import keyword_tokenize, remove_unbound_punct, remove_urls

text_string = "important string at: http://example.com"

# The cleaning functions are applied in list order, first to last.
clean_string = ptext.preprocess_text(text_string, [
    remove_urls,
    remove_unbound_punct,
    keyword_tokenize
])
>>> print(clean_string)
"important string"

If the same functions are instead applied in a different order (i.e. keyword_tokenize -> remove_urls -> remove_unbound_punct), the result changes:

>>> print(clean_string)
"important string http example.com"
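
Why the order matters can be seen in a small self-contained sketch. The functions below (strip_urls, strip_punct, collapse_spaces, apply_in_order) are hypothetical stand-ins written for illustration; they are not the functions shipped in preprocessing.text, and their output is not expected to match the package's results above.

```python
import re
from typing import Callable, Iterable

# Hypothetical stand-ins for illustration only; these are NOT the
# implementations found in preprocessing.text.

def strip_urls(text: str) -> str:
    """Remove anything that looks like an http(s) URL."""
    return re.sub(r"https?://\S+", "", text)

def strip_punct(text: str) -> str:
    """Replace every punctuation character with a space."""
    return re.sub(r"[^\w\s]", " ", text)

def collapse_spaces(text: str) -> str:
    """Squeeze runs of whitespace and trim both ends."""
    return " ".join(text.split())

def apply_in_order(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Feed the output of each step into the next, in list order."""
    for step in steps:
        text = step(text)
    return text

text_string = "important string at: http://example.com"

# URL removed first: the URL is still intact, so it is matched and dropped.
url_first = apply_in_order(text_string, [strip_urls, strip_punct, collapse_spaces])

# Punctuation removed first: "://" and "." are gone, so the URL pattern
# no longer matches and the URL's pieces survive as plain words.
punct_first = apply_in_order(text_string, [strip_punct, strip_urls, collapse_spaces])

print(url_first)    # important string at
print(punct_first)  # important string at http example com
```

The same three steps give two different strings, which is the behaviour the package lets you control by choosing the order of the list yourself.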

Organisation

This package consists of a single module, with no subpackages currently planned. It depends on NLTK for tokenizers and stopwords; apart from that, it relies only on the Python 3 standard library.
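
Since NLTK is the only third-party dependency, a quick way to confirm the environment is ready is to check that both packages can be found for import. This is a minimal sketch using only the standard library; the helper name missing_packages is an assumption made for this example. It checks importability only, not whether any NLTK data files are present.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]

# "preprocessing" and "nltk" are the two packages named in this README.
missing = missing_packages(["preprocessing", "nltk"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("preprocessing and nltk are both importable")
```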

Contributing

If you feel like contributing:

  • Check for open issues or open a new issue

  • Fork the preprocessing repository to start making your changes

  • Write a test which shows the bug was fixed or that the feature works as expected

  • Send a pull request and remember to add yourself to CONTRIBUTORS.md

License

This project is licensed under the MIT license (see LICENSE).

Download files

Source Distribution

preprocessing-0.1.13.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

preprocessing-0.1.13-py3-none-any.whl (349.6 kB)

Uploaded Python 3

File details

Details for the file preprocessing-0.1.13.tar.gz.

File hashes

Hashes for preprocessing-0.1.13.tar.gz
Algorithm Hash digest
SHA256 4c6ef9f4b94bf02664fc4c6bdc3814dfc17a94bbbde002f2a9113c91fdfe7f87
MD5 0e1a2b853c7f0e5312cf6c4af3ada664
BLAKE2b-256 e3ca102f0cb754c3dfdd095110711faa8566c66fa857fa0ffd2c3040ab2d8a81
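
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using the standard library's hashlib; the helper name sha256_of and the assumption that the file sits in the current directory are illustrative only.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest copied from the hash list above.
EXPECTED = "4c6ef9f4b94bf02664fc4c6bdc3814dfc17a94bbbde002f2a9113c91fdfe7f87"

# Assumed location: the archive was saved to the current directory.
archive = Path("preprocessing-0.1.13.tar.gz")
if archive.exists():
    print("OK" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print("archive not found; download it first")
```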

File details

Details for the file preprocessing-0.1.13-py3-none-any.whl.

File hashes

Hashes for preprocessing-0.1.13-py3-none-any.whl
Algorithm Hash digest
SHA256 7323b9bd514f676019b3bd5d97360df0cc7262a58fb7eee6e80e87a1894c7f15
MD5 4fb36e168ef5d18fdeff3036bf75966a
BLAKE2b-256 79f9cadc71dbd774398e486f0608fb6746de36f562edf32fc59ebbe94a589c79
