This package contains preprocessing functions for NLP tasks.
Project description
NLPPREPROCESS
NLPPREPROCESS is a preprocessing package for NLP tasks. Its main objective is to reduce the time spent on preprocessing by providing ready-made functions.
Requirements
- Python 3.4 or higher
Installation
Using pip via PyPI
$ pip install nlppreprocess
Manually via Git
$ git clone https://github.com/gaganmanku96/nlppreprocess
$ cd nlppreprocess
$ python setup.py install
Functionalities
- Replace words
- Remove stopwords
- Remove numbers
- Remove HTML tags
- Remove punctuation
- Lemmatize words using either WordNet or Snowball
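To make the steps above concrete, here is a minimal sketch of the same kind of pipeline using only the Python standard library. It is not the package's implementation: the function name `preprocess`, the small `STOPWORDS` and `REPLACEMENTS` tables, the lowercasing step, and the order of operations are all assumptions made for illustration, and lemmatization is left out because it needs an external library.

```python
import re
from html import unescape

# Hypothetical stand-ins; the package ships its own stopword and replacement lists.
STOPWORDS = {"a", "an", "the", "is", "of", "and"}
REPLACEMENTS = {"can't": "cannot", "won't": "will not"}


def preprocess(text, stopwords=STOPWORDS, replacements=REPLACEMENTS):
    """Apply the listed cleaning steps in a fixed order (no lemmatization)."""
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    text = unescape(text).lower()              # decode entities; lowercasing is an assumption
    for old, new in replacements.items():      # replace words
        text = text.replace(old, new)
    text = re.sub(r"\d+", " ", text)           # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation
    tokens = [t for t in text.split() if t not in stopwords]  # remove stopwords
    return " ".join(tokens)


print(preprocess("<p>The cat can't eat 3 fish!</p>"))
```

Replacements run before punctuation removal here so that contractions such as "can't" are still intact when they are looked up.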
Usage
>>> from nlppreprocess import NLP
>>> obj = NLP()
Parameters
>>> obj = NLP(
...     replace_words=True,
...     remove_stopwords=True,
...     remove_numbers=True,
...     remove_HTML_tags=True,
...     remove_punctation=True,
...     lemmatize=False,
...     lemmatize_method='wordnet'
... )
Using with Pandas Library
>>> dataFrame['text'] = dataFrame['text'].apply(obj.process)
Using with plain text
>>> print(obj.process("Pass a text here"))
Add more stopwords
>>> obj = NLP()
>>> obj.add_stopword(['this', 'and this'])
Add more replacement words
>>> obj = NLP()
>>> # Keyword-argument form is an assumption; check the signature of add_replacement.
>>> obj.add_replacement(this="by this", that="by that")
Download files
Source Distributions
No source distribution files available for this release.
Built Distribution
File details
Details for the file nlppreprocess-1.0.2-py3-none-any.whl.
File metadata
- Download URL: nlppreprocess-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | d3eacb1bab2d240d03083d85cedf629a6aafe5b526f6ced3d3f8061bb4bd0a93
MD5 | f03ade7b659e291ff51dbdce6b6aea0a
BLAKE2b-256 | 668d3a0584b924248c865a8e7ee04a93175551ebcaf156ee9b73346cd62446e6