
Simple automated text preprocessor


## Natural Language Text Preprocessor (nltp)

A simple package for automating common text preprocessing tasks such as tokenization, lemmatization, stop-word removal, and removal of unwanted patterns with regular expressions. Under the hood, it uses the NLTK library for its text cleaning.

Installation

Requirements:

  • Python 3.7 or higher
  • NLTK

Install the latest release:

pip install nltp

Install from source:

git clone https://github.com/izzyx6/nltp.git
cd nltp
pip install .

Usage: the basics

Here's how to perform text cleaning with nltp.

First, pass your texts as a list to the Preprocessor constructor through its text argument.

The following code returns a tokenized version of the text passed when the Preprocessor was instantiated:

from nltp import Preprocessor

text = ["I like eat delicious food", "That's I'm cooking food myself, case '10 Best Foods' helps lot, also 'Best Before (Shelf Life)'"]

output = Preprocessor(text)
output.token()

You can retrieve the tokens of a specific text by its index in the list (the default is 0):

output.token(1)
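To illustrate what the index argument selects, here is a minimal sketch of a `token(index)` method with the same call shape. `TokenizerSketch` is hypothetical, not nltp's class: it tokenizes with the same regular expression that NLTK's `WordPunctTokenizer` uses (`\w+|[^\w\s]+`), whereas nltp may use a different NLTK tokenizer and may return its tokens in a different form.

```python
import re
from typing import List

# Same regex as NLTK's WordPunctTokenizer: runs of word characters,
# or runs of punctuation characters.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


class TokenizerSketch:
    """Hypothetical stand-in mimicking the token(index) call shown above."""

    def __init__(self, text: List[str]):
        self.text = text

    def token(self, index: int = 0) -> List[str]:
        # Tokenize only the text at the given position in the list (default: first).
        return _TOKEN_RE.findall(self.text[index])


text = ["I like eat delicious food", "That's I'm cooking food myself"]
sketch = TokenizerSketch(text)
print(sketch.token())   # tokens of text[0]
print(sketch.token(1))  # tokens of text[1]
```

Note that this regex splits contractions such as "That's" into `That`, `'`, `s`.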

Next, you can get a cleaned version of the texts in the list: lemmatized, with stop words and unwanted patterns removed.

Available parameters to modify are stop_words and patterns.

output = Preprocessor(text,stop_words = [USER DEFINED], pattern = [USER DEFINED])

Note: these parameters let you override the defaults, which remove non-alphabetic characters, repeated sequences of words, and user names (marked with @, as in @User).
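The default behaviour described in this note can be approximated with a few regular expressions. This is a sketch under assumptions: the regexes, the `(regex, replacement)` pair format and the `apply_patterns` helper are hypothetical, and nltp's actual defaults may differ. Passing `patterns` replaces the defaults entirely, mirroring the bypass the note describes.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical defaults modelled on the note; nltp's real regexes may differ.
# Each entry is (regex, replacement), applied in order.
DEFAULT_PATTERNS: List[Tuple[str, str]] = [
    (r"@\w+", " "),                   # user names such as @User (before '@' is stripped)
    (r"[^a-zA-Z\s]", " "),            # any character that is not a letter or whitespace
    (r"\b(\w+)(?:\s+\1\b)+", r"\1"),  # immediately repeated words: "very very" -> "very"
]


def apply_patterns(text: str,
                   patterns: Optional[List[Tuple[str, str]]] = None) -> str:
    """Apply each (regex, replacement) pair in turn; passing patterns bypasses the defaults."""
    for regex, replacement in (DEFAULT_PATTERNS if patterns is None else patterns):
        text = re.sub(regex, replacement, text)
    return " ".join(text.split())  # collapse leftover whitespace


print(apply_patterns("@User this is very very good!!"))
print(apply_patterns("call 555-1234 now", patterns=[(r"\d", "")]))
```

The order matters here: mentions are removed before the non-alphabetic filter, otherwise stripping `@` would leave the bare name behind.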

output = Preprocessor(text)
output.text_cleaner()
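Roughly, a cleaning pass like `text_cleaner()` chains the steps named earlier. The sketch below uses only the standard library; `clean_texts`, the small stop-word set and the toy `LEMMAS` lookup table are hypothetical stand-ins. In an NLTK-based pipeline the stop words would normally come from NLTK's stopwords corpus, and lemmatization would typically use `WordNetLemmatizer().lemmatize`, which requires the WordNet corpus to be downloaded. The sketch returns one cleaned string per input text; nltp's return format may differ.

```python
import re
from typing import Dict, List, Set

# Hypothetical, deliberately tiny stop-word set ("m" and "s" are the leftovers
# of contractions like "I'm" and "That's" once punctuation is stripped).
STOP_WORDS: Set[str] = {"i", "m", "s", "that", "the", "also", "to", "a"}
# Toy lemma table standing in for a real lemmatizer.
LEMMAS: Dict[str, str] = {"foods": "food", "cooking": "cook", "helps": "help"}


def clean_texts(texts: List[str]) -> List[str]:
    """Lowercase, strip non-letters, drop stop words and lemmatize each text."""
    cleaned = []
    for text in texts:
        text = re.sub(r"[^a-z\s]", " ", text.lower())             # keep letters only
        tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop-word removal
        tokens = [LEMMAS.get(t, t) for t in tokens]                # toy lemmatization
        cleaned.append(" ".join(tokens))
    return cleaned


print(clean_texts(["I like eat delicious food", "That's I'm cooking food myself"]))
```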

Note: through attributes of the Preprocessor object, you can inspect the default stop words and patterns, as well as the text that was passed in:

output = Preprocessor(text)
output.patterns
output.stop_words
output.text

Citation

BibTeX entry:

@misc{isreal2020nltp,
	title        = {Natural Language Text Preprocessor {nltp}},
	author       = {Ufumaka Isreal},
	year         = 2020,
	howpublished = {\url{https://github.com/izzyx6/nltp}}
}


Source Distribution

nltp-0.1.0.tar.gz (3.7 kB)

File details

Details for the file nltp-0.1.0.tar.gz.

File metadata

  • Download URL: nltp-0.1.0.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0.post20200127 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.6

File hashes

Hashes for nltp-0.1.0.tar.gz:

  • SHA256: dcc555fd7a265328e5ba05a5473254637628e6f917196f2cd1d908eb803ab283
  • MD5: df1c9576b8eb618b21cf2dd4ac2c77bf
  • BLAKE2b-256: 2f4ec83a50edd05219e21d65948c42f61d94d744fdeecab583be54dba4b4d50a

