Preprocessing Library for Natural Language Processing
PreNLP
Installation
Requirements
- Python >= 3.6
- Mecab morphological analyzer for Korean
sh scripts/install_mecab.sh
With pip
prenlp can be installed using pip as follows:
pip install prenlp
Usage
Data
Dataset Loading
Popular datasets for NLP tasks are provided in prenlp.
- Text Classification: IMDB, NSMC
General use cases (for IMDB) are as follows:
>>> import prenlp
>>> imdb_train, imdb_test = prenlp.data.IMDB()
>>> len(imdb_train), len(imdb_test)
(25000, 25000)
>>> imdb_train[0]
("Minor Spoilers<br /><br />Alison Parker (Cristina Raines) is a successful top model, living with the lawyer Michael Lerman (Chris Sarandon) in his apartment. She tried to commit ...", 'pos')
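Each example shown above is a `(text, label)` tuple, so ordinary Python tools can be used to inspect a split. The following is a minimal sketch, assuming a split behaves like a list of such tuples. `label_distribution` and `sample` are hypothetical names that are not part of prenlp, and the small in-memory `sample` stands in for `imdb_train` so the snippet runs without downloading any data.

```python
from collections import Counter


def label_distribution(dataset):
    """Count how many examples carry each label in a list of (text, label) tuples."""
    return Counter(label for _, label in dataset)


# Hypothetical stand-in for imdb_train; real data would come from prenlp.data.IMDB().
sample = [
    ("A moving, well-acted film.", "pos"),
    ("Two hours I will never get back.", "neg"),
    ("Surprisingly good.", "pos"),
]

print(label_distribution(sample))
```

With the real training split, the same function could be called as `label_distribution(imdb_train)`.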
Normalization
Frequently used normalization functions for text pre-processing are provided in prenlp.
- URL, HTML tag, emoticon, email, phone number, etc.
General use cases (for Normalizer) are as follows:
>>> from prenlp.data import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize('Visit this link for more details: https://github.com/')
'Visit this link for more details: [URL]'
>>> normalizer.normalize('Use HTML with the desired attributes: <img src="cat.jpg" height="100" />')
'Use HTML with the desired attributes: [TAG]'
>>> normalizer.normalize('Hello 🤩, I love you 💓 !')
'Hello [EMOJI], I love you [EMOJI] !'
>>> normalizer.normalize('Contact me at lyeoni.g@gmail.com')
'Contact me at [EMAIL]'
>>> normalizer.normalize('Call +82 10-1234-5678')
'Call [TEL]'
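To illustrate the general idea behind this kind of normalization, here is a minimal sketch that replaces matched spans with placeholder tokens using regular expressions. It is not prenlp's implementation: `PATTERNS` and `simple_normalize` are hypothetical names, the patterns are deliberately simple, and only URLs, HTML tags, and email addresses are covered (emoji and phone numbers are left out).

```python
import re

# Illustrative patterns only; they are simplified and are NOT prenlp's own rules.
PATTERNS = [
    (re.compile(r"https?://\S+"), "[URL]"),                      # http/https links
    (re.compile(r"<[^>]+>"), "[TAG]"),                           # HTML tags
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),    # email addresses
]


def simple_normalize(text):
    """Replace every match of each pattern with its placeholder token, in order."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text


print(simple_normalize("Contact me at lyeoni.g@gmail.com"))
```

The patterns are applied in list order, so a URL that appears inside a tag attribute is replaced before the tag itself is matched.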
Tokenizer
Frequently used tokenizers for text pre-processing are provided in prenlp.
- NLTKMosesTokenizer
General use cases (for Moses tokenizer) are as follows:
>>> from prenlp.tokenizer import NLTKMosesTokenizer
>>> tokenizer = NLTKMosesTokenizer()
>>> tokenizer('PreNLP package provides a variety of text preprocessing tools.')
['PreNLP', 'package', 'provides', 'a', 'variety', 'of', 'text', 'preprocessing', 'tools', '.']
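Since the tokenizer is used as a plain callable that maps a string to a list of tokens, it can be passed into downstream helpers. The sketch below counts token frequencies over a few texts. `simple_tokenize` and `token_counts` are hypothetical names, and `simple_tokenize` is a regex stand-in (not the Moses algorithm) so that the snippet runs without prenlp installed. Assuming `NLTKMosesTokenizer()` behaves as in the example above, an instance of it could be passed as `tokenizer` instead.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def token_counts(texts, tokenizer=simple_tokenize):
    """Count token occurrences across texts using any str -> list[str] callable."""
    counts = Counter()
    for text in texts:
        counts.update(tokenizer(text))
    return counts


texts = ["PreNLP package provides a variety of text preprocessing tools."]
print(token_counts(texts).most_common(3))
```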
Author
- Hoyeon Lee @lyeoni
- email : lyeoni.g@gmail.com
- facebook : https://www.facebook.com/lyeoni.f