Text Preprocessor
Take Text Pre-Process
This package is a tool for pre-processing a sentence.
The basic functionality available in this package is:
- Converting to lower case
- Removing non-ASCII characters
- Adding spaces between punctuation and words
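To make the three basic steps concrete, here is a minimal sketch written with the standard library only. It is not the package's implementation: the function name `basic_pre_process_sketch` is hypothetical, and the real package may treat non-ASCII characters and punctuation differently (for example, this sketch simply drops accented letters).

```python
import re
import string


def basic_pre_process_sketch(sentence: str) -> str:
    """Illustrative only: lower-case, drop non-ASCII, space out punctuation."""
    # Step 1: convert to lower case.
    lowered = sentence.lower()
    # Step 2: drop every character that cannot be encoded as ASCII.
    ascii_only = lowered.encode('ascii', errors='ignore').decode('ascii')
    # Step 3: surround each punctuation character with spaces.
    punctuation_class = '[' + re.escape(string.punctuation) + ']'
    spaced = re.sub('(' + punctuation_class + ')', r' \1 ', ascii_only)
    # Collapse the repeated whitespace left behind by the previous steps.
    return ' '.join(spaced.split())


print(basic_pre_process_sketch('Bom dia, meu ẞ caro'))  # bom dia , meu caro
```

The output of the real `pre_process` function may differ from this sketch; the example only shows the kind of transformation each step performs.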
The customizable functionality available is:
- Replacing URLs with a token
- Replacing email addresses with a token
- Replacing numbers with a token
- Replacing codes (mixed numbers and letters) with a token
- Removing symbols
- Replacing abbreviations
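The token replacements listed above can be pictured as a set of regular-expression substitutions selected by name. The sketch below is an assumption-laden illustration, not the package's code: the patterns, the token strings (`URL`, `EMAIL`, `CODE`, `NUMBER`) and the function name `replace_by_tokens` are all hypothetical.

```python
import re

# Hypothetical patterns and token strings, chosen only for illustration.
# Order matters: URLs and emails are handled before codes and numbers so
# that digits inside them are not replaced separately.
PATTERNS = {
    'URL': (re.compile(r'https?://\S+|www\.\S+'), 'URL'),
    'EMAIL': (re.compile(r'\S+@\S+\.\S+'), 'EMAIL'),
    # A "code" here is a word containing at least one digit and one letter.
    'CODE': (re.compile(r'\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b'), 'CODE'),
    # A "number" is a run of digits, optionally joined by '-', '.' or ','.
    'NUMBER': (re.compile(r'\d+(?:[-.,]\d+)*'), 'NUMBER'),
}


def replace_by_tokens(sentence: str, options: list) -> str:
    """Apply only the substitutions whose names appear in `options`."""
    for name, (pattern, token) in PATTERNS.items():
        if name in options:
            sentence = pattern.sub(token, sentence)
    return sentence


print(replace_by_tokens('Bom dia, meu https://www.take.net caro', ['URL']))
# Bom dia, meu URL caro
print(replace_by_tokens('Este é um número 99999-9999', ['NUMBER']))
# Este é um número NUMBER
```

Passing a list of option names mirrors the way the package's optional steps are selected, although the tokens the package actually emits are not documented here.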
Installation
TakeTextPreProcess can be installed from PyPI:
pip install take-text-preprocess
Usage
Basic pre-process
To use the basic pre-process:
from take_text_preprocess.presentation import pre_process
sentence = 'Bom dia, meu ẞ caro'
pre_process(sentence)
Customize pre-process
The customized pre-process requires, as a second argument, a list of all the pre-processing options you want to apply.
The following examples show all the customized pre-processes available.
- URL
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['URL']
sentence = 'Bom dia, meu https://www.take.net caro'
pre_process(sentence, optional_tokenization)
- EMAIL
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['EMAIL']
sentence = 'Bom dia, meu teste@gmail.com caro'
pre_process(sentence, optional_tokenization)
- NUMBER
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['NUMBER']
sentence = 'Este é um número 99999-9999'
pre_process(sentence, optional_tokenization)
- CODE
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['CODE']
sentence = 'Este é um código 91234abc'
pre_process(sentence, optional_tokenization)
- SYMBOLS
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['SYMBOL']
sentence = 'Este é um símbolo %'
pre_process(sentence, optional_tokenization)
- ABBREVIATIONS
from take_text_preprocess.presentation import pre_process
optional_tokenization = ['ABBR']
sentence = 'Este é uma abreviação vc'
pre_process(sentence, optional_tokenization)
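For readers curious about what the last two options might do internally, the following sketch shows one possible approach. Everything in it is an assumption: the abbreviation table, the function names, and the definition of a "symbol" as any character that is neither a word character nor whitespace (which also removes punctuation) are hypothetical and may not match the package.

```python
import re

# Hypothetical abbreviation table; the package's real list is not shown
# in this document.
ABBREVIATIONS = {'vc': 'você', 'tb': 'também'}


def remove_symbols(sentence: str) -> str:
    """Drop characters that are neither word characters nor whitespace."""
    cleaned = re.sub(r'[^\w\s]', ' ', sentence)
    # Collapse the extra spaces left where symbols were removed.
    return ' '.join(cleaned.split())


def expand_abbreviations(sentence: str) -> str:
    """Replace each whole word found in the table by its expansion."""
    return ' '.join(ABBREVIATIONS.get(word, word) for word in sentence.split())


print(remove_symbols('Este é um símbolo %'))             # Este é um símbolo
print(expand_abbreviations('Este é uma abreviação vc'))  # Este é uma abreviação você
```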
Contribute
If this is the first time you are contributing to this project, first create the virtual environment using the following command:
conda env create -f env/environment.yml
Then activate the environment:
conda activate taketextpreprocess
To test your modifications, build the package and install the resulting wheel (replace VERSION with the version you built):
pip install dist\take-text-preprocess-VERSION-py3-none-any.whl --force-reinstall
Then run the tests:
pytest