
TakeSentenceTokenizer is a tool for tokenizing and preprocessing messages.

Project description

TakeSentenceTokenizer

TakeSentenceTokenizer is a tool for preprocessing and tokenizing sentences. The package is used to:

- convert the first word of the sentence to lowercase
- replace words with placeholders: laugh, date, time, ddd (Brazilian area code), measures (10kg, 20m, 5gb, etc.), code, phone number, cnpj, cpf, email, money, url, number (ordinal and cardinal)
- remove emoji
- add missing accents
- tokenize the sentence
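To make the steps above concrete, here is a minimal, self-contained sketch of a similar pipeline using only Python's `re` module. It is not the package's implementation: the function `preprocess`, the placeholder labels (`EMAIL`, `URL`, `CPF`, `MONEY`, `NUMBER`), the regex patterns, and the tiny accent dictionary are all hypothetical, and only a subset of the placeholder categories is covered.

```python
import re

# Hypothetical placeholder labels and patterns, applied in order
# (more specific patterns first so NUMBER does not swallow a CPF or an amount).
PLACEHOLDERS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r"(?:https?://|www\.)\S+")),
    ("CPF", re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")),
    ("MONEY", re.compile(r"R\$\s?\d+(?:[.,]\d+)*")),
    ("NUMBER", re.compile(r"\b\d+(?:[.,]\d+)*\b")),
]

# Common emoji and symbol blocks; not an exhaustive emoji definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Tiny hypothetical accent dictionary; real accent restoration needs a full lexicon.
ACCENTS = {"nao": "não", "voce": "você", "ate": "até"}


def preprocess(sentence: str) -> list[str]:
    """Sketch of the pipeline described above; not the package's implementation."""
    # Remove emoji characters.
    text = EMOJI.sub("", sentence)
    # Lowercase only the first word of the sentence.
    words = text.split(maxsplit=1)
    if words:
        words[0] = words[0].lower()
    text = " ".join(words)
    # Replace matched spans with placeholder labels.
    for label, pattern in PLACEHOLDERS:
        text = pattern.sub(label, text)
    # Split into word tokens and individual punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Restore accents using the (hypothetical) dictionary.
    return [ACCENTS.get(tok, tok) for tok in tokens]


print(preprocess('nao consigo fazer o cadastro!!'))
# ['não', 'consigo', 'fazer', 'o', 'cadastro', '!', '!']
```

Lowercasing happens before placeholder substitution so that the uppercase labels survive even when a placeholder is the first word of the sentence.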

Installation

Use the package manager pip to install TakeSentenceTokenizer:

pip install TakeSentenceTokenizer

Usage

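import SentenceTokenizer as st

# Portuguese input without accents: "I can't complete the registration!!"
sentence = 'nao consigo fazer o cadastro!!'
# Create a tokenizer and run the full preprocessing pipeline on the message
tokenizer = st.SentenceTokenizer()
processed_sentence = tokenizer.process_message(sentence)
print(processed_sentence)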

Author

Karina Tiemi Kato

License

MIT



Download files


Files for TakeSentenceTokenizer, version 0.4

Filename                                      Size       File type   Python version
TakeSentenceTokenizer-0.4-py3-none-any.whl    401.6 kB   Wheel       py3
TakeSentenceTokenizer-0.4.tar.gz              3.8 kB     Source      None
