
TakeSentenceTokenizer is a tool for tokenizing and preprocessing messages.

Project description


TakeSentenceTokenizer is a tool for preprocessing and tokenizing sentences. The package is used to:
- convert the first word of the sentence to lower case
- replace words with placeholders: laugh, date, time, ddd, measures (10kg, 20m, 5gb, etc.), code, phone number, cnpj, cpf, email, money, url, number (ordinal and cardinal)
- remove emoji
- add missing accentuation
- tokenize the sentence
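To make the steps above concrete, here is a minimal standalone sketch of the same kind of pipeline using only Python's `re` module. It is not the package's implementation: the placeholder names (`EMAIL`, `URL`, `MEASURE`, `NUMBER`), the regular expressions, and the `preprocess` function are all assumptions chosen for illustration, and accent restoration is left out because it would need a dictionary.

```python
import re

# Hypothetical placeholders and patterns; the package's own names and rules may differ.
PLACEHOLDER_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("URL", re.compile(r"\b(?:https?://|www\.)\S+")),
    ("MEASURE", re.compile(r"\b\d+(?:[.,]\d+)?(?:kg|gb|mb|km|cm|m|g)\b", re.IGNORECASE)),
    ("NUMBER", re.compile(r"\b\d+(?:[.,]\d+)?\b")),
]

# Rough emoji ranges; real emoji coverage is wider than this.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# A token is either a run of word characters or a single punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def preprocess(sentence):
    """Illustrative pipeline: strip emoji, lower-case the first word,
    substitute placeholders, then tokenize and re-join with spaces."""
    text = EMOJI.sub(" ", sentence)

    # Lower-case only the first word, before placeholders are inserted.
    first, sep, rest = text.lstrip().partition(" ")
    text = first.lower() + sep + rest

    # More specific patterns run first so NUMBER does not consume "10kg".
    for placeholder, pattern in PLACEHOLDER_PATTERNS:
        text = pattern.sub(placeholder, text)

    return " ".join(TOKEN.findall(text))


print(preprocess("Meu email é maria@example.com e pesei 10kg 😀!!"))
# prints: meu email é EMAIL e pesei MEASURE ! !
```

The order of the patterns matters in this sketch: measures are replaced before plain numbers, otherwise `10kg` would never be recognized as a measure.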


Use the package manager pip to install TakeSentenceTokenizer:

pip install TakeSentenceTokenizer


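import SentenceTokenizer as st

# Raw user message, written without accents
sentence = 'nao consigo fazer o cadastro!!'

# Create the tokenizer and run the full preprocessing pipeline
tokenizer = st.SentenceTokenizer()
processed_sentence = tokenizer.process_message(sentence)
print(processed_sentence)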
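When several messages need the same treatment, one tokenizer instance can be reused. The sketch below assumes only the `process_message` method shown above. `process_batch` is a hypothetical helper, not part of the package, and it makes no assumption about what `process_message` returns.

```python
from typing import Any, Iterable, List


def process_batch(tokenizer: Any, messages: Iterable[str]) -> List[Any]:
    """Apply tokenizer.process_message to each message, preserving order.

    `tokenizer` can be any object exposing process_message(sentence).
    """
    return [tokenizer.process_message(message) for message in messages]


# Intended use with the package (requires TakeSentenceTokenizer to be installed):
#
#   import SentenceTokenizer as st
#   tokenizer = st.SentenceTokenizer()
#   results = process_batch(tokenizer, ['nao consigo fazer o cadastro!!',
#                                       'qual o valor da fatura?'])
```

Building the tokenizer once and passing it in keeps any setup cost out of the per-message loop.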


Author: Karina Tiemi Kato



Download files

Files for TakeSentenceTokenizer, version 0.4
| Filename | Size | File type | Python version |
| --- | --- | --- | --- |
| TakeSentenceTokenizer-0.4-py3-none-any.whl | 401.6 kB | Wheel | py3 |
| TakeSentenceTokenizer-0.4.tar.gz | 3.8 kB | Source | None |
