
Tokenize an English sentence into phrases


Phrase Tokenizer

Code style: black | License: MIT | PyPI version


Installation

pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update

Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git:

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow

Or use poetry, e.g.

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install
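
After installing, it can help to confirm that the package and its dependencies are visible to the interpreter. The snippet below is a minimal sketch that uses only the standard library; the helper name missing_modules is hypothetical, and the module names are assumed from the install commands above and the import in the Usage section.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec locates a module without importing it and returns None if it is absent.
    return [name for name in names if find_spec(name) is None]


# Assumed import names: phrase_tokenizer (from the usage example) plus the
# packages listed in the pip install command for the cloned repo.
required = ["phrase_tokenizer", "logzero", "benepar", "tensorflow"]
missing = missing_modules(required)
print("missing:", missing if missing else "none")
```

An empty result only means the modules can be located; it does not check versions or whether any model data they need has been downloaded.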

Usage

from phrase_tokenizer import phrase_tok

res = phrase_tok("Short cuts make long delay.")
print(res)
# ['Short cuts', 'make long delay']

# verbose=True prints the intermediate steps of the tokenizing process
res = phrase_tok("Short cuts make long delay", verbose=True)
# ',..Short.cuts,.make..long.delay..'

Consult the source code for details.
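
The marked string shown for the verbose run can be related to the final list with a few lines of plain Python. This is only a sketch under an assumption that is not confirmed by the source: that ',' marks a phrase boundary and '.' stands in for a space. The function name phrases_from_marked is hypothetical and is not part of phrase-tokenizer.

```python
from typing import List


def phrases_from_marked(marked: str) -> List[str]:
    """Split a marked string into phrases.

    Assumption (not taken from the library source): ',' separates phrases
    and '.' is a placeholder for whitespace.
    """
    chunks = marked.split(",")
    # Turn placeholder dots into spaces, then collapse runs of whitespace.
    phrases = [" ".join(chunk.replace(".", " ").split()) for chunk in chunks]
    # Drop the empty pieces produced by leading or doubled markers.
    return [phrase for phrase in phrases if phrase]


print(phrases_from_marked(",..Short.cuts,.make..long.delay.."))
# ['Short cuts', 'make long delay']
```

Under this reading, a sentence containing literal periods or commas inside a phrase would not round-trip, so treat it as an aid for reading the verbose output rather than a description of how the library works.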

For Developers

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt

In a Jupyter notebook, plot_tree can draw the parse tree of a sentence to aid development, e.g.,

from phrase_tokenizer.phrase_tok import plot_tree

plot_tree("Short cuts make long delay.")


