Phrase Tokenizer
Tokenize an English sentence to phrases
Installation
pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update
Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow
Or use poetry, e.g.
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install
Usage
from phrase_tokenizer import phrase_tok
res = phrase_tok("Short cuts make long delay.")
print(res)
# ['Short cuts', 'make long delay']
# verbose=True shows the tokenizing process
res = phrase_tok("Short cuts make long delay", verbose=True)
# ',..Short.cuts,.make..long.delay..'
Consult the source code for details.
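phrase_tok returns a plain list of phrase strings, so its output can be post-processed with ordinary Python. As an illustration only (join_phrases is a hypothetical helper, not part of phrase-tokenizer), here is a small sketch that rejoins the documented output for display:

```python
# Hypothetical helper (not part of phrase-tokenizer): rejoin the phrase
# chunks returned by phrase_tok into a single readable string.
def join_phrases(phrases, sep=" / "):
    """Join a list of phrase strings with a visible separator."""
    return sep.join(p.strip() for p in phrases)

# The documented output of phrase_tok("Short cuts make long delay.")
phrases = ["Short cuts", "make long delay"]
print(join_phrases(phrases))  # Short cuts / make long delay
```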
For Developers
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt
In a Jupyter notebook, plot_tree can draw a parse tree to aid development, e.g.,
from phrase_tokenizer.phrase_tok import plot_tree
plot_tree("Short cuts make long delay.")