Phrase Tokenizer
Tokenize an English sentence to phrases
Installation
pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update
Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git and install the dependencies manually:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow
Or use poetry, e.g.:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install
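Note: benepar typically needs an English parser model downloaded once before first use. A minimal sketch, assuming the benepar_en2 model used by the TensorFlow-based benepar releases (check the benepar docs for the model matching your installed version):
import benepar
# One-time download of the English parser model (model name is an assumption).
benepar.download("benepar_en2")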
Usage
from phrase_tokenizer import phrase_tok
res = phrase_tok("Short cuts make long delay.")
print(res)
# ['Short cuts', 'make long delay']
# verbose=True shows the tokenizing progress
res = phrase_tok("Short cuts make long delay", verbose=True)
# ',..Short.cuts,.make..long.delay..'
Consult the source code for details.
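The returned phrases are plain strings, so they can be post-processed directly. A minimal batch-processing sketch, assuming only the phrase_tok API shown above (the sample sentences are illustrative):
from phrase_tokenizer import phrase_tok

sentences = [
    "Short cuts make long delay.",
    "All that is gold does not glitter.",
]

# Tokenize each sentence into phrases and collect the results.
phrases = {sent: phrase_tok(sent) for sent in sentences}

for sent, toks in phrases.items():
    print(sent, "->", toks)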
For Developers
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt
In a Jupyter notebook, plot_tree can draw a parse tree to aid development, e.g.:
from phrase_tokenizer.phrase_tok import plot_tree
plot_tree("Short cuts make long delay.")