Phrase Tokenizer
Tokenize an English sentence into phrases via benepar.
Installation
pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update
# or to install the latest from github:
# pip install git+https://github.com/ffreemt/phrase-tokenizer.git
Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git and install the dependencies:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow
Or use poetry, e.g.:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install
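Whichever install route you take, a quick smoke test is the one-liner below (a minimal sketch reusing the sentence from the Usage section; it assumes benepar's English parsing model is already available on the machine):

python -c "from phrase_tokenizer import phrase_tok; print(phrase_tok('Short cuts make long delays.'))"
# per the Usage section, this should print ['Short cuts', 'make long delays']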
Usage
from phrase_tokenizer import phrase_tok
res = phrase_tok("Short cuts make long delays.")
print(res)
# ['Short cuts', 'make long delays']
# verbose=True shows the tokenizing process
res = phrase_tok("Short cuts make long delays", verbose=True)
# ',..Short.cuts,.make..long.delays..'
Consult the source code for details.
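The examples above tokenize one sentence at a time, so processing several sentences is just a loop. A minimal sketch (the second sentence and its output are illustrative assumptions, not output produced by the library):

from phrase_tokenizer import phrase_tok

sentences = [
    "Short cuts make long delays.",
    "All that is gold does not glitter.",  # assumed extra sentence, for illustration only
]
for sent in sentences:
    phrases = phrase_tok(sent)  # a list of phrase strings for this sentence
    print(sent, "->", phrases)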
For Developers
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt
In IPython, plot_tree can draw a nice tree to aid development, e.g.,
from phrase_tokenizer.phrase_tok import plot_tree
plot_tree("Short cuts make long delays.")
Download files
Source Distribution: phrase-tokenizer-0.1.3.tar.gz (4.8 kB)
Built Distribution: phrase_tokenizer-0.1.3-py3-none-any.whl
Hashes for phrase_tokenizer-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5f5c0dc455d34c2d5beb35086d36f9f7d60bc2d34cc17807b7b74712ed18762b
MD5 | bda0d55d33e06a4c625ff499b79a3c27
BLAKE2b-256 | 814bb470ff9c53414d052a24c6a2afb378b24720f42ad2d9bb04ae531847a90f