Tokenize an English sentence into phrases
Project description
Phrase Tokenizer
Tokenize an English sentence into phrases via benepar.
Installation
pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update
# or to install the latest from github:
# pip install git+https://github.com/ffreemt/phrase-tokenizer.git
Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git:
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow
Or use poetry, e.g.
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install
Usage
from phrase_tokenizer import phrase_tok
res = phrase_tok("Short cuts make long delays.")
print(res)
# ['Short cuts', 'make long delays']
# verbose=True turns on verbose output to show the tokenizing process
res = phrase_tok("Short cuts make long delays", verbose=True)
# ',..Short.cuts,.make..long.delays..'
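The example above handles one sentence at a time. The following is a minimal sketch for running several sentences in a loop; it only assumes that phrase_tok accepts a single sentence string per call, as shown above, and the second sentence is illustrative only.

from phrase_tokenizer import phrase_tok

sentences = [
    "Short cuts make long delays.",
    "All that is gold does not glitter.",
]

for sent in sentences:
    phrases = phrase_tok(sent)  # list of phrase strings for this sentence
    print(phrases)
# the first sentence yields ['Short cuts', 'make long delays'], as above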
Consult the source code for details.
For Developers
git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt
In IPython, plot_tree can draw a nice tree to aid development, e.g.,
from phrase_tokenizer.phrase_tok import plot_tree
plot_tree("Short cuts make long delays.")
Download files
File details
Details for the file phrase-tokenizer-0.1.3.tar.gz.
File metadata
- Download URL: phrase-tokenizer-0.1.3.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3da2e6557661a9248d7782ef73eae55afec51ca3c7ac1bab16b7b73beb6ed048
MD5 | d59192e422f7f1f1192baa99a034e972
BLAKE2b-256 | d36a19fe47fa5bd8811f6e940c0636acd0c8d8a2ded593f8528c28c35e94e143
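To verify a downloaded archive against the SHA256 digest above, a short hashlib check is enough; the path below assumes the sdist was saved to the current directory.

# Verify the sdist against the published SHA256 digest.
import hashlib

EXPECTED = "3da2e6557661a9248d7782ef73eae55afec51ca3c7ac1bab16b7b73beb6ed048"

with open("phrase-tokenizer-0.1.3.tar.gz", "rb") as fh:
    digest = hashlib.sha256(fh.read()).hexdigest()

print("OK" if digest == EXPECTED else f"Mismatch: {digest}")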
File details
Details for the file phrase_tokenizer-0.1.3-py3-none-any.whl.
File metadata
- Download URL: phrase_tokenizer-0.1.3-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f5c0dc455d34c2d5beb35086d36f9f7d60bc2d34cc17807b7b74712ed18762b
MD5 | bda0d55d33e06a4c625ff499b79a3c27
BLAKE2b-256 | 814bb470ff9c53414d052a24c6a2afb378b24720f42ad2d9bb04ae531847a90f
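pip can also enforce these digests at install time through its hash-checking mode: pin the release and its hashes in a requirements file (the name requirements.txt below is just an example) and install with --require-hashes. Note that in this mode pip requires hashes for every dependency as well.

# requirements.txt -- either digest satisfies the check, depending on
# whether pip resolves to the sdist or the wheel
phrase-tokenizer==0.1.3 \
    --hash=sha256:3da2e6557661a9248d7782ef73eae55afec51ca3c7ac1bab16b7b73beb6ed048 \
    --hash=sha256:5f5c0dc455d34c2d5beb35086d36f9f7d60bc2d34cc17807b7b74712ed18762b

pip install --require-hashes -r requirements.txt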