
Phrase Tokenizer


Tokenize an English sentence to phrases via benepar.

Installation

pip install phrase-tokenizer
# or, to upgrade an existing install:
# pip install -U phrase-tokenizer
# or to install the latest from GitHub:
# pip install git+https://github.com/ffreemt/phrase-tokenizer.git

Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git:

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow

Or use poetry, e.g.:

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install

Usage

from phrase_tokenizer import phrase_tok

res = phrase_tok("Short cuts make long delays.")
print(res)
# ['Short cuts', 'make long delays']

# verbose=True prints the tokenizing process as it runs
res = phrase_tok("Short cuts make long delays", verbose=True)
# ',..Short.cuts,.make..long.delays..'

Consult the source code for details.
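The underlying idea can be sketched without benepar: parse the sentence into a constituency tree, then join the leaves of each immediate child of the root into a phrase. The sketch below is illustrative only, using a hand-written parse as nested tuples; it is not the package's actual code, which delegates parsing to benepar.

```python
# Hypothetical sketch of phrase extraction from a constituency parse.
# A tree node is either a token string or a tuple (label, *children).

def leaves(node):
    """Collect the leaf tokens under a parse-tree node, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def top_level_phrases(tree):
    """Join the leaves of each immediate child of the root into a phrase."""
    _label, *children = tree
    return [" ".join(leaves(child)) for child in children]

# Hand-written parse of "Short cuts make long delays" (punctuation omitted):
tree = ("S",
        ("NP", ("JJ", "Short"), ("NNS", "cuts")),
        ("VP", ("VB", "make"),
               ("NP", ("JJ", "long"), ("NNS", "delays"))))

print(top_level_phrases(tree))
# ['Short cuts', 'make long delays']
```

The root's children (here an NP and a VP) become the phrase boundaries, matching the output shown above.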

For Developers

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt

In ipython, plot_tree can draw the parse tree to aid development, e.g.,

from phrase_tokenizer.phrase_tok import plot_tree

plot_tree("Short cuts make long delays.")

(image: parse-tree plot produced by plot_tree)

