Skip to main content

tokenizer tool

Project description

Easy-Tokenizer

Description

Most tokenizers are eithor too cumbersom (Neural Network based), or too simple. This simple rule based tokenizer is type, small, and sufficient good. Specially, it handles long strings very often parsed wrong by some simple tokenizers, deal url, email, long digits rather well.

Try with the following script: easy_tokenizer -s input_text

or

easy_tokenizer -f input_file

Status

todo

Requirements

Python 3.6+

Installation

pip install easy-tokenizer

Usage

todo

Development

To install package and its dependencies, run the following from project root directory:

python setup.py install

To work the code and develop the package, run the following from project root directory:

python setup.py develop

To run unit tests, execute the following from the project root directory:

python setup.py test

0.0.2 (2019-10-23)

  • support script to output result to a file, add documentation

0.0.1 (2019-10-22)

  • Add the first version of the tokenizer

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

easy_tokenizer-0.0.2.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

easy_tokenizer-0.0.2-py2.py3-none-any.whl (8.3 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file easy_tokenizer-0.0.2.tar.gz.

File metadata

  • Download URL: easy_tokenizer-0.0.2.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/38.4.0 requests-toolbelt/0.9.1 tqdm/4.23.4 CPython/3.6.4

File hashes

Hashes for easy_tokenizer-0.0.2.tar.gz
Algorithm Hash digest
SHA256 1a9ea9c3a58b0f05a18deab6077ac79206b225845a6836ae9b5cc98fa0c59e8e
MD5 7211d984fc40e2352a42f67db27a000e
BLAKE2b-256 4d08da9552b839a703dbc416ebe9c7a445736f8a2becae23fcce3fb2c92ac25f

See more details on using hashes here.

File details

Details for the file easy_tokenizer-0.0.2-py2.py3-none-any.whl.

File metadata

  • Download URL: easy_tokenizer-0.0.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/38.4.0 requests-toolbelt/0.9.1 tqdm/4.23.4 CPython/3.6.4

File hashes

Hashes for easy_tokenizer-0.0.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b658e9c2120c0ef6c4c401cf27412f8a31a28073ba13b23c6e143675d30fead2
MD5 edba49712ed88458561d5639117f54fa
BLAKE2b-256 736ddcb428441adc22a08fc99909479ae659717c3616be5e7635403c1443a54f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page