
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the ucto tokeniser available from Python. Ucto itself is an extensible, regular-expression-based tokeniser written in C++ (https://languagemachines.github.io/ucto).
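To illustrate why tokenisation is less trivial than splitting on whitespace, here is a minimal sketch of a regular-expression-based tokeniser using only the Python standard library. It is not Ucto's rule set and not this binding's API: the pattern, the group names, and the `tokenise` function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical, illustrative rules. Ucto's real rules are richer and
# language-specific; none of these names come from Ucto or python-ucto.
# Alternatives are tried in order, so more specific rules come first.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<url>https?://\S+)             # URLs stay in one piece
  | (?P<abbrev>(?:[A-Za-z]\.){2,})    # dotted abbreviations: single letters each followed by a dot
  | (?P<number>\d+(?:[.,]\d+)*)       # numbers with decimal or thousands separators
  | (?P<word>\w+(?:[-']\w+)*)         # words, optionally joined by hyphens or apostrophes
  | (?P<punct>[^\w\s])                # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenise(text):
    """Return (token_type, token_text) pairs in the order they occur in text."""
    return [(match.lastgroup, match.group()) for match in TOKEN_PATTERN.finditer(text)]


sentence = "It costs 3.50, e.g. at https://example.org now."
for token_type, token_text in tokenise(sentence):
    print(token_type, token_text)
```

A plain `sentence.split()` would leave the comma attached to `3.50` and the final full stop attached to `now`, whereas the ordered rules above split those off while keeping `3.50`, `e.g.` and the URL intact.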

Download files

Download the file for your platform.

Source Distribution

python-ucto-0.5.1.tar.gz (6.6 kB)

Uploaded Source

File details

Details for the file python-ucto-0.5.1.tar.gz.

File metadata

  • Download URL: python-ucto-0.5.1.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for python-ucto-0.5.1.tar.gz
  • SHA256: 4f80fc8e761d89eb0267927635e333ce90a612b9df9cfe200bd9a6cc10f45a91
  • MD5: ea40e99778934a8fa6884400d4f221c9
  • BLAKE2b-256: 8a8d071c7b167fe332393f32d15e01bb7146c735f0bf70e6dd58c826727748b8
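The SHA256 digest listed above can be checked against a downloaded copy of the archive. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest as published on this page for python-ucto-0.5.1.tar.gz.
EXPECTED_SHA256 = "4f80fc8e761d89eb0267927635e333ce90a612b9df9cfe200bd9a6cc10f45a91"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()


# Usage (assumes the archive was saved to the current directory):
# matches_expected("python-ucto-0.5.1.tar.gz")
```

Reading in chunks keeps memory use constant regardless of file size. A result of `False` means the local file differs from the one the digest was computed for and should not be installed.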

