
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the Ucto tokeniser available from Python. Ucto itself is an extensible, regular-expression-based tokeniser written in C++ (https://languagemachines.github.io/ucto).


Download files

Download the file for your platform.

Source Distribution

python-ucto-0.5.0.tar.gz (6.6 kB view details)

Uploaded Source

File details

Details for the file python-ucto-0.5.0.tar.gz.

File metadata

  • Download URL: python-ucto-0.5.0.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for python-ucto-0.5.0.tar.gz

Algorithm    Hash digest
SHA256       639ef19c473abf83b43849e71630a8a835365457f43cc78f2cd9f83af2fc5f8d
MD5          89ea28c675f73dc01bfcf380736cb12b
BLAKE2b-256  e4b0b5d6b601afd8521272a338d5b54ba9b9bdbe06d0e43d59c18aa4d4f11af0
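
The digests above can be used to check that a downloaded copy of python-ucto-0.5.0.tar.gz is intact. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of python-ucto, and the sketch assumes the archive has already been downloaded to a local path.

```python
import hashlib
import hmac

# SHA256 digest copied from the hash listing above.
EXPECTED_SHA256 = "639ef19c473abf83b43849e71630a8a835365457f43cc78f2cd9f83af2fc5f8d"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected` (hypothetical helper)."""
    # Hex digests are case-insensitive, so normalise the expected value first.
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

For example, `matches_published_digest("python-ucto-0.5.0.tar.gz")` should return True only when the local file is byte-for-byte identical to the uploaded source distribution. SHA256 is used here rather than MD5 because MD5 is no longer considered collision-resistant.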

