Skip to main content

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto).

Project description

The author of this package has not provided a project description

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for python-ucto, version 0.5.1
Filename, size File type Python version Upload date Hashes
Filename, size python-ucto-0.5.1.tar.gz (6.6 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page