
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the Ucto tokeniser available from Python. Ucto itself is an extensible, regular-expression-based tokeniser written in C++ (https://languagemachines.github.io/ucto).
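To illustrate why tokenisation is less trivial than splitting on whitespace, here is a minimal sketch of a regular-expression-based tokeniser in plain Python. It does not use Ucto or this binding, and the pattern below is a hypothetical, heavily simplified rule set chosen for illustration only; Ucto's actual rules are language-specific and far more extensive.

```python
import re

# Hypothetical, simplified rules for illustration; these are NOT Ucto's rules.
# Alternatives are tried left to right, so numbers must come before words.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:[.,]\d+)+      # numbers with decimal or thousands separators, e.g. 3.50
    | \w+(?:[-']\w+)*    # words, allowing internal hyphens and apostrophes
    | [^\w\s]            # any other single non-space character (punctuation)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens matched by TOKEN_PATTERN, in order."""
    return TOKEN_PATTERN.findall(text)


sentence = "Mr. O'Neil paid 3.50 euros, didn't he?"

# Naive whitespace splitting leaves punctuation glued to words.
print(sentence.split())
# ['Mr.', "O'Neil", 'paid', '3.50', 'euros,', "didn't", 'he?']

# The regex rules separate punctuation but keep "3.50" and "didn't" whole.
print(tokenize(sentence))
# ['Mr', '.', "O'Neil", 'paid', '3.50', 'euros', ',', "didn't", 'he', '?']
```

Even this sketch gets a case wrong: it splits the abbreviation "Mr." into two tokens, the kind of problem a dedicated tokeniser with abbreviation lists and language-specific rules is designed to handle.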

Project description

The author of this package has not provided a project description

Download files

Download the file for your platform.

Source Distribution

python-ucto-0.4.2.tar.gz (3.9 kB)

File details

Details for the file python-ucto-0.4.2.tar.gz.

File metadata

  • Download URL: python-ucto-0.4.2.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for python-ucto-0.4.2.tar.gz
Algorithm     Hash digest
SHA256        011052bae4dd7080a943963f26e3fe487722f1be793e2e2fd1787378de6f998a
MD5           fca22674f4d41ff49ea02f36f28c32f5
BLAKE2b-256   017fb45ae3463198c664f023cee979b1017a46339c4a49620a22fad5a598f5ad
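The digests above can be used to check that a downloaded archive is intact. A minimal sketch using only Python's standard `hashlib` module follows; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the code assumes the archive has already been saved locally.

```python
import hashlib

# SHA256 digest listed above for python-ucto-0.4.2.tar.gz.
EXPECTED_SHA256 = "011052bae4dd7080a943963f26e3fe487722f1be793e2e2fd1787378de6f998a"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("python-ucto-0.4.2.tar.gz")` should return `True` for an unmodified download, assuming the file sits in the current working directory.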

