This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is an advanced, extensible, regular-expression-based tokeniser written in C++ (https://languagemachines.github.io/ucto).
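
To illustrate why tokenisation is less trivial than splitting on whitespace, here is a minimal sketch of regular-expression-based tokenisation using only Python's standard `re` module. It does not use Ucto or this binding: the pattern and the `simple_tokenize` helper are hypothetical and far simpler than Ucto's language-specific rules.

```python
import re

# Illustrative pattern only; these are NOT Ucto's actual rules.
TOKEN_PATTERN = re.compile(
    r"""
      \w+(?:[-']\w+)*   # a word, allowing internal hyphens and apostrophes
    | [^\w\s]           # any single non-space, non-word character (punctuation)
    """,
    re.VERBOSE,
)


def simple_tokenize(text):
    """Return the tokens matched by TOKEN_PATTERN (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


sentence = "Hello, world!"

# Whitespace splitting leaves punctuation glued to the words.
naive_tokens = sentence.split()

# The regular expression separates punctuation into its own tokens.
regex_tokens = simple_tokenize(sentence)

print(naive_tokens)
print(regex_tokens)
```

Even this small pattern has to make decisions about hyphens and apostrophes; abbreviations, URLs, and language-specific conventions are the sort of detail that makes a dedicated tokeniser worthwhile.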

Download files

Download the file for your platform.

Source Distribution

python-ucto-0.4.3.tar.gz (4.0 kB)

Uploaded Source

File details

Details for the file python-ucto-0.4.3.tar.gz.

File metadata

  • Download URL: python-ucto-0.4.3.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for python-ucto-0.4.3.tar.gz
  • SHA256: 27b6d5605b31348aec02c83f533b416eebc5f7f39326582c13ce4a029741a8c6
  • MD5: 724c3cc2f6a1043c1c1d4e794e654e04
  • BLAKE2b-256: 334ba7f908fcbc0510a7fc788ace6b284edfd6cd04bb4a54b0d6fbad2e816d45
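
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the archive is assumed to have been saved locally under its published filename.

```python
import hashlib

# SHA256 digest published for python-ucto-0.4.3.tar.gz (copied from the list above).
EXPECTED_SHA256 = "27b6d5605b31348aec02c83f533b416eebc5f7f39326582c13ce4a029741a8c6"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("python-ucto-0.4.3.tar.gz")` in the download directory should return `True` for an intact copy of the file; a `False` result suggests a corrupted or altered download.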
