This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is an advanced, extensible, regular-expression based tokeniser written in C++ (https://languagemachines.github.io/ucto).
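To make the idea of regular-expression based tokenisation concrete, here is a minimal sketch using only Python's standard `re` module. It does not use Ucto or this binding, and it does not reproduce Ucto's actual rules: the rule names, the patterns, and the `tokenise` helper are all assumptions made up for illustration. It only shows why ordered rules are needed for cases such as abbreviations, decimal numbers, and contractions.

```python
import re

# Hypothetical, heavily simplified rule set (NOT Ucto's real rules).
# Each rule is (token type, pattern); earlier rules take precedence.
RULES = [
    ("ABBREVIATION", r"(?:[A-Za-z]\.){2,}"),   # e.g. "U.S."
    ("NUMBER", r"\d+(?:[.,]\d+)*"),            # e.g. "3.50"
    ("WORD", r"\w+(?:[-']\w+)*"),              # e.g. "didn't"
    ("PUNCTUATION", r"[^\w\s]"),               # any other single symbol
]

# Combine the rules into one alternation of named groups.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES)
)


def tokenise(text):
    """Return a list of (token, token type) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token, kind in tokenise("Mr. Smith paid 3.50 in the U.S., didn't he?"):
        print(f"{token}\t{kind}")
```

A naive split on whitespace and punctuation would break "3.50", "U.S." and "didn't" apart; the ordered rules keep them as single tokens. A production tokeniser such as Ucto relies on much richer, language-specific rule sets than this sketch.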
|Filename|Size|File type|Python version|
|python-ucto-0.5.1.tar.gz|6.6 kB|Source|None|