Natural Language Processing in Rust with Python bindings

Project description

vtext

This is a Python wrapper for the Rust vtext crate.

This package aims to provide a high performance toolkit for ingesting textual data for machine learning applications.

The API is currently unstable.

Features

  • Tokenization: Regexp tokenizer, Unicode segmentation + language specific rules
  • Stemming: Snowball (in Python 15-20x faster than NLTK)
  • Token counting: converting token counts to sparse matrices for use in machine learning libraries. Similar to CountVectorizer and HashingVectorizer in scikit-learn, but with less broad functionality.
  • String metrics: Levenshtein edit distance; Sørensen-Dice, Jaro, and Jaro-Winkler string similarities
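
To make the features above concrete, here is a minimal pure-Python sketch of three of the ideas: regexp tokenization, token counting, and two of the string metrics. It does not use vtext and does not reflect vtext's API (which is currently unstable); the function names are hypothetical, and the token pattern is assumed to be the default one used by scikit-learn's CountVectorizer. Each row of counts is kept as a plain dict rather than a real sparse matrix, to avoid extra dependencies.

```python
import re
from collections import Counter

# Hypothetical helper names for illustration; none of these are vtext functions.
# Assumed pattern: words of two or more characters, as in scikit-learn's
# CountVectorizer default token_pattern.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def regexp_tokenize(text):
    """Split text into tokens using a regular expression."""
    return TOKEN_PATTERN.findall(text)


def count_tokens(documents):
    """Return (vocabulary, rows); each row maps a column index to a count."""
    vocabulary = {}
    rows = []
    for doc in documents:
        counts = Counter(regexp_tokenize(doc.lower()))
        row = {}
        for token, n in counts.items():
            # Assign the next free column to tokens not seen before.
            col = vocabulary.setdefault(token, len(vocabulary))
            row[col] = n
        rows.append(row)
    return vocabulary, rows


def levenshtein(a, b):
    """Edit distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def dice_similarity(a, b):
    """Sørensen-Dice coefficient over character bigram multisets."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        # Neither string has a bigram; treat identical strings as fully similar.
        return 1.0 if a == b else 0.0
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2.0 * overlap / total


if __name__ == "__main__":
    vocab, rows = count_tokens(["the cat sat", "the cat the hat"])
    print(vocab, rows)
    print(levenshtein("kitten", "sitting"))
    print(dice_similarity("night", "nacht"))
```

The dict-of-counts rows correspond to the (column, value) pairs that a sparse matrix in CSR format would store, which is the representation machine learning libraries typically expect.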

Installation

vtext requires Python 3.5+ and numpy 1.15+. It can be installed with:

pip install --pre vtext
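
After installing, a quick sanity check can confirm that the interpreter meets the stated Python requirement and that the package is importable. The sketch below uses only the standard library; the helper name `vtext_environment_ok` is hypothetical, and it does not check the numpy version.

```python
import importlib.util
import sys


def vtext_environment_ok(min_python=(3, 5)):
    """Return True if the interpreter is new enough and vtext is importable."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when no top-level module of that name is found.
    vtext_found = importlib.util.find_spec("vtext") is not None
    return python_ok and vtext_found


if __name__ == "__main__":
    print("vtext ready:", vtext_environment_ok())
```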

Documentation

Project documentation: vtext.io/doc/latest/index.html

License

vtext is released under the Apache License, Version 2.0.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • vtext-0.1.0a3-cp37-cp37m-win_amd64.whl (691.7 kB): CPython 3.7m, Windows x86-64
  • vtext-0.1.0a3-cp37-cp37m-manylinux1_x86_64.whl (4.1 MB): CPython 3.7m, manylinux1 x86-64
  • vtext-0.1.0a3-cp37-cp37m-macosx_10_9_x86_64.whl (754.3 kB): CPython 3.7m, macOS 10.9+ x86-64
  • vtext-0.1.0a3-cp36-cp36m-win_amd64.whl (691.9 kB): CPython 3.6m, Windows x86-64
  • vtext-0.1.0a3-cp36-cp36m-manylinux1_x86_64.whl (2.8 MB): CPython 3.6m, manylinux1 x86-64
  • vtext-0.1.0a3-cp36-cp36m-macosx_10_7_x86_64.whl (754.4 kB): CPython 3.6m, macOS 10.7+ x86-64
  • vtext-0.1.0a3-cp35-cp35m-win_amd64.whl (691.9 kB): CPython 3.5m, Windows x86-64
  • vtext-0.1.0a3-cp35-cp35m-manylinux1_x86_64.whl (1.4 MB): CPython 3.5m, manylinux1 x86-64
  • vtext-0.1.0a3-cp35-cp35m-macosx_10_6_x86_64.whl (754.7 kB): CPython 3.5m, macOS 10.6+ x86-64
