Natural Language Processing in Rust with Python bindings

vtext

This is a Python wrapper for the Rust vtext crate.

This package aims to provide a high-performance toolkit for ingesting textual data for machine learning applications.

The API is currently unstable.

Features

  • Tokenization: Regexp tokenizer, Unicode segmentation + language specific rules
  • Stemming: Snowball (15-20x faster than NLTK when called from Python)
  • Token counting: converting token counts to sparse matrices for use in machine learning libraries. Similar to CountVectorizer and HashingVectorizer in scikit-learn, but with less broad functionality.
  • String metrics: Levenshtein edit distance; Sørensen-Dice, Jaro, and Jaro-Winkler string similarities
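
To make the features above concrete, here is a minimal pure-Python sketch of three of the techniques: regexp tokenization, token counting into a sparse (CSR-style) layout, and Levenshtein edit distance. It does not use vtext and is not its API, which is documented separately and may differ. The function names (tokenize, count_tokens, levenshtein) and the token pattern are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Assumed pattern for illustration: words of two or more characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split lowercased text into tokens with a regular expression."""
    return TOKEN_PATTERN.findall(text.lower())


def count_tokens(documents):
    """Count tokens per document and return them in CSR layout.

    Returns (vocabulary, indptr, indices, data), where row i of the
    document-term matrix is stored in indices/data[indptr[i]:indptr[i + 1]].
    """
    vocabulary = {}
    indptr, indices, data = [0], [], []
    for doc in documents:
        for token, count in Counter(tokenize(doc)).items():
            # Assign the next free column to tokens not seen before.
            column = vocabulary.setdefault(token, len(vocabulary))
            indices.append(column)
            data.append(count)
        indptr.append(len(indices))
    return vocabulary, indptr, indices, data


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

The three lists returned by count_tokens match the arguments accepted by scipy.sparse.csr_matrix((data, indices, indptr)), which is the usual way to hand such counts to a machine learning library.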

Installation

vtext requires Python 3.5+ and numpy 1.15+. It can be installed with:

pip install --pre vtext
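
Before installing, it can help to confirm that the environment satisfies the version requirements stated above. The following is a small sketch of such a check; the helper names (major_minor, meets_requirements) are hypothetical and are not part of vtext or pip.

```python
import re
import sys

MIN_PYTHON = (3, 5)   # from the stated requirement "Python 3.5+"
MIN_NUMPY = (1, 15)   # from the stated requirement "numpy 1.15+"


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.16.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def meets_requirements(python_version, numpy_version):
    """Return True if both versions satisfy the stated minimums."""
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    numpy_ok = major_minor(numpy_version) >= MIN_NUMPY
    return python_ok and numpy_ok


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        print(meets_requirements(sys.version_info, numpy.__version__))
```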

Documentation

Project documentation: vtext.io/doc/latest/index.html

License

vtext is released under the Apache License, Version 2.0.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for vtext, version 0.1.0a3

Filename                                         Size      File type  Python version
vtext-0.1.0a3-cp35-cp35m-macosx_10_6_x86_64.whl  754.7 kB  Wheel      cp35
vtext-0.1.0a3-cp35-cp35m-manylinux1_x86_64.whl   1.4 MB    Wheel      cp35
vtext-0.1.0a3-cp35-cp35m-win_amd64.whl           691.9 kB  Wheel      cp35
vtext-0.1.0a3-cp36-cp36m-macosx_10_7_x86_64.whl  754.4 kB  Wheel      cp36
vtext-0.1.0a3-cp36-cp36m-manylinux1_x86_64.whl   2.8 MB    Wheel      cp36
vtext-0.1.0a3-cp36-cp36m-win_amd64.whl           691.9 kB  Wheel      cp36
vtext-0.1.0a3-cp37-cp37m-macosx_10_9_x86_64.whl  754.3 kB  Wheel      cp37
vtext-0.1.0a3-cp37-cp37m-manylinux1_x86_64.whl   4.1 MB    Wheel      cp37
vtext-0.1.0a3-cp37-cp37m-win_amd64.whl           691.7 kB  Wheel      cp37
