
Natural Language Processing in Rust with Python bindings

Project description

vtext

This is a Python wrapper for the Rust vtext crate.

This package aims to provide a high performance toolkit for ingesting textual data for machine learning applications.

Features

  • Tokenization: Regexp tokenizer, Unicode segmentation + language specific rules
  • Stemming: Snowball (in Python, 15-20x faster than NLTK)
  • Token counting: converting token counts to sparse matrices for use in machine learning libraries. Similar to CountVectorizer and HashingVectorizer in scikit-learn, but with less broad functionality.
  • Levenshtein edit distance; Sørensen-Dice, Jaro, Jaro Winkler string similarities
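
To make the features above concrete, here is a minimal pure-Python sketch of what three of them compute: regexp tokenization, token counts stored in CSR-style sparse arrays, and two of the string metrics. It uses only the standard library and is not the vtext API. The function names (regexp_tokenize, count_to_csr, levenshtein, dice_similarity) and the default token pattern are assumptions chosen for illustration, and the Rust implementations in vtext may differ in details such as how character bigrams are counted.

```python
import re
from collections import Counter


def regexp_tokenize(text, pattern=r"\b\w\w+\b"):
    """Split text into tokens using a regular expression (illustrative default pattern)."""
    return re.findall(pattern, text)


def count_to_csr(docs):
    """Count tokens per document and return CSR-style arrays.

    Returns (data, indices, indptr, vocabulary), the same three arrays that a
    compressed sparse row matrix stores, plus the token -> column mapping.
    """
    vocabulary = {}
    data, indices, indptr = [], [], [0]
    for doc in docs:
        counts = Counter(regexp_tokenize(doc.lower()))
        for token, n in sorted(counts.items()):
            # Assign the next free column index to tokens not seen before.
            col = vocabulary.setdefault(token, len(vocabulary))
            indices.append(col)
            data.append(n)
        # Each row ends where the next one begins.
        indptr.append(len(indices))
    return data, indices, indptr, vocabulary


def levenshtein(a, b):
    """Levenshtein edit distance using a two-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def dice_similarity(a, b):
    """Sørensen-Dice coefficient over sets of character bigrams (an assumption)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    docs = ["The cat sat", "The cat and the hat"]
    data, indices, indptr, vocabulary = count_to_csr(docs)
    print(vocabulary)
    print(data, indices, indptr)
    print(levenshtein("kitten", "sitting"))
    print(dice_similarity("night", "nacht"))
```

The three arrays returned by count_to_csr have the layout that sparse matrix libraries use for compressed sparse row storage, which is why token counts in this form can be handed to machine learning libraries without building a dense matrix.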

Installation

vtext requires Python 3.6+ and numpy 1.15+. It can be installed with:

pip install vtext

Documentation

Project documentation: vtext.io/doc/latest/index.html

License

vtext is released under the Apache License, Version 2.0.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for vtext, version 0.2.0

Filename                                        Size      File type  Python version
vtext-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl  830.0 kB  Wheel      cp36
vtext-0.2.0-cp36-cp36m-manylinux1_x86_64.whl    1.6 MB    Wheel      cp36
vtext-0.2.0-cp36-cp36m-win_amd64.whl            2.2 MB    Wheel      cp36
vtext-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl  829.9 kB  Wheel      cp37
vtext-0.2.0-cp37-cp37m-manylinux1_x86_64.whl    3.1 MB    Wheel      cp37
vtext-0.2.0-cp37-cp37m-win_amd64.whl            2.2 MB    Wheel      cp37
vtext-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl   830.2 kB  Wheel      cp38
vtext-0.2.0-cp38-cp38-manylinux1_x86_64.whl     4.7 MB    Wheel      cp38
vtext-0.2.0-cp38-cp38-win_amd64.whl             2.2 MB    Wheel      cp38
vtext-0.2.0.tar.gz                              13.6 kB   Source     None
