Natural Language Processing in Rust with Python bindings

Project description

vtext

This is a Python wrapper for the Rust vtext crate.

This package aims to provide a high performance toolkit for ingesting textual data for machine learning applications.

Features

  • Tokenization: Regexp tokenizer, Unicode segmentation + language specific rules
  • Stemming: Snowball (when called from Python, 15-20x faster than NLTK)
  • Token counting: converting token counts to sparse matrices for use in machine learning libraries. Similar to CountVectorizer and HashingVectorizer in scikit-learn, but with less broad functionality.
  • String metrics: Levenshtein edit distance; Sørensen-Dice, Jaro and Jaro-Winkler string similarities
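
To make the tokenization and token counting features concrete, here is a minimal pure-Python sketch of the idea: split documents with a regular expression, then store the per-document token counts in compressed sparse row (CSR) form. It does not use vtext and is not its implementation. The names `tokenize` and `count_tokens` are hypothetical helpers, and the token pattern is the default one from scikit-learn's CountVectorizer, chosen here only as an assumption.

```python
import re
from collections import Counter

# Assumption: words of two or more characters, as in the default
# token pattern of scikit-learn's CountVectorizer.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split text into lowercase word tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text.lower())


def count_tokens(documents):
    """Count tokens per document and return the counts in CSR form.

    Returns (vocabulary, data, indices, indptr), where vocabulary maps
    each token to a column index and the three lists describe a sparse
    matrix with one row per document.
    """
    vocabulary = {}
    data, indices, indptr = [], [], [0]
    for doc in documents:
        for token, count in Counter(tokenize(doc)).items():
            # Assign the next free column to tokens seen for the first time.
            column = vocabulary.setdefault(token, len(vocabulary))
            indices.append(column)
            data.append(count)
        # Mark where this document's row ends.
        indptr.append(len(indices))
    return vocabulary, data, indices, indptr


docs = ["the cat sat on the mat", "the dog sat"]
vocabulary, data, indices, indptr = count_tokens(docs)
```

If SciPy is available, the three lists can be turned into a matrix with `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(docs), len(vocabulary)))`, which is the kind of input most machine learning libraries accept.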

Installation

vtext requires Python 3.6+ and numpy 1.15+. It can be installed with:

pip install vtext

Documentation

Project documentation: vtext.io/doc/latest/index.html

License

vtext is released under the Apache License, Version 2.0.

Download files

Source Distribution

vtext-0.2.0.tar.gz (13.6 kB)

Built Distributions

vtext-0.2.0-cp38-cp38-win_amd64.whl (2.2 MB)

vtext-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl (830.2 kB)

vtext-0.2.0-cp37-cp37m-win_amd64.whl (2.2 MB)

vtext-0.2.0-cp36-cp36m-win_amd64.whl (2.2 MB)
