Natural Language Processing in Rust with Python bindings
Project description
vtext
This is a Python wrapper for the Rust vtext crate.
This package aims to provide a high-performance toolkit for ingesting textual data for machine learning applications.
Features
- Tokenization: Regexp tokenizer, Unicode segmentation + language-specific rules
- Stemming: Snowball (in Python, 15-20x faster than NLTK)
- Token counting: converting token counts to sparse matrices for use in machine learning libraries. Similar to CountVectorizer and HashingVectorizer in scikit-learn, but with less broad functionality.
- Levenshtein edit distance; Sørensen-Dice, Jaro, Jaro-Winkler string similarities
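To illustrate what "converting token counts to sparse matrices" means, here is a minimal pure-Python sketch that tokenizes documents with a regular expression and stores the counts as CSR-style arrays. It does not use vtext: the names `tokenize` and `count_tokens_csr`, the token pattern, and the lowercasing step are assumptions made for this example only.

```python
import re
from collections import Counter

# Assumed pattern for this sketch: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split lowercased text into tokens with a regular expression."""
    return TOKEN_PATTERN.findall(text.lower())


def count_tokens_csr(documents):
    """Count tokens per document and return CSR-style arrays.

    Returns (vocabulary, indptr, indices, data), where row i of the
    document-term matrix is described by indices[indptr[i]:indptr[i + 1]]
    (column ids) and data[indptr[i]:indptr[i + 1]] (token counts).
    """
    vocabulary = {}  # token -> column id, assigned in first-seen order
    indptr, indices, data = [0], [], []
    for doc in documents:
        counts = Counter(tokenize(doc))
        # Map each token to its column, then sort the row by column id.
        row = sorted(
            (vocabulary.setdefault(token, len(vocabulary)), n)
            for token, n in counts.items()
        )
        for column, n in row:
            indices.append(column)
            data.append(n)
        indptr.append(len(indices))
    return vocabulary, indptr, indices, data


docs = ["the cat sat", "the cat saw the dog"]
vocabulary, indptr, indices, data = count_tokens_csr(docs)
```

Only the non-zero counts are stored, which is why this layout suits large vocabularies. The three arrays can be passed to a sparse matrix constructor in a numerical library if one is available.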
Installation
vtext requires Python 3.6+ and numpy 1.15+, and can be installed with:
pip install vtext
Documentation
Project documentation: vtext.io/doc/latest/index.html
License
vtext is released under the Apache License, Version 2.0.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
vtext-0.2.0.tar.gz (13.6 kB)
Built Distributions
Hashes for vtext-0.2.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 02c1dbefd2b6fd3522a96a9bd8f8e85ae4722ee088e2d952bbec830b0e88727c
MD5 | 3352ada8cb130e7ecc8585c9ac240d3c
BLAKE2b-256 | 544f3394633ea154167c5e4421e890bd8e8012083b51505282f55435d98d866a
Hashes for vtext-0.2.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | eb37f4b72cf754ff20323f11519da9d3864c7f0a428be847da2ed55a3665cc44
MD5 | 8fd0083af90bb388ac8d37733a1ea7ad
BLAKE2b-256 | e2fc63268c97659af2e20150a2a73e5e59a4afea825f749315c5d7c7cf23fef2
Hashes for vtext-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c7a7826a44b81e9d1779bc800a5ee133647c7943c52b434ae8415df18933f77f
MD5 | 9d1c5b2e3041e30c8adbb5febf973cb4
BLAKE2b-256 | e63c606490a9e266bc5a4a77562b1f33952aec02dd56d8de1ccdaa71fb3f62b1
Hashes for vtext-0.2.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 1fa5b18b31637ce012fdfddb1c6a207989320bcf246d5f131695c9fc92b2a32c
MD5 | 9a2fbc26ef1fa2aaf57a70b7451964e5
BLAKE2b-256 | 296f39e44b32ecf6c8dbc10351285579c6c21ade363992a305304e3c3fb6dbd2
Hashes for vtext-0.2.0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | be3d75845af06d92af9fb65dde8c37ea890f8ed00bb236884fe3b8e2c4b08e32
MD5 | d666fe366376d9e79fafda16ba732ebc
BLAKE2b-256 | 1776c816375ba0d52ccca437cfd822d6e14a5a5b5a10414f5bbaadf07808d41d
Hashes for vtext-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 397823cda22d04de43312e27cbe74be4318c20ec2ef38df9c66493580be06ec8
MD5 | cc077a7243666557a85b48e848057e98
BLAKE2b-256 | 221a72764efdd9ed3d32295d0dddf7ed8500b32ab4ced7c39b7f8bd4936e1fb6
Hashes for vtext-0.2.0-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | c54d2b4496afa0d8687345b2b89bed7e9aa03b223f0dc58ac923348d0f879a2c
MD5 | 2631ccf87c8ff2fb81ae392cc3f828fd
BLAKE2b-256 | be8a97cdac102035d0117a1a4c6e9e1c03551c9f42feb58768b4caf39f3b4e17
Hashes for vtext-0.2.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7ddde99b3153f7bf439b06f69f221c59945b1ce103368ce3a4957e7112ab904b
MD5 | 456e32b2a9f8fcbe4f1b6ad95fe1ce3c
BLAKE2b-256 | 92dcd832b8d986bb1d9be6b7586ba0ce03b0444bbb7fa995092db04159d053c0
Hashes for vtext-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1791aad4a999525a7c19ae25ffdeb491839e81e958995567151a3bf8012c32ff
MD5 | aef53e618c8de5561fdcaa3618adb88e
BLAKE2b-256 | a2fbdecc22acef0fed05c8680650487af7e500bddf4091c1d0cfe767eb4dd7eb
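
The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical and chosen for this example; they are not part of vtext or pip.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, assuming the wheel was saved in the current directory, `matches_published_digest("vtext-0.2.0-cp38-cp38-manylinux1_x86_64.whl", "eb37f4b72cf754ff20323f11519da9d3864c7f0a428be847da2ed55a3665cc44")` should return `True` for an intact download.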