This project seeks to build a Python software package that provides scalable implementations of string similarity joins over two tables, for commonly used similarity measures such as Jaccard, Dice, cosine, overlap, overlap coefficient, and edit distance. The package is free, open-source, and BSD-licensed.
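To make the idea of a string similarity join concrete, here is a minimal sketch in plain Python. It is not the package's implementation or API: the function names (`whitespace_tokens`, `jaccard`, `naive_jaccard_join`), the whitespace tokenization, and the sample tables are all assumptions chosen for illustration. It compares every pair of rows, which is exactly the quadratic cost a scalable implementation tries to avoid.

```python
# Illustrative only: a naive Jaccard similarity join over two small "tables".
# All names here are hypothetical and are not part of py_stringsimjoin.

def whitespace_tokens(s):
    """Lowercase a string and split it on whitespace into a set of tokens."""
    return set(s.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return float(len(a & b)) / len(a | b)

def naive_jaccard_join(ltable, rtable, threshold):
    """Return (left_id, right_id, score) for every pair scoring >= threshold.

    ltable and rtable are lists of (id, string) pairs.
    """
    # Tokenize the right table once instead of once per left row.
    r_tokens = [(rid, whitespace_tokens(rs)) for rid, rs in rtable]
    output = []
    for lid, ls in ltable:
        lt = whitespace_tokens(ls)
        for rid, rt in r_tokens:
            score = jaccard(lt, rt)
            if score >= threshold:
                output.append((lid, rid, score))
    return output

ltable = [(1, "Dave Smith"), (2, "Joe Wilson")]
rtable = [(10, "David Smith"), (11, "Joe Wilson Jr"), (12, "dave smith")]
matches = naive_jaccard_join(ltable, rtable, 0.6)
```

With a threshold of 0.6, row 1 pairs with row 12 (identical token sets after lowercasing) and row 2 pairs with row 11 (two shared tokens out of three), while "Dave Smith" and "David Smith" share only one token out of three and are dropped.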
py_stringsimjoin has been tested on Python 2.7, Python 3.3, Python 3.4 and Python 3.5.
The required dependencies to build the package are pandas 0.16.0 or higher, py_stringmatching 0.2.1 or higher, joblib, pyprind and six.
py_stringsimjoin has been tested on Linux, OS X and Windows.
Download the file for your platform.
|File Name||Version||File Type||Upload Date|
|py_stringsimjoin-0.1.0.tar.gz (352.6 kB)||–||Source||Jul 15, 2016|
|py_stringsimjoin-0.1.0.zip (409.2 kB)||–||Source||Jul 15, 2016|