Python library for performing string similarity joins.

py_stringsimjoin

This project seeks to build a Python software package that provides scalable implementations of string similarity joins over two tables for commonly used similarity measures such as Jaccard, Dice, cosine, overlap, overlap coefficient, and edit distance. The package is free, open-source, and BSD-licensed.
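To illustrate what a string similarity join computes, here is a minimal sketch of a Jaccard join over two small tables. It is not the package's API or its algorithm: the function names (`jaccard`, `naive_jaccard_join`), the list-of-dicts table representation, and the whitespace tokenization are all assumptions made for illustration. It compares every pair of rows, which is exactly the quadratic cost a scalable implementation aims to avoid.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_jaccard_join(left, right, key, attr, threshold):
    """Return (left_id, right_id, score) for every row pair whose
    join attributes have Jaccard similarity >= threshold.

    Hypothetical helper: compares all pairs, so it runs in O(n * m).
    """
    matches = []
    for l_row in left:
        # Tokenize on whitespace after lowercasing (an illustrative choice).
        l_tokens = set(l_row[attr].lower().split())
        for r_row in right:
            r_tokens = set(r_row[attr].lower().split())
            score = jaccard(l_tokens, r_tokens)
            if score >= threshold:
                matches.append((l_row[key], r_row[key], score))
    return matches


# Two toy tables, each a list of rows with a key and a string attribute.
table_a = [
    {"id": 1, "name": "data science lab"},
    {"id": 2, "name": "string matching"},
]
table_b = [
    {"id": 10, "name": "science lab"},
    {"id": 11, "name": "approximate string matching"},
]

matches = naive_jaccard_join(table_a, table_b, "id", "name", threshold=0.6)
for l_id, r_id, score in matches:
    print(l_id, r_id, round(score, 3))
```

Each matching pair here shares two of three distinct tokens, giving a score of 2/3. Swapping `jaccard` for another set-based measure (Dice, cosine, overlap coefficient) changes only the scoring function, not the join loop.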

Dependencies

py_stringsimjoin has been tested on Python 2.7, Python 3.3, Python 3.4 and Python 3.5.

Building the package requires pandas 0.16.0 or higher, py_stringmatching 0.2.1 or higher, joblib, pyprind, and six.
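A quick way to see whether these dependencies are present in the current environment is sketched below. The helper name `missing_dependencies` is hypothetical, the check assumes Python 3.4 or later (it relies on `importlib.util.find_spec`), and it only tests that each module can be found, not that the installed version meets the minimums listed above.

```python
import importlib.util

# Import names of the required dependencies listed above.
REQUIRED_MODULES = ["pandas", "py_stringmatching", "joblib", "pyprind", "six"]


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path.

    Hypothetical helper: presence check only, no version comparison.
    """
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


missing = missing_dependencies()
if missing:
    print("Missing dependencies: " + ", ".join(missing))
else:
    print("All required dependencies are importable.")
```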

Platforms

py_stringsimjoin has been tested on Linux, OS X and Windows.

Source Distributions

py_stringsimjoin-0.1.0.zip (409.2 kB)

  • SHA256: 264d115925426cdca71fc76a9f350ee29244cc282882d83443bac0e0115da505
  • MD5: d26b9656343e5a228bf852c00b69e540
  • BLAKE2b-256: 73c4af84a7b924ff12ddf4ee49d45e0383e13a6e03b552c2a369844414dd1970

py_stringsimjoin-0.1.0.tar.gz (352.6 kB)

  • SHA256: 7b2f66304d40dcea8ff8ef7681282820aad49918990fb74183e94337109ca1e7
  • MD5: a4898b1555f83211bb07407088ce8db9
  • BLAKE2b-256: e9d23191735fd23e0b92da17e7a4a4de9e2c9d220eec2ae0fdc55168f46a9c3a
