

fuzzyjoin

Join two tables by a fuzzy comparison of text columns.

Features

Description

The goal of this package is to provide a quick and convenient way to join two tables on a pair of text columns that often contain different spellings or variants of the same name. fuzzyjoin covers the simple, common case: joining on a single column from each table in a small to medium-sized dataset.
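To make the idea concrete, here is a minimal sketch of a single-column fuzzy join written with nothing but the standard library. It is not fuzzyjoin's own API. The function names `similarity` and `fuzzy_inner_join`, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the sample company names are all assumptions made for illustration. Left rows with several candidates above the threshold are kept apart from the unambiguous matches so they can be reviewed by hand.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_inner_join(left, right, left_key, right_key, threshold=0.8):
    """Fuzzy inner join of two lists of dicts on one text column each.

    Hypothetical helper, not part of fuzzyjoin. Returns (matches, multiples):
      - matches: (left_row, right_row, score) for left rows with exactly one candidate
      - multiples: (left_row, [(right_row, score), ...]) for left rows with
        several candidates, best first, kept apart for manual review
    Left rows with no candidate at or above `threshold` are dropped.
    """
    matches, multiples = [], []
    for lrow in left:
        # Compare this left key against every right key (O(n * m) comparisons).
        candidates = []
        for rrow in right:
            score = similarity(lrow[left_key], rrow[right_key])
            if score >= threshold:
                candidates.append((rrow, score))
        if len(candidates) == 1:
            rrow, score = candidates[0]
            matches.append((lrow, rrow, score))
        elif candidates:
            # Ambiguous: sort candidates by score, highest first.
            candidates.sort(key=lambda pair: pair[1], reverse=True)
            multiples.append((lrow, candidates))
    return matches, multiples


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    companies = [
        {"id": 1, "name": "Acme Corp."},
        {"id": 2, "name": "Globex Inc"},
        {"id": 3, "name": "Initech"},
    ]
    registry = [
        {"company": "ACME Corp", "city": "Springfield"},
        {"company": "Globex, Inc.", "city": "Cypress Creek"},
        {"company": "Globex Inc.", "city": "London"},
        {"company": "Umbrella", "city": "Raccoon City"},
    ]
    matches, multiples = fuzzy_inner_join(companies, registry, "name", "company")
    for lrow, rrow, score in matches:
        print(f"{lrow['name']!r} -> {rrow['company']!r} ({score:.2f})")
    for lrow, candidates in multiples:
        print(f"{lrow['name']!r} has {len(candidates)} candidates")
```

Comparing every pair of keys is quadratic, which is one reason this style of join suits small to medium-sized tables; a faster edit-distance library could replace `difflib` without changing the join logic.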

For more sophisticated and comprehensive tools that can link records on multiple fields, see these packages:

  • [dedupe](https://github.com/dedupeio/dedupe)

  • [recordlinkage](https://recordlinkage.readthedocs.io/en/latest/about.html)

TODO

  • [ ] Test transformation and exclude functions.

  • [ ] Implement left join and full join.

  • [ ] Optionally use python-Levenshtein for speed.

  • [ ] Check that the ID is actually unique.

  • [ ] Add documentation.

  • [ ] Option to rename headers and disambiguate duplicate header names.

History

0.2.1 (2019-04-10)

  • Additional docs and tests.

0.2.0 (2019-04-09)

  • Write multiple matches to a separate file.

  • Added types and docstrings.

0.1.2 (2019-04-09)

  • Duplicate release of 0.1.1.

0.1.1 (2019-04-09)

  • First release on PyPI.

Download files

Source Distribution

fuzzyjoin-0.2.1.tar.gz (7.7 kB)

Built Distribution

fuzzyjoin-0.2.1-py2.py3-none-any.whl (8.2 kB), for Python 2 and Python 3
