fuzzyjoin

Join two tables by a fuzzy comparison of text columns.

Features

Installation

  • Pure python: pip install fuzzyjoin

  • Optimized: pip install fuzzyjoin[fast]

Description

The goal of this package is to provide a quick and convenient way to join two tables on a pair of text columns, which often contain variations of the name of the same entity. fuzzyjoin covers the simple and common case of joining on a single column from each table, for datasets with thousands of records.
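To make the idea concrete, here is a minimal sketch of a fuzzy inner join in pure Python. It is not fuzzyjoin's API: the names `levenshtein`, `similarity`, `normalize`, and `fuzzy_inner_join`, the list-of-dicts table layout, and the 0.8 threshold are all assumptions made for illustration. It compares every left row with every right row, which is one reason an approach like this suits thousands of records rather than millions.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0.0 (nothing shared) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def normalize(text: str) -> str:
    """Hypothetical transformation: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def fuzzy_inner_join(
    left: List[Row],
    right: List[Row],
    left_key: str,
    right_key: str,
    threshold: float = 0.8,  # assumed cutoff, not a fuzzyjoin default
) -> List[Tuple[Row, Row, float]]:
    """Return every (left row, right row, score) scoring at or above threshold."""
    matches = []
    for lrow in left:
        lval = normalize(lrow[left_key])
        for rrow in right:
            score = similarity(lval, normalize(rrow[right_key]))
            if score >= threshold:
                matches.append((lrow, rrow, score))
    return matches


if __name__ == "__main__":
    customers = [{"id": "1", "name": "Acme Corp."}, {"id": "2", "name": "Globex"}]
    vendors = [{"id": "a", "name": "ACME  Corp"}, {"id": "b", "name": "Initech"}]
    for lrow, rrow, score in fuzzy_inner_join(customers, vendors, "name", "name"):
        print(lrow["id"], rrow["id"], f"{score:.2f}")
```

A left row can match more than one right row in this sketch; a real workflow would need to decide whether to keep the best match or set such cases aside for review.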

For a more sophisticated and comprehensive treatment of the topic that will allow you to join records using multiple fields, see the packages below:

TODO

  • Test transformation and exclude functions.

  • Implement left join and full join.

  • Check that the ID is actually unique.

  • Add documentation.

  • Option to rename headers and disambiguate duplicate header names.

History

0.3.4 (2019-04-11)

  • Fix function defaults.

  • Minor optimizations.

  • Additional CLI parameters.

0.3.3 (2019-04-10)

  • Cleanup checks.

0.3.2 (2019-04-10)

  • Include basic installation instructions.

0.3.1 (2019-04-10)

  • Minor README updates.

0.3.0 (2019-04-10)

  • Use editdistance if available, otherwise fall back to pylev.

  • Report progress by default.

  • Number comparison options.

  • Renamed get_multiples to filter_multiples.

0.2.1 (2019-04-10)

  • Additional docs and tests.

0.2.0 (2019-04-09)

  • Write multiple matches to a separate file.

  • Added types and docstrings.

0.1.2 (2019-04-09)

  • Duplicate release of 0.1.1.

0.1.1 (2019-04-09)

  • First release on PyPI.
