Match tabular address datasets together using address standardisation, rapidfuzz, and exact text matching via a Gradio GUI.

Installation

Requires Python 3.10 or newer.

Installing from PyPI

Install the latest release from PyPI:

pip install fuzzy_address_matcher

This installation supports both Python-script usage and the GUI console command.

Use in a Python script

Import the matcher function:

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

Example input files

  • If you cloned the repo, the example CSVs are at example_data/.
  • If you installed from PyPI, the same example CSVs are bundled inside the installed package at fuzzy_address_matcher/example_data/ (and the GUI’s Load London example button will find them automatically).
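If you want the path of the bundled example folder in a script, a minimal sketch using the standard library's importlib.resources is shown below. It assumes the folder is shipped as example_data/ inside the installed package, as described above; the helper name example_data_dir is hypothetical and not part of the package.

```python
from importlib.resources import files


def example_data_dir(package: str = "fuzzy_address_matcher"):
    """Return the example_data folder inside an installed package.

    Hypothetical helper: assumes the package ships an example_data/ folder.
    Raises ModuleNotFoundError if the package is not installed.
    """
    return files(package) / "example_data"


# Usage (requires fuzzy_address_matcher to be installed):
#   search_csv = example_data_dir() / "search_addresses_london.csv"
#   ref_csv = example_data_dir() / "reference_addresses_london.csv"
```

The resulting paths can be converted with str() and passed as in_file and in_ref.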

1) Match using external CSV files

Pass file paths for your search dataset and reference dataset.

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    in_file="example_data/search_addresses_london.csv",
    in_ref="example_data/reference_addresses_london.csv",
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)

print(final_summary)
print(output_files)
print(summary_table_md)

2) Match using DataFrames already loaded in Python

If your data is already in memory, pass DataFrames directly with search_df and ref_df.

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

# Assume search_df and ref_df already exist in your Python session.
final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    search_df=search_df,
    ref_df=ref_df,
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)

print(final_summary)
print(output_files)
print(summary_table_md)

Run the GUI app

If you installed from PyPI, you can run the Gradio GUI via the console script:

fuzzy-address-matcher

Or, to run from source, clone the repo and run it from the project root:

git clone https://github.com/seanpedrick-case/fuzzy_address_matcher.git
cd fuzzy_address_matcher
pip install -e .
python app.py

Further details on use can be found in the User guide (GitHub Pages).

Introduction

Match single or multiple addresses to a reference / canonical dataset. The tool can accept CSV, XLSX (with one sheet), and Parquet files. After you have chosen a reference file, an address match file, and specified its address columns, click 'Match addresses' to run the tool.
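If you would rather load these file types yourself and pass DataFrames to the matcher, the sketch below picks a pandas reader from the file extension. The helper names reader_name and load_table are hypothetical and not part of this package; the sketch assumes pandas is installed (read_excel additionally needs openpyxl, and read_parquet needs pyarrow or fastparquet).

```python
from pathlib import Path

# Map each supported extension to the pandas reader assumed to handle it.
READERS = {".csv": "read_csv", ".xlsx": "read_excel", ".parquet": "read_parquet"}


def reader_name(path: str) -> str:
    """Return the name of the pandas reader for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or path}")
    return READERS[suffix]


def load_table(path: str):
    """Load a CSV, XLSX, or Parquet file into a DataFrame."""
    import pandas as pd  # imported lazily so reader_name works without pandas

    return getattr(pd, reader_name(path))(path)


# Usage:
#   search_df = load_table("example_data/search_addresses_london.csv")
#   ref_df = load_table("example_data/reference_addresses_london.csv")
```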

Fuzzy matching should work on any address columns. If you have a postcode column, place it at the end of the list of address columns. If your addresses do not contain a postcode, the app will use street-only blocking; untick the 'Use postcode blocker' checkbox to enable this. The final files will appear in the relevant output boxes, from which you can download them.
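To illustrate what blocking means here, the sketch below groups addresses by a key so that fuzzy comparison only needs to run within each group. It is a simplified illustration, not this package's implementation: the postcode regex is deliberately loose, the street key (the first two words after the last number) is a crude assumption, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Simplified UK postcode pattern, for illustration only.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def street_key(address: str) -> str:
    """Crude street key: the first two words after the last number token."""
    text = POSTCODE_RE.sub(" ", address).lower()
    tokens = re.findall(r"[a-z]+|\d+[a-z]?", text)
    number_positions = [i for i, tok in enumerate(tokens) if tok[0].isdigit()]
    start = number_positions[-1] + 1 if number_positions else 0
    return " ".join(tokens[start:start + 2])


def block_key(address: str, use_postcode: bool = True) -> str:
    """Return the normalised postcode if wanted and present, else a street key."""
    match = POSTCODE_RE.search(address)
    if use_postcode and match:
        return (match.group(1) + match.group(2)).upper()
    return street_key(address)


def build_blocks(addresses, use_postcode: bool = True) -> dict:
    """Group addresses by blocking key; only same-key addresses get compared."""
    blocks = defaultdict(list)
    for address in addresses:
        blocks[block_key(address, use_postcode)].append(address)
    return dict(blocks)
```

With the postcode blocker on, two addresses sharing a postcode land in the same block; with it off, they are grouped by street text instead, which is why a missing postcode calls for street-only blocking.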

Note that this app is based on UK address data. Matching is unlikely to be 100% accurate, so outputs should be checked by a human before further use.

Method

Address columns are concatenated together to form a single string address. Important details are extracted by regex (e.g. flat, house numbers, postcodes). Addresses may be 'standardised' in a number of ways; e.g. variations of words used for 'ground floor' such as 'grd' or 'grnd' are replaced with 'ground floor' to give a more consistent address wording. This has been found to increase match rates. Then the two datasets are compared with fuzzy matching. The closest fuzzy matches are selected, and then a post hoc test compares flat/property numbers to ensure a 'full match'.
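The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's code: it uses difflib from the standard library as a stand-in for rapidfuzz, the single 'ground floor' replacement rule is a hypothetical example of standardisation, and the function names are invented for this sketch.

```python
import re
from difflib import SequenceMatcher


def standardise(address: str) -> str:
    """Lower-case, strip punctuation, and expand 'grd'/'grnd' to 'ground floor'."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    # Hypothetical rule: the package's real replacement table is not shown here.
    text = re.sub(r"\b(?:grd|grnd)\b(?:\s+(?:floor|flr))?", "ground floor", text)
    return " ".join(text.split())


def numbers(address: str) -> set:
    """Extract flat/house number tokens such as '10' or '12a'."""
    return set(re.findall(r"\b\d+[a-z]?\b", address.lower()))


def best_match(search: str, reference: list):
    """Return (closest reference address, similarity score, full_match flag)."""
    cleaned = standardise(search)
    scored = [
        (SequenceMatcher(None, cleaned, standardise(ref)).ratio(), ref)
        for ref in reference
    ]
    score, candidate = max(scored)
    # Post hoc check: the numbers in both addresses must agree for a 'full match'.
    full_match = numbers(cleaned) == numbers(standardise(candidate))
    return candidate, score, full_match
```

The post hoc number check matters because two addresses on the same street can score very highly on text similarity while referring to different properties.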

Important note

I suggest using this app in conjunction with the excellent uk_address_matcher package. I am finding that uk_address_matcher works well for ~95% of matches with UK addresses. However, this repo (fuzzy_address_matcher) uses slightly different matching methods (address standardisation, fuzzy matching), and so, as of April 2026, it can still pick up some additional matches.

My suggested workflow would be:

  1. Match your datasets with the uk_address_matcher package, then
  2. Run the output file through this app to pick up further address matches via the standardisation / fuzzy matching.

Further details on use can be found in the User guide.

Download files

Source Distribution

fuzzy_address_matcher-2.2.0.tar.gz (126.1 kB)

Uploaded Source

Built Distribution

fuzzy_address_matcher-2.2.0-py3-none-any.whl (126.7 kB)

Uploaded Python 3

File details

Details for the file fuzzy_address_matcher-2.2.0.tar.gz.

File metadata

  • Download URL: fuzzy_address_matcher-2.2.0.tar.gz
  • Size: 126.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for fuzzy_address_matcher-2.2.0.tar.gz

Algorithm    Hash digest
SHA256       56550071e588d5880b1d5064187e5d8722e647fcb649825a02132876526324ef
MD5          98394e0a95249cc6a0483b3f8570ce1b
BLAKE2b-256  35942653d8b062af96481222e4cd8eaa3a532c6ddab85e2ce29c9c1fde681b23
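
A downloaded file can be checked against the published SHA256 digest with the standard library's hashlib. The sketch below is one way to do it; the helper names sha256_of and matches_published are hypothetical, and the file path in the usage comment assumes the archive has been downloaded to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected.strip().lower()


# Usage (assumes the source distribution is in the current directory):
#   matches_published(
#       "fuzzy_address_matcher-2.2.0.tar.gz",
#       "56550071e588d5880b1d5064187e5d8722e647fcb649825a02132876526324ef",
#   )
```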

File details

Details for the file fuzzy_address_matcher-2.2.0-py3-none-any.whl.

File hashes

Hashes for fuzzy_address_matcher-2.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       2e0ce007858c433342c8a39e8f0948806d49a50fec212cc045b74aadbae73786
MD5          7f6f66e4073d596a39ae7ee1aa066a16
BLAKE2b-256  686b74bff28a5ac414bac4e2d694be309545543432ce96c9e77e6eefc9e0fc8d
