Match tabular address datasets together using address standardisation, rapidfuzz, and exact text matching via a Gradio GUI.

Installation

Requires Python 3.10 or newer.

Installing from PyPI

Install the latest release from PyPI:

pip install fuzzy_address_matcher

This installation supports both Python-script usage and the GUI console command.

Use in a Python script

Import the matcher function:

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

Example input files

  • If you cloned the repo, the example CSVs are at example_data/.
  • If you installed from PyPI, the same example CSVs are bundled inside the installed package at fuzzy_address_matcher/example_data/ (and the GUI’s Load London example button will find them automatically).
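The two locations above can also be resolved in code. The following is a minimal sketch and not part of the package: find_example_dir is a hypothetical helper, and it assumes the bundled CSVs sit in an example_data/ folder inside the installed package, as described above. It uses only the standard library (importlib.resources).

```python
from importlib.resources import files
from pathlib import Path


def find_example_dir(package="fuzzy_address_matcher", subdir="example_data", local_root="."):
    """Return the example-data directory, or None if it cannot be found.

    Hypothetical helper: prefers a local clone (local_root/subdir) and falls
    back to the copy bundled inside the installed package.
    """
    local = Path(local_root) / subdir
    if local.is_dir():
        return local
    try:
        bundled = files(package) / subdir
    except ModuleNotFoundError:
        # The package is not installed in this environment.
        return None
    return bundled if bundled.is_dir() else None


# Usage (assumes the package is installed or the repo is cloned):
# example_dir = find_example_dir()
# in_file = str(example_dir / "search_addresses_london.csv")
```

The returned directory can then be joined with the CSV file names used in the examples below.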

1) Match using external CSV files

Pass file paths for your search dataset and reference dataset.

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    in_file="example_data/search_addresses_london.csv",
    in_ref="example_data/reference_addresses_london.csv",
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)

print(final_summary)
print(output_files)
print(summary_table_md)

2) Match using DataFrames already loaded in Python

If your data is already in memory, pass DataFrames directly with search_df and ref_df.

from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match

# Assume search_df and ref_df already exist in your Python session.
final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    search_df=search_df,
    ref_df=ref_df,
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)

print(final_summary)
print(output_files)
print(summary_table_md)

Run the GUI app

If you installed from PyPI, you can run the Gradio GUI via the console script:

fuzzy-address-matcher

Or, to run from source, clone the repo and run it from the project root:

git clone https://github.com/seanpedrick-case/fuzzy_address_matcher.git
cd fuzzy_address_matcher
pip install -e .
python app.py

Further details on use can be found in the User guide (GitHub Pages).

Introduction

Match single or multiple addresses to a reference / canonical dataset. The tool accepts CSV, XLSX (with one sheet), and Parquet files. After you have chosen a reference file and an address match file, and specified their address columns, click 'Match addresses' to run the tool.

Fuzzy matching should work on any address columns. If you have a postcode column, place it at the end of the list of address columns. If a postcode is not present in the address, the app will use street-only blocking; make sure you untick the 'Use postcode blocker' checkbox to enable this. The final files will appear in the relevant output boxes, from which you can download them.

Note that this app is based on UK address data. Matching is unlikely to be 100% accurate, so outputs should be checked by a human before further use.
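To illustrate what blocking means here, the sketch below groups reference addresses by a key so that each search address is only compared against candidates sharing that key. It is a generic illustration, not the package's implementation: the postcode regex, the street-key rule (words after the last number), and the function names blocking_key, build_blocks and candidates_for are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Loose UK postcode pattern, for illustration only (an assumption, not the package's regex).
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def blocking_key(address, use_postcode=True):
    """Return the key used to group addresses before fuzzy comparison."""
    match = POSTCODE_RE.search(address)
    if use_postcode and match:
        return (match.group(1) + match.group(2)).upper()
    # Street-only key: drop the postcode, then keep the words after the last number.
    without_postcode = POSTCODE_RE.sub(" ", address).lower()
    tail = re.split(r"\d+[a-z]?\b", without_postcode)[-1]
    return " ".join(re.findall(r"[a-z]+", tail))


def build_blocks(reference_addresses, use_postcode=True):
    """Group reference addresses by their blocking key."""
    blocks = defaultdict(list)
    for address in reference_addresses:
        blocks[blocking_key(address, use_postcode)].append(address)
    return blocks


def candidates_for(search_address, blocks, use_postcode=True):
    """Return only the reference addresses that share the search address's key."""
    return blocks.get(blocking_key(search_address, use_postcode), [])


reference = [
    "10 High Street, London, E1 6AN",
    "12 High Street, London, E1 6AN",
    "10 Low Road, London, N1 9GU",
]
postcode_blocks = build_blocks(reference, use_postcode=True)
street_blocks = build_blocks(reference, use_postcode=False)
```

In this sketch, a search address without a postcode finds no candidates in postcode_blocks, which is why blocks keyed on the street are needed for such data.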

Method

Address columns are concatenated to form a single address string. Important details (e.g. flat numbers, house numbers, postcodes) are extracted by regex. Addresses may be 'standardised' in a number of ways; for example, variations of 'ground floor' such as 'grd' or 'grnd' are replaced with 'ground floor' to give more consistent address wording. This has been found to increase match rates. The two datasets are then compared with fuzzy matching. The closest fuzzy matches are selected, and a post hoc test then compares flat/property numbers to ensure a 'full match'.
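The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's code: the abbreviation map, the number regex, the 0.8 threshold and the function names are hypothetical, and the standard library's difflib.SequenceMatcher stands in for rapidfuzz so that the example has no third-party dependencies.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation rules; the package's actual rules are not shown here.
ABBREVIATIONS = {r"\bgrd\b": "ground floor", r"\bgrnd\b": "ground floor"}


def standardise(address_parts):
    """Concatenate address columns into one lower-case, standardised string."""
    text = " ".join(part for part in address_parts if part).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    for pattern, replacement in ABBREVIATIONS.items():
        text = re.sub(pattern, replacement, text)
    return re.sub(r"\s+", " ", text).strip()


def property_numbers(address):
    """Extract flat / house numbers such as '2' or '10a' from a standardised address."""
    return re.findall(r"\b\d+[a-z]?\b", address)


def best_match(search_parts, reference_rows, threshold=0.8):
    """Return (best reference row, similarity score, full-match flag)."""
    search_std = standardise(search_parts)
    scored = [
        (SequenceMatcher(None, search_std, standardise(row)).ratio(), row)
        for row in reference_rows
    ]
    score, row = max(scored)
    # Post hoc check: the property numbers must agree for a 'full match'.
    numbers_agree = property_numbers(search_std) == property_numbers(standardise(row))
    return row, score, score >= threshold and numbers_agree


reference_rows = [
    ["Ground Floor Flat", "10 High Street", "London", "E1 6AN"],
    ["Ground Floor Flat", "12 High Street", "London", "E1 6AN"],
]
row, score, full_match = best_match(["Grd flat", "10 High Street", "E1 6AN"], reference_rows)
```

Here the search address is standardised to "ground floor flat 10 high street e1 6an" before comparison, and a close fuzzy score alone is not enough: a candidate whose flat or house number differs is not flagged as a full match.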

Important note

I suggest using this app in conjunction with the excellent uk_address_matcher package. I am finding that uk_address_matcher works well for ~95% of matches with UK addresses. However, this repo (fuzzy_address_matcher) uses slightly different matching methods (address standardisation, fuzzy matching), and so, as of April 2026, it can still pick up some new matches.

My suggested workflow would be:

  1. Match your datasets with the uk_address_matcher package, then
  2. Run the output file through this app to pick up further address matches via standardisation / fuzzy matching.

Source Distribution

fuzzy_address_matcher-2.1.1.tar.gz (115.3 kB view details)

Uploaded Source

Built Distribution

fuzzy_address_matcher-2.1.1-py3-none-any.whl (117.2 kB view details)

Uploaded Python 3

File details

Details for the file fuzzy_address_matcher-2.1.1.tar.gz.

File metadata

  • Download URL: fuzzy_address_matcher-2.1.1.tar.gz
  • Upload date:
  • Size: 115.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for fuzzy_address_matcher-2.1.1.tar.gz
Algorithm     Hash digest
SHA256        3ce08543fa35aa8b5e77e225b605cca4f4c442605feecff704dda12e1f1b75de
MD5           10190add6067ab3b9ad7368731f9a335
BLAKE2b-256   191ea35f776bf8cd5a131074e847799031e59e46429a9d581ae8c3ce8ce99568

File details

Details for the file fuzzy_address_matcher-2.1.1-py3-none-any.whl.

File hashes

Hashes for fuzzy_address_matcher-2.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        5917ba7e150ffb83486fc97f1212eabc0c2e829dac785d2598b49a137b21bb70
MD5           df0be7d2d870e0326d13c1d5b6a5608f
BLAKE2b-256   d2c7054d59f15dc058c8f6c701344c4833fe04827c3547ea8685213d4233de05
