Match tabular address datasets together using address standardisation, rapidfuzz, and exact text matching via a Gradio GUI.
Installation
Requires Python 3.10 or newer.
Installing from PyPI
Install the latest release from PyPI:
pip install fuzzy_address_matcher
This installation supports both Python-script usage and the GUI console command.
Use in a Python script
Import the matcher function:
from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match
Example input files
- If you cloned the repo, the example CSVs are at example_data/.
- If you installed from PyPI, the same example CSVs are bundled inside the installed package at fuzzy_address_matcher/example_data/ (and the GUI's Load London example button will find them automatically).
1) Match using external CSV files
Pass file paths for your search dataset and reference dataset.
from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match
final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    in_file="example_data/search_addresses_london.csv",
    in_ref="example_data/reference_addresses_london.csv",
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)
print(final_summary)
print(output_files)
print(summary_table_md)
2) Match using DataFrames already loaded in Python
If your data is already in memory, pass DataFrames directly with search_df and ref_df.
from fuzzy_address_matcher.matcher_funcs import fuzzy_address_match
# Assume search_df and ref_df already exist in your Python session.
final_summary, output_files, estimated_seconds, summary_table_md = fuzzy_address_match(
    search_df=search_df,
    ref_df=ref_df,
    in_colnames=["address_line_1", "address_line_2", "postcode"],
    in_refcol=["addr1", "addr2", "addr3", "addr4", "postcode"],
    in_joincol=None,
    output_folder="outputs",
)
print(final_summary)
print(output_files)
print(summary_table_md)
Run the GUI app
If you installed from PyPI, you can run the Gradio GUI via the console script:
fuzzy-address-matcher
Or, to run from source, clone the repo and run it from the project root:
git clone https://github.com/seanpedrick-case/fuzzy_address_matcher.git
cd fuzzy_address_matcher
pip install -e .
python app.py
Further details on use can be found in the User guide (GitHub Pages).
Introduction
Match single or multiple addresses to a reference / canonical dataset. The tool can accept CSV, XLSX (with one sheet), and Parquet files. After you have chosen a reference file and an address match file, and specified the address columns, click 'Match addresses' to run the tool.
Fuzzy matching should work on any address columns. If you have a postcode column, place it at the end of the list of address columns. If your addresses do not include a postcode, untick the 'Use postcode blocker' checkbox so that the app uses street-only blocking. The final files will appear in the relevant output boxes, from which you can download them.
Note that this app is based on UK address data. Matching is unlikely to be 100% accurate, so outputs should be checked by a human before further use.
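To illustrate the difference between postcode blocking and street-only blocking, here is a minimal sketch in plain Python. It is not the package's implementation: the regexes are simplified assumptions, and the helper names (blocking_key, build_blocks, candidates) are hypothetical. The idea is only that each search address is compared against reference addresses sharing the same key, rather than against the whole reference dataset.

```python
import re
from collections import defaultdict

# Simplified UK postcode pattern (an assumption for this sketch only).
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE
)
# Words that directly follow a house number, e.g. "10 brixton hill".
STREET_RE = re.compile(r"\d+[a-z]?\s+([a-z][a-z ]*)")


def blocking_key(address: str, use_postcode: bool = True) -> str:
    """Return the key that decides which block an address belongs to."""
    match = POSTCODE_RE.search(address)
    if use_postcode and match:
        # Normalise case and spacing so 'sw2 1rw' and 'SW21RW' share a block.
        return (match.group(1) + match.group(2)).upper()
    # Street-only key: drop any postcode, then keep the words that
    # follow the last house number in the address.
    text = POSTCODE_RE.sub(" ", address).lower()
    streets = STREET_RE.findall(text)
    return streets[-1].strip() if streets else text.strip()


def build_blocks(ref_addresses, use_postcode=True):
    """Group reference addresses by their blocking key."""
    blocks = defaultdict(list)
    for ref in ref_addresses:
        blocks[blocking_key(ref, use_postcode)].append(ref)
    return blocks


def candidates(search_address, blocks, use_postcode=True):
    """Return only the reference addresses in the same block."""
    return blocks.get(blocking_key(search_address, use_postcode), [])


reference = [
    "Flat 1, 10 Brixton Hill, London, SW2 1RW",
    "12 Brixton Hill, London, SW2 1RW",
    "5 Acre Lane, London, SW2 5SG",
]

# Postcode blocking: the search address carries a postcode.
postcode_blocks = build_blocks(reference, use_postcode=True)
print(candidates("flat 1 10 brixton hill sw2 1rw", postcode_blocks))

# Street-only blocking: the search address has no postcode.
street_blocks = build_blocks(reference, use_postcode=False)
print(candidates("10 Brixton Hill", street_blocks, use_postcode=False))
```

In this sketch both lookups narrow the three reference addresses down to the two on Brixton Hill, which are then the only ones that would need a fuzzy comparison.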
Method
Address columns are concatenated to form a single address string. Important details are extracted by regex (e.g. flat numbers, house numbers, postcodes). Addresses may be 'standardised' in a number of ways; e.g. variations of words used for 'ground floor' such as 'grd' or 'grnd' are replaced with 'ground floor' to give a more consistent address wording. This has been found to increase match rates. The two datasets are then compared with fuzzy matching. The closest fuzzy matches are selected, and a post hoc test then compares flat/property numbers to ensure a 'full match'.
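The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the package's code. It uses difflib.SequenceMatcher from the standard library as a stand-in for rapidfuzz so that it runs without extra dependencies. The helper names, the single standardisation rule, and the 0.8 threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# One example standardisation rule: 'grd' / 'grnd' (optionally followed by
# 'floor' or 'flr') becomes 'ground floor'. Real rule sets are much larger.
GROUND_FLOOR_RE = re.compile(r"\b(?:grd|grnd)(?:\s+(?:floor|flr))?\b")
# Standalone numbers such as '10' or '10a'; postcode parts like 'sw2' or
# '1rw' do not match because of the word boundaries.
NUMBER_RE = re.compile(r"\b\d+[a-z]?\b")


def concat_address(parts):
    """Join the non-empty address columns into one string."""
    return ", ".join(p.strip() for p in parts if p and p.strip())


def standardise(address: str) -> str:
    """Lower-case, apply the wording rule, strip punctuation and extra spaces."""
    text = GROUND_FLOOR_RE.sub("ground floor", address.lower())
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def property_numbers(address: str):
    """Extract flat / house numbers in the order they appear."""
    return NUMBER_RE.findall(standardise(address))


def best_match(search: str, references, threshold: float = 0.8):
    """Pick the closest reference, then apply the post hoc number check."""
    s = standardise(search)
    scored = [
        (SequenceMatcher(None, s, standardise(ref)).ratio(), ref)
        for ref in references
    ]
    score, ref = max(scored)
    # A 'full match' needs a high fuzzy score AND identical numbers.
    full = score >= threshold and property_numbers(search) == property_numbers(ref)
    return ref, score, full


references = [
    "Ground Floor Flat, 10 Brixton Hill, London, SW2 1RW",
    "Flat 2, 10 Brixton Hill, London, SW2 1RW",
    "12 Brixton Hill, London, SW2 1RW",
]

search = concat_address(["Grd flat", "10 Brixton Hill", "", "SW2 1RW"])
matched_ref, score, full_match = best_match(search, references)
print(search, "->", matched_ref, round(score, 2), full_match)

# A close fuzzy match whose flat number differs fails the post hoc check.
_, _, wrong_flat = best_match("Flat 3, 10 Brixton Hill, SW2 1RW", references[1:2])
print(wrong_flat)
```

The post hoc check matters because fuzzy scores alone barely distinguish 'Flat 2' from 'Flat 3' at the same building: the strings differ by one character, but they are different properties.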
Important note
I suggest using this app in conjunction with the excellent uk_address_matcher package. I am finding that this package is great for ~95% of matches with UK addresses. However, this repo (fuzzy_address_matcher) uses slightly different matching methods (address standardisation, fuzzy matching), and so, as of April 2026, it can still pick up some additional matches.
My suggested workflow would be:
- Match your datasets with the uk_address_matcher package, then
- Run the output file through this app for further address matches that can be picked up by the standardisation / fuzzy matching