A package for matching UK addresses using a pretrained Splink model
Project description
High-performance UK address matcher (geocoder)
Fast, simple address matching (geocoding) in Python.
For full documentation, see our main documentation site.
Why use this library
- Simple. Set up in seconds, runs on a laptop. No separate infrastructure or services needed.
- Fast. Match 100,000 addresses in ~30 seconds.
- Proven accuracy. We use public, labelled datasets to measure and document accuracy.
- Support for Ordnance Survey data. We provide an automated build pipeline for users wishing to match to Ordnance Survey data. Matching to any other canonical dataset is also supported.
The end-to-end process of matching 100,000 addresses to Ordnance Survey data, including all software downloads and data processing, takes:
- Less than a minute if you are matching to a small area such as a local council region.
- If matching to the whole UK, there's a one-time preprocessing step that takes around 10 minutes. Subsequent matching of 100k records takes less than a minute.
Installation
pip install uk_address_matcher
What does it do?
Given the following data:
- a "messy" dataset of addresses that you want to match
- a "canonical" dataset of known addresses, often an Ordnance Survey dataset such as AddressBase or NGD.
this package will find the best matching canonical address for each messy address.
Example:
Your address files need, at minimum, two columns: unique_id and address_concat.
postcode is optional but recommended. If it is not provided, an attempt is made to parse it out of address_concat.
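To illustrate what parsing a postcode out of address_concat can look like, here is a minimal sketch using only the standard library. It is not the package's own parser: the regular expression is a simplified approximation of the UK postcode format, and `POSTCODE_RE` and `split_postcode` are hypothetical names introduced for this example.

```python
import re
from typing import Optional, Tuple

# Simplified UK postcode pattern (an approximation, not the package's parser):
# outward code such as "AB1" or "SW1A", optional whitespace, then the inward
# code (one digit followed by two letters).
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b",
    re.IGNORECASE,
)


def split_postcode(address_concat: str) -> Tuple[str, Optional[str]]:
    """Return (address without the postcode, normalised postcode or None)."""
    matches = list(POSTCODE_RE.finditer(address_concat))
    if not matches:
        return address_concat.strip(), None
    last = matches[-1]  # assume the postcode is the last match in the string
    postcode = f"{last.group(1).upper()} {last.group(2).upper()}"
    remainder = address_concat[: last.start()] + address_concat[last.end():]
    return remainder.strip(" ,"), postcode


address, postcode = split_postcode(
    "Flat A Example Court, 10 Demo Road, Townton AB1 2BC"
)
print(address)   # Flat A Example Court, 10 Demo Road, Townton
print(postcode)  # AB1 2BC
```

Supplying a separate postcode column avoids relying on this kind of heuristic, which can misfire on unusual or malformed postcodes.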
Given the following data:
Messy data
| unique_id | address_concat | postcode |
|---|---|---|
| m_1 | Flat A Example Court, 10 Demo Road, Townton | AB1 2BC |
| ...more rows |
Canonical data
| unique_id | address_concat | postcode |
|---|---|---|
| c_1 | Flat A, 10 Demo Road, Townton | AB1 2BC |
| c_2 | Flat B, 10 Demo Road, Townton | AB1 2BC |
| c_3 | Basement Flat, 10 Demo Road, Townton | AB1 2BC |
| ...more rows |
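If you want to try this without your own data, the rows shown above can be written to CSV files with the standard library. This is a sketch only: `write_example_csvs` is a hypothetical helper, it writes just the rows listed in the tables, and the file names are taken from the matching example in this README.

```python
import csv
from pathlib import Path

FIELDNAMES = ["unique_id", "address_concat", "postcode"]

# Only the rows shown in the tables above; real datasets will be much larger.
MESSY_ROWS = [
    {"unique_id": "m_1",
     "address_concat": "Flat A Example Court, 10 Demo Road, Townton",
     "postcode": "AB1 2BC"},
]
CANONICAL_ROWS = [
    {"unique_id": "c_1", "address_concat": "Flat A, 10 Demo Road, Townton",
     "postcode": "AB1 2BC"},
    {"unique_id": "c_2", "address_concat": "Flat B, 10 Demo Road, Townton",
     "postcode": "AB1 2BC"},
    {"unique_id": "c_3", "address_concat": "Basement Flat, 10 Demo Road, Townton",
     "postcode": "AB1 2BC"},
]


def write_example_csvs(directory="example_data"):
    """Write the messy and canonical example rows as CSV files; return their paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, rows in (("messy_example.csv", MESSY_ROWS),
                       ("canonical_example.csv", CANONICAL_ROWS)):
        path = out_dir / name
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerows(rows)
        paths[name] = path
    return paths
```

Calling `write_example_csvs()` creates `example_data/messy_example.csv` and `example_data/canonical_example.csv` relative to the current working directory.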
You can match it as follows:
import duckdb
from uk_address_matcher import AddressMatcher

con = duckdb.connect()

# Load the messy and canonical address tables
messy = con.read_csv("example_data/messy_example.csv")
canonical = con.read_csv("example_data/canonical_example.csv")

matcher = AddressMatcher(
    canonical_addresses=canonical,
    addresses_to_match=messy,
    con=con,
)

# Run the match and display the best match found for each messy address
result = matcher.match()
result.matches().show(max_width=10000)
Example output:
| unique_id | resolved_canonical_id | original_address_concat | original_address_concat_canonical | match_reason | match_weight | distinguishability |
|---|---|---|---|---|---|---|
| m_1 | c_1 | Flat A Example Court, 10 Demo Road, Townton | Flat A, 10 Demo Road, Townton | splink: probabilistic match | 13.5885 | 11.5033 |
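One way to read the match_weight and distinguishability columns is sketched below. It rests on two assumptions that this README does not confirm: that match_weight follows Splink's convention of log2 odds, and that distinguishability is the gap in match weight between the best and the next-best candidate. The function names and thresholds are hypothetical and would need tuning against labelled data.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a log2-odds match weight into a probability (Splink convention)."""
    return 1.0 / (1.0 + 2.0 ** -match_weight)


def is_confident(row: dict,
                 min_weight: float = 5.0,
                 min_distinguishability: float = 5.0) -> bool:
    """Keep a match only if it scores well and clearly beats the runner-up.

    The thresholds are illustrative defaults, not recommendations.
    """
    return (row["match_weight"] >= min_weight
            and row["distinguishability"] >= min_distinguishability)


example = {"match_weight": 13.5885, "distinguishability": 11.5033}
print(round(match_weight_to_probability(example["match_weight"]), 4))  # 0.9999
print(is_confident(example))  # True
```

Under these assumptions, a weight of 0 corresponds to even odds, and each additional unit of weight doubles the odds that the pair is a true match.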
Development
The scripts and tests will run better if you create .vscode/settings.json with the following:
{
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.extraPaths": [
        "${workspaceFolder}"
    ],
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "-v",
        "--capture=tee-sys"
    ]
}