A package for matching UK addresses using a pretrained Splink model

High-performance UK address matcher (geocoder)

Fast, simple address matching (geocoding) in Python.

For full documentation, see our main documentation site.

Why use this library

  • Simple. Set up in seconds, runs on a laptop. No separate infrastructure or services needed.
  • Fast. Match 100,000 addresses in ~30 seconds.
  • Proven accuracy. We use public, labelled datasets to measure and document accuracy.
  • Support for Ordnance Survey data. We provide an automated build pipeline for users wishing to match to Ordnance Survey data. Matching to any other canonical dataset is also supported.

The end-to-end process of matching 100,000 addresses to Ordnance Survey data, including all software downloads and data processing, takes:

  • Less than a minute if you are matching to a small area such as a local council region.
  • If matching to the whole UK, there's a one-time preprocessing step that takes around 10 minutes. Subsequent matching of 100k records takes less than a minute.

Installation

pip install uk_address_matcher

What does it do?

Given the following data:

  • a "messy" dataset of addresses that you want to match
  • a "canonical" dataset of known addresses, often an Ordnance Survey dataset such as AddressBase or NGD.

this package will find the best matching canonical address for each messy address.

Example:

Your address files need, at minimum, two columns: unique_id and address_concat.

postcode is optional but recommended. If it is not provided, the package attempts to parse it out of address_concat.
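To make the input requirements concrete, here is a minimal standard-library sketch that checks a row for the two required columns and pulls a postcode out of address_concat when no postcode column is supplied. The helper names (missing_columns, split_postcode) and the simplified regular expression are illustrative assumptions; this is not the package's own parsing logic, which is not shown here.

```python
import re

# Column names taken from the requirements above.
REQUIRED_COLUMNS = {"unique_id", "address_concat"}

# Simplified UK postcode shape (an assumption, not the library's pattern):
# outward code such as "AB1", optional whitespace, inward code such as "2BC".
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def missing_columns(row):
    """Return the required column names absent from a row (a dict)."""
    return sorted(REQUIRED_COLUMNS - set(row))


def split_postcode(address_concat):
    """Return (address_without_postcode, postcode or None).

    The last postcode-shaped token is used, since postcodes usually
    appear at the end of a UK address string.
    """
    matches = list(POSTCODE_RE.finditer(address_concat))
    if not matches:
        return address_concat.strip(" ,"), None
    m = matches[-1]
    postcode = f"{m.group(1).upper()} {m.group(2).upper()}"
    remainder = (address_concat[: m.start()] + address_concat[m.end():]).strip(" ,")
    return remainder, postcode
```

Supplying an explicit postcode column avoids relying on this kind of pattern matching, which is one reason it is recommended.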

Given the following data:

Messy data

unique_id | address_concat                              | postcode
m_1       | Flat A Example Court, 10 Demo Road, Townton | AB1 2BC
...more rows

Canonical data

unique_id | address_concat                       | postcode
c_1       | Flat A, 10 Demo Road, Townton        | AB1 2BC
c_2       | Flat B, 10 Demo Road, Townton        | AB1 2BC
c_3       | Basement Flat, 10 Demo Road, Townton | AB1 2BC
...more rows
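If the example CSV files are not to hand, the rows shown above could be written out with the standard library. This is a minimal sketch: the helper write_example_csvs and the choice of target directory are illustrative and not part of uk_address_matcher; only the column names, rows, and file names come from this README.

```python
import csv
from pathlib import Path

COLUMNS = ["unique_id", "address_concat", "postcode"]

# Rows copied from the example tables above.
MESSY_ROWS = [
    ("m_1", "Flat A Example Court, 10 Demo Road, Townton", "AB1 2BC"),
]
CANONICAL_ROWS = [
    ("c_1", "Flat A, 10 Demo Road, Townton", "AB1 2BC"),
    ("c_2", "Flat B, 10 Demo Road, Townton", "AB1 2BC"),
    ("c_3", "Basement Flat, 10 Demo Road, Townton", "AB1 2BC"),
]


def write_example_csvs(target_dir):
    """Write both example files into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, rows in (
        ("messy_example.csv", MESSY_ROWS),
        ("canonical_example.csv", CANONICAL_ROWS),
    ):
        path = target / name
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(COLUMNS)  # header row
            writer.writerows(rows)
        paths[name] = path
    return paths
```

Because address_concat contains commas, csv.writer quotes those fields automatically, so each address stays in a single column when the file is read back.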

You can match it as follows:

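import duckdb
from uk_address_matcher import AddressMatcher

# Open an in-memory DuckDB connection and load both address tables
con = duckdb.connect()
messy = con.read_csv("example_data/messy_example.csv")
canonical = con.read_csv("example_data/canonical_example.csv")

# Configure the matcher with the canonical and messy datasets
matcher = AddressMatcher(
    canonical_addresses=canonical,
    addresses_to_match=messy,
    con=con,
)

# Run the match and print the best canonical match for each messy address
result = matcher.match()
result.matches().show(max_width=10000)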

Example output:

unique_id | resolved_canonical_id | original_address_concat | original_address_concat_canonical | match_reason | match_weight | distinguishability
m_1 | c_2 | Flat A Example Court, 10 Demo Road, Townton | Flat A, 10 Demo Road, Townton | splink: probabilistic match | 13.5885 | 11.5033
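To help read the match_weight column: in Splink, a match weight w is the base-2 logarithm of the match odds, so it converts to a probability as p = 2^w / (1 + 2^w). The sketch below applies that formula. It assumes the match_weight reported by this package follows Splink's convention, which this README does not confirm; the function name is illustrative.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a Splink-style match weight (log2 odds) to a probability.

    Assumes the weight is log2(p / (1 - p)). The two branches are
    algebraically identical and only exist to avoid float overflow
    for very large positive or negative weights.
    """
    if match_weight >= 0:
        return 1.0 / (1.0 + 2.0 ** (-match_weight))
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)
```

Under this assumption, a weight of 0 corresponds to even odds, and each additional unit of weight doubles the odds in favour of a match.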

Development

The scripts and tests will run better if you create .vscode/settings.json with the following:

{
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.extraPaths": [
        "${workspaceFolder}"
    ],
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "-v",
        "--capture=tee-sys"
    ]
}

This version

1.1.0
