A package for matching UK addresses using a pretrained Splink model

High-performance UK address matcher (geocoder)

Fast, simple address matching (geocoding) in Python.

For full documentation, see our main documentation site.

Why use this library

  • Simple. Set up in seconds, runs on a laptop. No separate infrastructure or services needed.
  • Fast. Match 100,000 addresses in ~30 seconds.
  • Proven accuracy. We use public, labelled datasets to measure and document accuracy.
  • Support for Ordnance Survey data. We provide an automated build pipeline for users who wish to match to Ordnance Survey data. Matching to any other canonical dataset is also supported.

The end-to-end process of matching 100,000 addresses to Ordnance Survey data, including all software downloads and data processing, takes:

  • Less than a minute if you are matching to a small area such as a local council region.
  • If matching to the whole UK, there's a one-time preprocessing step that takes around 10 minutes. Subsequent matching of 100k records takes less than a minute.

Installation

pip install uk_address_matcher

What does it do?

Given the following data:

  • a "messy" dataset of addresses that you want to match
  • a "canonical" dataset of known addresses, often an Ordnance Survey dataset such as AddressBase or NGD.

this package will find the best matching canonical address for each messy address.
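To make the shape of the task concrete, here is a deliberately naive baseline that scores each canonical address by token overlap (Jaccard similarity) with a messy address and picks the highest. This is not how uk_address_matcher works (it uses a pretrained Splink model); the helper names `tokens`, `jaccard` and `best_match` are hypothetical and exist only for this illustration.

```python
import re


def tokens(address: str) -> set[str]:
    """Lower-case an address and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", address.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two addresses have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(messy: str, canonical: dict[str, str]) -> tuple[str, float]:
    """Return the canonical id with the highest token overlap, plus its score."""
    messy_tokens = tokens(messy)
    scores = {cid: jaccard(messy_tokens, tokens(addr)) for cid, addr in canonical.items()}
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


# Hypothetical sample data, mirroring the example addresses in this README.
canonical = {
    "c_1": "Flat A, 10 Demo Road, Townton",
    "c_2": "Flat B, 10 Demo Road, Townton",
    "c_3": "Basement Flat, 10 Demo Road, Townton",
}
print(best_match("Flat A Example Court, 10 Demo Road, Townton", canonical))
```

Token overlap happens to pick the right flat here, but it treats every token as equally informative and has no notion of typos or abbreviations, which is where a trained probabilistic model is expected to do better.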

Example:

Your address files need, at minimum, two columns: unique_id and address_concat.

postcode is optional but recommended. If it is not provided, an attempt is made to parse postcodes out of address_concat.
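The package's own postcode parsing is not described here. As a rough illustration of what pulling a postcode out of address_concat involves, the sketch below uses a simplified regular expression; both the pattern and the `extract_postcode` helper are assumptions for illustration, not the library's implementation.

```python
import re
from typing import Optional

# Simplified UK postcode pattern (an assumption): outward code such as "AB1" or "SW1A",
# optional whitespace, then an inward code such as "2BC". It accepts some strings that
# are not real postcodes and ignores special cases such as GIR 0AA.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b", re.IGNORECASE)


def extract_postcode(address: str) -> Optional[str]:
    """Return the last postcode-like match in an address, normalised to 'OUTWARD INWARD'."""
    matches = POSTCODE_RE.findall(address)
    if not matches:
        return None
    # Postcodes usually come at the end of a UK address, so prefer the last match.
    outward, inward = matches[-1]
    return f"{outward.upper()} {inward.upper()}"


print(extract_postcode("Flat A Example Court, 10 Demo Road, Townton AB1 2BC"))
```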

Given the following data:

Messy data

unique_id | address_concat                              | postcode
m_1       | Flat A Example Court, 10 Demo Road, Townton | AB1 2BC
...more rows

Canonical data

unique_id | address_concat                       | postcode
c_1       | Flat A, 10 Demo Road, Townton        | AB1 2BC
c_2       | Flat B, 10 Demo Road, Townton        | AB1 2BC
c_3       | Basement Flat, 10 Demo Road, Townton | AB1 2BC
...more rows
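If the CSV files read by the matching snippet (example_data/messy_example.csv and example_data/canonical_example.csv) are not to hand, this sketch recreates the sample rows shown in the two tables above as CSV files. Only the listed rows are written (the "...more rows" are not reproduced), and `write_example_csvs` is a hypothetical helper, not part of the package.

```python
import csv
from pathlib import Path

COLUMNS = ["unique_id", "address_concat", "postcode"]

# Only the sample rows shown in this README.
MESSY_ROWS = [
    ["m_1", "Flat A Example Court, 10 Demo Road, Townton", "AB1 2BC"],
]
CANONICAL_ROWS = [
    ["c_1", "Flat A, 10 Demo Road, Townton", "AB1 2BC"],
    ["c_2", "Flat B, 10 Demo Road, Townton", "AB1 2BC"],
    ["c_3", "Basement Flat, 10 Demo Road, Townton", "AB1 2BC"],
]


def write_example_csvs(directory: str = "example_data") -> tuple[Path, Path]:
    """Write the sample tables as CSVs; csv.writer quotes the commas inside addresses."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = (out / "messy_example.csv", out / "canonical_example.csv")
    for path, rows in zip(paths, (MESSY_ROWS, CANONICAL_ROWS)):
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(COLUMNS)
            writer.writerows(rows)
    return paths
```

Calling `write_example_csvs()` from the directory you run the matching code in creates both files at the relative paths that code expects.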

You can match it as follows:

import duckdb
from uk_address_matcher import AddressMatcher

con = duckdb.connect()
messy = con.read_csv("example_data/messy_example.csv")
canonical = con.read_csv("example_data/canonical_example.csv")

matcher = AddressMatcher(
    canonical_addresses=canonical,
    addresses_to_match=messy,
    con=con,
)
result = matcher.match()
result.matches().show(max_width=10000)

Example output:

unique_id | resolved_canonical_id | original_address_concat | original_address_concat_canonical | match_reason | match_weight | distinguishability
m_1 | c_1 | Flat A Example Court, 10 Demo Road, Townton | Flat A, 10 Demo Road, Townton | splink: probabilistic match | 13.5885 | 11.5033
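Assuming match_weight follows Splink's convention (the log2 of the odds that two records match, which the "splink: probabilistic match" label suggests but this README does not state), a weight w converts to a match probability p = 2^w / (1 + 2^w). The sketch below applies that conversion and filters out low-confidence rows; the helper names, the dictionary row format and the 0.99 cut-off are all illustrative assumptions, not a recommended threshold or part of the package's API.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a log2-odds match weight into a probability between 0 and 1.

    Two branches keep very large positive or negative weights from overflowing.
    """
    if match_weight >= 0:
        return 1.0 / (1.0 + 2.0 ** -match_weight)
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)


def confident_matches(rows: list[dict], min_probability: float = 0.99) -> list[dict]:
    """Keep only rows whose match_weight implies at least min_probability."""
    return [r for r in rows if match_weight_to_probability(r["match_weight"]) >= min_probability]


# Hypothetical rows: m_1 uses the weight from the example output, m_2 is invented.
rows = [
    {"unique_id": "m_1", "match_weight": 13.5885},
    {"unique_id": "m_2", "match_weight": -1.0},
]
print([r["unique_id"] for r in confident_matches(rows)])
```

Under this assumption, the example weight of 13.5885 corresponds to a match probability of roughly 0.9999.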

Development

If you use VS Code, the scripts and tests run more smoothly if you create .vscode/settings.json with the following:

{
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.extraPaths": [
        "${workspaceFolder}"
    ],
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "-v",
        "--capture=tee-sys"
    ]
}
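If you already have a .vscode/settings.json, overwriting it with the block above would lose your other settings. This sketch merges the recommended keys into the existing file instead; `write_vscode_settings` is a hypothetical helper, and it assumes the existing file is plain JSON (VS Code also accepts comments in settings files, which `json.loads` cannot parse).

```python
import json
from pathlib import Path

# The settings shown above, as a Python dict.
RECOMMENDED = {
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.extraPaths": ["${workspaceFolder}"],
    "python.testing.pytestEnabled": True,
    "python.testing.unittestEnabled": False,
    "python.testing.pytestArgs": ["-v", "--capture=tee-sys"],
}


def write_vscode_settings(project_root: str = ".") -> Path:
    """Merge the recommended keys into .vscode/settings.json, keeping any other keys."""
    path = Path(project_root) / ".vscode" / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Start from the existing settings if the file is there, else from an empty dict.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.update(RECOMMENDED)
    path.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return path
```

From a terminal, the equivalent test run is `pytest -v --capture=tee-sys`.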

Download files

Download the file for your platform.

Source Distribution

uk_address_matcher-1.1.1.tar.gz (1.9 MB)

Uploaded Source

Built Distribution

uk_address_matcher-1.1.1-py3-none-any.whl (1.9 MB)

Uploaded Python 3

File details

Details for the file uk_address_matcher-1.1.1.tar.gz.

File metadata

  • Download URL: uk_address_matcher-1.1.1.tar.gz
  • Upload date:
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for uk_address_matcher-1.1.1.tar.gz
Algorithm Hash digest
SHA256 35c94ac47e027ee2d74e52eb1611d0d6bfd73aac3969c34c164650c386f743f7
MD5 8a20dbcf405f706dfd5d0f3145570464
BLAKE2b-256 545650ea925b77b6a171c0100c83d5ec98d87c7962c4f1709b14554b9bf92c0b
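To check a downloaded file against the SHA256 digests published here, you can hash it with Python's standard library. The helper names below are hypothetical; the digest constant is copied from the source-distribution table above.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """True if the file's SHA-256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Published SHA256 for uk_address_matcher-1.1.1.tar.gz.
SDIST_SHA256 = "35c94ac47e027ee2d74e52eb1611d0d6bfd73aac3969c34c164650c386f743f7"
```

After downloading the file, `verify_sha256("uk_address_matcher-1.1.1.tar.gz", SDIST_SHA256)` should return True. pip can also enforce digests at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file), in which case every dependency needs a pinned hash too.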

File details

Details for the file uk_address_matcher-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: uk_address_matcher-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for uk_address_matcher-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1a25bae62db0c8a250c8bc55323708b2179e5add6e2ff707b40bb6a4f7140d94
MD5 6711922e6c46b6816a031e2f6280bfd7
BLAKE2b-256 c433bdf8ea0fa592702534296eafc173a1691c022b40787f03461e2a612d74af
