A package for matching UK addresses using a pretrained Splink model
Matching UK addresses using Splink
High-performance address matching using a pre-trained Splink model.
Assuming you have two DuckDB dataframes in this format:
| unique_id | address_concat | postcode |
|---|---|---|
| 1 | 123 Fake Street, Faketown | FA1 2KE |
| 2 | 456 Other Road, Otherville | NO1 3WY |
| ... | ... | ... |
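If your source data holds each address as a single string with the postcode at the end, it has to be split into the columns above first. The following is a minimal sketch of that step using only the standard library. The `split_postcode` helper and its simplified postcode pattern are assumptions made for illustration; they are not part of `uk_address_matcher`, and the pattern does not cover every valid UK postcode.

```python
import re

# Simplified UK postcode pattern anchored at the end of the string.
# This is an illustrative assumption, not a complete postcode validator.
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\s*$",
    re.IGNORECASE,
)


def split_postcode(unique_id, full_address):
    """Return a row with unique_id, address_concat and postcode keys."""
    text = full_address.strip()
    match = POSTCODE_RE.search(text)
    if match is None:
        # No recognisable trailing postcode: keep the whole string as the address.
        return {"unique_id": unique_id, "address_concat": text, "postcode": None}
    # Everything before the postcode, minus trailing commas and spaces.
    address = text[: match.start()].rstrip(" ,")
    # Normalise to upper case with a single space between the two halves.
    postcode = f"{match.group(1).upper()} {match.group(2).upper()}"
    return {"unique_id": unique_id, "address_concat": address, "postcode": postcode}


row = split_postcode(1, "123 Fake Street, Faketown, FA1 2KE")
print(row)
```

Rows shaped like this can then be loaded into DuckDB by whichever route suits your data.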
Match them with:
from uk_address_matcher.cleaning_pipelines import (
    clean_data_using_precomputed_rel_tok_freq,
)
from uk_address_matcher.splink_model import _performance_predict

# `con` is the DuckDB connection that holds df_1 and df_2,
# the two input tables in the format shown above.

# Clean both datasets before matching
df_1_c = clean_data_using_precomputed_rel_tok_freq(df_1, con=con)
df_2_c = clean_data_using_precomputed_rel_tok_freq(df_2, con=con)

# Score candidate matches with the pre-trained model
linker, predictions = _performance_predict(
    df_addresses_to_match=df_1_c,
    df_addresses_to_search_within=df_2_c,
    con=con,
    match_weight_threshold=-10,
    output_all_cols=True,
    include_full_postcode_block=True,
)
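The `match_weight_threshold=-10` value is easier to interpret as a probability. In Splink, a match weight is the base-2 logarithm of the match odds, so a weight w corresponds to a probability of 2^w / (1 + 2^w). The sketch below shows that conversion using only the standard library; `match_weight_to_probability` is a name chosen for illustration and is not a function in `uk_address_matcher`.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a log2-odds match weight into a match probability."""
    if match_weight >= 0:
        # Equivalent form that avoids overflow for large positive weights.
        return 1.0 / (1.0 + 2.0 ** (-match_weight))
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)


for weight in (-10, 0, 10):
    print(weight, match_weight_to_probability(weight))
```

A weight of -10 works out to a probability of 1 in 1,025, or about 0.1%, so assuming the threshold keeps pairs scoring above it, this is a deliberately permissive cut-off that retains weak candidates for later inspection.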
Initial tests suggest you can match roughly 1,000 addresses per second against a list of 30 million addresses on a laptop.
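Throughput at this scale generally depends on blocking: Splink scores only pairs of records that share a blocking key, rather than every possible pair. The sketch below illustrates the idea with the exact postcode as the key, which is what the `include_full_postcode_block` argument name suggests. That reading is an assumption, the package's actual blocking rules are not reproduced here, and `candidate_pairs` is a hypothetical helper.

```python
from collections import defaultdict


def candidate_pairs(to_match, search_within):
    """Yield (left_id, right_id) for records that share the same postcode."""
    # Index the larger list once so each lookup is a dictionary access.
    index = defaultdict(list)
    for row in search_within:
        index[row["postcode"]].append(row["unique_id"])
    for row in to_match:
        for right_id in index.get(row["postcode"], []):
            yield row["unique_id"], right_id


left = [{"unique_id": 1, "postcode": "FA1 2KE"}]
right = [
    {"unique_id": 10, "postcode": "FA1 2KE"},
    {"unique_id": 11, "postcode": "NO1 3WY"},
]
print(list(candidate_pairs(left, right)))
```

Only the pair sharing a postcode is produced, so the number of comparisons grows with the size of each postcode group rather than with the product of the two list sizes.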
Refer to the example, which has detailed comments, for how to match your data.
See an example of comparing two addresses to get a sense of what the model does and how it scores a pair.
Run an interactive example in your browser:
Match 5,000 FHRS records to 21,952 Companies House records in under 10 seconds.