Python wrapper for ASSIGN API functions
assign-uprn
About
docs
https://joeldn.srht.site/assign-uprn
code
https://git.sr.ht/~joeldn/assign-uprn
Usage
Installation
Install from pypi
$ pip install assign_uprn
Background
In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.
The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to patient records.
Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.
ASSIGN compares addresses in free-text form with Ordnance Survey’s “AddressBase Premium” UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the lifespan.
The address-matching algorithms use a human-mediated best-fit method to match a candidate address to one address from the set of all available ‘standard’ addresses.
The algorithms use human semantic pattern recognition, applying rankings of matching judgements following rules that manipulate the text, supported by a few machine-based algorithms such as the Levenshtein distance algorithm.
The rankings, which can be considered as a set of numbers 1 to n, could be described as a plausibility measure, as opposed to a probability measure or deterministic measure.
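As an illustration of the Levenshtein distance mentioned above, here is a minimal textbook sketch. It is not ASSIGN's own implementation, and the function name `levenshtein` is chosen for this example only. It counts the single-character insertions, deletions and substitutions needed to turn one string into another, which is how an abbreviation such as "St" can be scored against "Street".

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("Downing St", "Downing Street"))  # 4
print(levenshtein("Abbey Rd", "Abbey Road"))  # 2
```

A small distance relative to the string length suggests a likely abbreviation or typo, though ASSIGN combines such measures with its rule-based rankings rather than relying on them alone.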
How to use
API Access
Licence
You’ll need to be licensed to use AddressBase Premium, either commercially, or as non-commercial work covered by the Public Services Geospatial Mapping Agreement (PSGA). You can check whether your work is included with Ordnance Survey:
https://www.ordnancesurvey.co.uk/customers/public-sector/psga-member-finder
Access and Authentication
Endeavour Health manages access and provides the usernames and passwords used to authenticate API calls.
Python packages used by this module
The following package dependencies need to be available in the Python environment used by this package:
# pip install requests, used to interact with the API
import requests
# pip install python-dotenv, note that other dot env packages exist
from dotenv import load_dotenv
Working with python-dotenv
You will need to create a .env file in the project root containing
your ASSIGN_ENDPOINT, ASSIGN_USER and ASSIGN_PASS authentication
values. This file is explicitly ignored by .gitignore to keep your
authentication credentials separate from the codebase.
The .env file holds the authentication credentials provided
by Endeavour Health, with contents resembling the following
structure:
ASSIGN_ENDPOINT=endpoint
ASSIGN_USER=username
ASSIGN_PASS=password
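Once the .env values are in the process environment, they can be read and checked before any API call is made. The sketch below is one possible way to do that; the names `AssignCredentials` and `read_credentials` are hypothetical and are not part of assign_uprn.

```python
import os
from typing import Mapping, NamedTuple

# The three keys expected in .env, as listed above
REQUIRED_KEYS = ("ASSIGN_ENDPOINT", "ASSIGN_USER", "ASSIGN_PASS")


class AssignCredentials(NamedTuple):
    """Hypothetical container for the three .env values."""
    endpoint: str
    user: str
    password: str


def read_credentials(environ: Mapping[str, str] = os.environ) -> AssignCredentials:
    """Read the ASSIGN settings, failing early if any value is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise KeyError(f"Missing values in .env: {', '.join(missing)}")
    return AssignCredentials(*(environ[key] for key in REQUIRED_KEYS))
```

In a script, calling `load_dotenv()` from python-dotenv before `read_credentials()` copies the contents of .env into `os.environ`, so the defaults work without passing anything in.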
Single address check
A single address can be sent for matching within a single HTTP request.
A search for 10+Downing+St,Westminster,London,SW1A2AA would receive
the following response:
{
  "Address_format": "good",
  "Postcode_quality": "good",
  "Matched": true,
  "BestMatch": {
    "UPRN": "100023336956",
    "Qualifier": "Property",
    "LogicalStatus": "1",
    "Classification": "RD04",
    "ClassTerm": "Terraced",
    "Algorithm": "10-match1",
    "ABPAddress": {
      "Number": "10",
      "Street": "Downing Street",
      "Town": "City Of Westminster",
      "Postcode": "SW1A 2AA"
    },
    "Match_pattern": {
      "Postcode": "equivalent",
      "Street": "equivalent",
      "Number": "equivalent",
      "Building": "equivalent",
      "Flat": "equivalent"
    }
  }
}
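The two ends of a single address check can be sketched as follows: encoding the address into the `+`-separated form shown in the search above, and pulling the UPRN out of a response shaped like the example. The helper names `encode_address` and `best_match_uprn` are hypothetical. The request itself is left out because the API path, query parameter name and authentication scheme are not given in this document.

```python
from typing import Optional
from urllib.parse import quote_plus


def encode_address(address: str) -> str:
    """Encode spaces as '+' while leaving the commas between address lines."""
    return quote_plus(address, safe=",")


def best_match_uprn(response: dict) -> Optional[str]:
    """Return the UPRN from a response shaped like the example, or None if unmatched."""
    if not response.get("Matched"):
        return None
    return response.get("BestMatch", {}).get("UPRN")


# Trimmed copy of the example response for 10 Downing Street
example = {
    "Matched": True,
    "BestMatch": {"UPRN": "100023336956", "Algorithm": "10-match1"},
}

print(encode_address("10 Downing St,Westminster,London,SW1A2AA"))
# 10+Downing+St,Westminster,London,SW1A2AA
print(best_match_uprn(example))
# 100023336956
```

Checking `Matched` before reading `BestMatch` is a defensive assumption here: the document only shows a successful match, so the shape of an unmatched response is not known.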
Uploading an encrypted salt
ASSIGN can de-identify UPRNs into Residential Anonymised Linkage Fields (RALFs). These are pseudonymised locations, encrypted using an encrypted salt so that records are pseudonymised in a replicable way. Different datasets can therefore be joined without identifying individuals, keeping analysis compatible with data protection.
To obtain RALFs, research governance for your work can provide you with an encrypted salt from the maintainers of the openpseudonymiser software.
The salt is encrypted using a private key known only to The University of Nottingham (the maintainers of openpseudonymiser).
From then on, addresses uploaded within a file will not only be UPRN
matched but will also have a RALF provided alongside (see the
Example download file content in this document).
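To see why a shared salt makes pseudonymisation replicable, consider the generic salted-hash sketch below. This is not the openpseudonymiser algorithm and not how ASSIGN derives RALFs; the function `pseudonymise` and the salt values are invented for illustration. It only shows the general property that the same salt and UPRN always give the same identifier, while a different salt gives an unrelated one.

```python
import hashlib


def pseudonymise(uprn: str, salt: str) -> str:
    """Illustrative salted SHA-256 digest of a UPRN (an assumption, not the RALF method)."""
    digest = hashlib.sha256((salt + uprn).encode("utf-8")).hexdigest()
    return digest.upper()


# Two datasets processed with the same salt produce the same identifier...
dataset_a = pseudonymise("100023336956", salt="project-salt")
dataset_b = pseudonymise("100023336956", salt="project-salt")
# ...while a different salt produces one that cannot be joined to them
other_project = pseudonymise("100023336956", salt="another-salt")

print(dataset_a == dataset_b)
print(dataset_a == other_project)
```

Keeping the salt encrypted, as described above, means researchers can link datasets without ever being able to recompute or reverse the identifiers themselves.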
Multiple address checking
Multiple addresses can be uploaded within a text file, which is processed immediately after upload and can be downloaded shortly afterwards.
Upload
The maximum number of address candidates that you can upload in a single
file is 100,000.
The address file to be uploaded must:
- have a .txt extension
- include no headers (the first line must not contain any header information)
- contain two columns separated by a single tab character
- use a unique numeric row id as the first column
- use the address as the second column (with commas between each address line)
Example upload file content:
1⭾10 Downing St,Westminster,London,SW1A2AA
3⭾Bridge Street,London,SW1A 2LW
4⭾221b Baker St,Marylebone,London,NW1 6XE
5⭾3 Abbey Rd,St John's Wood,London,NW8 9AY
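The upload rules above can be enforced when the file is written. The following is a minimal sketch with a hypothetical helper, `write_upload_file`, that is not part of assign_uprn. It checks the extension, the 100,000-row limit and id uniqueness, then writes headerless, tab-separated lines.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

MAX_ROWS = 100_000  # upload limit stated above


def write_upload_file(rows: Iterable[Tuple[int, str]], path: Path) -> int:
    """Write (row id, address) pairs as a headerless, tab-separated .txt file."""
    rows = list(rows)
    if path.suffix != ".txt":
        raise ValueError("upload file must have a .txt extension")
    if len(rows) > MAX_ROWS:
        raise ValueError(f"at most {MAX_ROWS} addresses per file")
    ids = [row_id for row_id, _ in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("row ids must be unique")
    for _, address in rows:
        if "\t" in address or "\n" in address:
            raise ValueError("addresses must not contain tabs or newlines")
    with path.open("w", encoding="utf-8", newline="") as handle:
        for row_id, address in rows:
            # int() rejects non-numeric ids before anything is written for that row
            handle.write(f"{int(row_id)}\t{address}\n")
    return len(rows)


# Write the first two example rows to a temporary file and read them back
with tempfile.TemporaryDirectory() as tmp:
    upload_path = Path(tmp) / "addresses.txt"
    written = write_upload_file(
        [
            (1, "10 Downing St,Westminster,London,SW1A2AA"),
            (3, "Bridge Street,London,SW1A 2LW"),
        ],
        upload_path,
    )
    content = upload_path.read_text(encoding="utf-8")

print(written)
```

Rejecting tabs and newlines inside an address is a cautious choice in this sketch, since either would break the two-column layout.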
Download
Uploads are processed straight away and can be downloaded by referencing the name of the upload file in the API call. The download includes data from AddressBase Premium.
Example download file content:
| id | uprn | address_fmt | algorithm | classification | match_building | match_flat | match_number | match_postcode | match_street | abp_number | abp_postcode | abp_street | abp_town | qualifier | adr_candiddate | abp_building | latitude | longitude | point | x | y | ralf | classification_term | abp_flat | logical_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100023336956 | | 10-match1 | RD04 | equivalent | equivalent | equivalent | equivalent | equivalent | 10 | SW1A 2AA | Downing Street | City Of Westminster | Property | 10 Downing St,Westminster,London,SW1A2AA | | 51.5035410 | -.1276700 | 51.5035410 | 530047.00 | 179951.00 | C30921C8404087803C3687301351FF41CCB4A5E8F3691070723293C8BD654CBB | Terraced | | 1 |
| 2 | 200002501505 | | 550-match5a | PP | candidate field dropped | equivalent | equivalent | equivalent | equivalent | | SW1A 2LW | Bridge Street | City Of Westminster | Property | Bridge Street,London,SW1A 2LW | Portcullis House | 51.5013476 | -.1243451 | 51.5013476 | 530284.00 | 179713.00 | 4D19E2EB66A2C12BD56B93D96CFBBE5B74525AEFC4C68329BE87B55C43EA4C36 | Property Shell | | 1 |
| 3 | 100023071949 | | 3200-match61A170 | CR08 | moved from Number | equivalent | moved to Building | equivalent | equivalent | | NW1 6XE | Baker Street | London | Property | 221b Baker St,Marylebone,London,NW1 6XE | 221B | 51.5237510 | -.1585550 | 51.5237510 | 527847.00 | 182144.00 | 7727B90C7C3A744AF6FD8D5A4FEB6767B1EACBBC721B85EED6AE86EDD2B0BA9C | Shop / Showroom | | 1 |
| 4 | 100023122909 | | 40-match1 | CR08 | moved from Street | moved from Number | moved from Flat | equivalent | moved from Building | 3 | NW8 9AY | Abbey Road | City Of Westminster | Property | 3 Abbey Rd,St Johns Wood,London,NW8 9AY | | 51.5321562 | -.1779541 | 51.5321562 | 526478.00 | 183045.00 | 6E479D3F8DA8A548C631622EA8640E1CE9030289C5ED4458B91A4F6C4F92C799 | Shop / Showroom | | 1 |
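A downloaded result can be reduced to a lookup from row id to UPRN. The sketch below assumes the download is tab-delimited text with a header row using the column names above; the file format is not stated in this document, so adjust the delimiter if it differs. The function name `uprns_by_id` and the unmatched row in the sample are hypothetical.

```python
import csv
import io
from typing import Dict


def uprns_by_id(download_text: str) -> Dict[str, str]:
    """Map each row id to its UPRN, skipping rows with no UPRN."""
    # Assumption: tab-delimited with a header row; not confirmed by the document
    reader = csv.DictReader(io.StringIO(download_text), delimiter="\t")
    return {row["id"]: row["uprn"] for row in reader if row.get("uprn")}


# Cut-down sample: row 1 is taken from the example above, row 2 is an invented unmatched row
sample = "id\tuprn\talgorithm\n1\t100023336956\t10-match1\n2\t\t\n"

print(uprns_by_id(sample))
# {'1': '100023336956'}
```

Because the ids come back as the values supplied in the upload file, the resulting dictionary can be joined back to the source records on that id.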
Developer Guide
This project uses nbdev, which uses notebooks to create the package
module, tests and documentation (using quarto), and makes git versioning
cleaner by removing notebook metadata prior to commits:
Working on assign_uprn in development mode
# install assign_uprn package as a developer
$ pip install -e '.[dev]'
# make changes to notebooks in the nbs/ directory
# ...
# clean the notebook metadata to make git history cleaner
$ nbdev_clean
# compile to have changes apply to assign_uprn module, and run tests
$ nbdev_prepare
# build the static website with quarto (https://quarto.org/)
$ nbdev_docs
# local preview of the website with quarto
$ nbdev_preview