Python client for ASSIGN API functions
assign-uprn
About
docs
https://joeldn.srht.site/assign-uprn
issue-tracker
https://todo.sr.ht/~joeldn/assign-uprn
source
https://git.sr.ht/~joeldn/assign-uprn
distribution
https://pypi.org/project/assign-uprn/
slides
https://joeldn.srht.site/uprns-a-very-british-love-affair
license
AGPLv3
Setup
Installation
> pip install assign-uprn
Background
About ASSIGN
ASSIGN is made by Endeavour Health.
The charity funds projects that are designed to enable new, or improved, healthcare services, and provides unrestricted open source technology for those projects.
One of these projects is the ASSIGN API, which is extremely useful for public servants standardising, validating, and de-identifying address data.
home
https://endeavourhealth.org
docs
https://wiki.endeavourhealth.org
source
https://github.com/endeavourhealth-discovery/ASSIGN
Algorithmic address-to-UPRN matching with ASSIGN
The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers) allocates a Unique Property Reference Number (UPRN) to records containing addresses.
Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.
ASSIGN compares addresses in freetext form with the Ordnance Survey’s “AddressBase Premium” UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matched 98.6% of patient addresses at 38,000 records per minute.
UPRN de-identification
ASSIGN can also de-identify UPRNs into Residential Anonymised Linkage Fields (RALFs).
RALFs are locations that are pseudonymised by encrypting them using a salt, which has itself been encrypted using the OpenPseudonymiser website.
Salt encryption is performed by a research governance practice, which ensures that matching and linking of UPRNs is conducted within the protection of the “Five Safes” framework:
https://ukdataservice.ac.uk/help/secure-lab/what-is-the-five-safes-framework/
Different datasets containing UPRNs, such as datasets from across public services, can be safely linked whilst protecting the identity of the data subjects, through the use of RALFs created using a shared salt.
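The linkage property described above can be illustrated with a minimal sketch. It is not ASSIGN's or OpenPseudonymiser's actual algorithm: it uses a keyed SHA-256 digest from the Python standard library purely to show why a shared salt makes records linkable and a different salt does not. The function name and salt values are hypothetical.

```python
import hashlib
import hmac


def pseudonymise_uprn(uprn: str, salt: bytes) -> str:
    """Return an uppercase hex digest of the UPRN keyed with the salt.

    Illustrative only: the real RALF derivation may differ.
    """
    digest = hmac.new(salt, uprn.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest.upper()


# Hypothetical salts, standing in for salts agreed under information governance
shared_salt = b"example-salt-agreed-for-one-project"
other_salt = b"a-different-project-salt"

# Two datasets that used the same salt produce the same token for a property
health_record = pseudonymise_uprn("100023336956", shared_salt)
housing_record = pseudonymise_uprn("100023336956", shared_salt)

# A dataset that used a different salt produces an unrelated token
unrelated = pseudonymise_uprn("100023336956", other_salt)

print(health_record == housing_record)
print(health_record == unrelated)
```

Because the token depends on both the UPRN and the salt, datasets can only be joined by parties working with the same salt.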
[!NOTE]
On re-identification
De-identified data protects information about individuals within a safe environment, such as the “Safe Settings” element of the “Five Safes” framework.
Still, should the information leave the protection of the “Five Safes”, it can then be re-identified through links with other datasets. This possibility is explored in the following excerpt by Cory Doctorow:
…it is surprisingly easy to “re-identify” individuals in anonymous data-sets. To take an obvious example: we know which two dates former PM Tony Blair was given a specific treatment for a cardiac emergency, because this happened while he was in office. We also know Blair’s date of birth. Check any trove of NHS data that records a person who matches those three facts and you’ve found Tony Blair – and all the private data contained alongside those public facts is now in the public domain, forever.
Not everyone has Tony Blair’s reidentification hooks, but everyone has data in some kind of database, and those databases are continually being breached, leaked or intentionally released. A breach from a taxi service like Addison-Lee or Uber, or from Transport for London, will reveal the journeys that immediately preceded each prescription at each clinic or hospital in an “anonymous” NHS dataset, which can then be cross-referenced to databases of home addresses and workplaces. In an eyeblink, millions of Britons’ records of receiving treatment for STIs or cancer can be connected with named individuals – again, forever.
Using de-identified UPRNs in datasets does not eliminate the possibility of person-level records being re-identified if the data is not kept within the protection of the “Five Safes”.
Prerequisites
License to use AddressBase Premium
AddressBase Premium is typically used by public service providers under the terms of the Public Services Geospatial Mapping Agreement (PSGA). You can check whether you are licensed to use this data with the following lookup:
https://www.ordnancesurvey.co.uk/customers/public-sector/psga-member-finder
API access and authentication
Endeavour Health manages access and provides the API endpoints, usernames, and passwords needed to use the API.
Working with environment variables and python-dotenv
You can set the environment variables for API access and authorisation
directly within the environment itself, such as in a pipeline setting,
or with a .env file. The default location tried is the present working
directory, but the location can be passed to the package too. Here are
the variables:
# ASSIGN_ENDPOINT may be something like:
# https://server-root-address/uprnapi/api2
ASSIGN_ENDPOINT=api_url
ASSIGN_USER=your_username
ASSIGN_PASS=your_password
.env is explicitly excluded from version control by .gitignore and
keeps your authentication credentials separate from the code, making it
safer and easier to share your work.
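As a rough sketch of how these three variables might be read and checked before any API call, the snippet below uses only the standard library. The `AssignConfig` class and `read_assign_config` function are hypothetical names for illustration, not part of the assign-uprn package. If you keep the values in a .env file, python-dotenv's `load_dotenv()` can be called first to copy them into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("ASSIGN_ENDPOINT", "ASSIGN_USER", "ASSIGN_PASS")


@dataclass(frozen=True)
class AssignConfig:
    """Hypothetical container for the three settings listed above."""
    endpoint: str
    user: str
    password: str


def read_assign_config(env: Optional[Mapping[str, str]] = None) -> AssignConfig:
    """Read the settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Fail early, naming every variable that is unset or empty
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return AssignConfig(
        endpoint=env["ASSIGN_ENDPOINT"].rstrip("/"),  # normalise trailing slash
        user=env["ASSIGN_USER"],
        password=env["ASSIGN_PASS"],
    )


# Example with an explicit mapping instead of the real environment
example = read_assign_config({
    "ASSIGN_ENDPOINT": "https://server-root-address/uprnapi/api2/",
    "ASSIGN_USER": "your_username",
    "ASSIGN_PASS": "your_password",
})
print(example.endpoint)
```

Passing the mapping as an argument keeps the check easy to test without touching real credentials.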
De-identification salt (optional)
To obtain RALFs, you need an encrypted salt. Your information governance service can help you obtain one that they have previously encrypted with the OpenPseudonymiser website, using a salt phrase created for the work being conducted under the “Safe Projects” principle of the “Five Safes”.
The salt is encrypted using a private key known only to the University of Nottingham (the maintainers of OpenPseudonymiser).
Using the API
Single address validation
A single address can be sent for matching within a single HTTP request.
A search for 10+Downing+St,Westminster,London,SW1A2AA would receive
the following response:
{
"Address_format":"good",
"Postcode_quality":"good",
"Matched":true,
"BestMatch":{
"UPRN":"100023336956",
"Qualifier":"Property",
"LogicalStatus":"1",
"Classification":"RD04",
"ClassTerm":"Terraced",
"Algorithm":"10-match1",
"ABPAddress":{
"Number":"10",
"Street":"Downing Street",
"Town":"City Of Westminster",
"Postcode":"SW1A 2AA"
},
"Match_pattern":{
"Postcode":"equivalent",
"Street":"equivalent",
"Number":"equivalent",
"Building":"equivalent",
"Flat":"equivalent"
}
}
}
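The sketch below shows one way to encode an address into the query form used above and to pull the key fields out of a response shaped like the example. It does not call the API, and the function names are hypothetical; only the JSON keys are taken from the example response.

```python
import json
from urllib.parse import quote_plus


def encode_address(address: str) -> str:
    """Encode spaces as '+' while leaving the commas between address lines."""
    return quote_plus(address, safe=",")


def summarise_match(response_text: str) -> dict:
    """Extract the match flag, UPRN, and any non-equivalent fields."""
    data = json.loads(response_text)
    best = data.get("BestMatch") or {}
    pattern = best.get("Match_pattern") or {}
    return {
        "matched": bool(data.get("Matched")),
        "uprn": best.get("UPRN"),
        "class_term": best.get("ClassTerm"),
        # Fields where the candidate differed from AddressBase Premium
        "non_equivalent": sorted(
            key for key, value in pattern.items() if value != "equivalent"
        ),
    }


# A trimmed copy of the example response above
example_response = """
{
  "Matched": true,
  "BestMatch": {
    "UPRN": "100023336956",
    "ClassTerm": "Terraced",
    "Match_pattern": {"Postcode": "equivalent", "Street": "equivalent"}
  }
}
"""

query = encode_address("10 Downing St,Westminster,London,SW1A2AA")
summary = summarise_match(example_response)
print(query)
print(summary)
```

Using `.get` with fallbacks means an unmatched response without a `BestMatch` object still yields a summary rather than an exception.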
Multiple address validation
Multiple addresses can be uploaded in a text file. The file is processed immediately after upload, and the results can be downloaded shortly afterwards.
Upload
The maximum number of address candidates that you can upload in a single
file is 100,000.
The address file to be uploaded must:
- have a .txt extension
- contain no header row (the first line must be data)
- contain two columns separated by a single tab character
- have a unique numeric row id in the first column
- have the address in the second column (with commas between each address line)
Example upload file content:
1 ⭾ 10 Downing St,Westminster,London,SW1A2AA
2 ⭾ Bridge Street,London,SW1A 2LW
3 ⭾ 221b Baker St,Marylebone,London,NW1 6XE
4 ⭾ 3 Abbey Rd,St John's Wood,London,NW8 9AY
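The file rules above can be checked before upload. The following is a minimal sketch using only the standard library; `write_upload_file` is a hypothetical helper, not a function from the assign-uprn package, and the trailing newline at the end of the file is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

MAX_ROWS = 100_000  # maximum address candidates per file, as stated above


def write_upload_file(rows: Iterable[Tuple[int, str]], path: Path) -> int:
    """Validate (row_id, address) pairs and write them tab-separated."""
    path = Path(path)
    if path.suffix != ".txt":
        raise ValueError("upload file must have a .txt extension")
    seen = set()
    lines = []
    for row_id, address in rows:
        if isinstance(row_id, bool) or not isinstance(row_id, int):
            raise ValueError(f"row id must be an integer: {row_id!r}")
        if row_id in seen:
            raise ValueError(f"duplicate row id: {row_id}")
        if "\t" in address or "\n" in address:
            raise ValueError(f"address contains a tab or newline: {address!r}")
        seen.add(row_id)
        lines.append(f"{row_id}\t{address}")
    if len(lines) > MAX_ROWS:
        raise ValueError(f"too many rows: {len(lines)} > {MAX_ROWS}")
    # No header row: the first line written is already data
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "addresses.txt"
    count = write_upload_file(
        [
            (1, "10 Downing St,Westminster,London,SW1A2AA"),
            (2, "Bridge Street,London,SW1A 2LW"),
        ],
        target,
    )
    content = target.read_text(encoding="utf-8")

print(count)
print(content)
```

Rejecting tabs inside an address keeps every line at exactly two columns.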
Download
Uploads are processed straightaway and can be downloaded by referencing
the name of the upload file in the API call. The download includes data
from AddressBase Premium (plus a RALF if you’ve previously uploaded a
.EncryptedSalt file):
Example download file content:
| id | uprn | address_fmt | algorithm | classification | match_building | match_flat | match_number | match_postcode | match_street | abp_number | abp_postcode | abp_street | abp_town | qualifier | adr_candiddate | abp_building | latitude | longitude | point | x | y | ralf | classification_term | abp_flat | logical_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100023336956 | | 10-match1 | RD04 | equivalent | equivalent | equivalent | equivalent | equivalent | 10 | SW1A 2AA | Downing Street | City Of Westminster | Property | 10 Downing St,Westminster,London,SW1A2AA | | 51.5035410 | -.1276700 | 51.5035410 | 530047.00 | 179951.00 | C30921C8404087803C3687301351FF41CCB4A5E8F3691070723293C8BD654CBB | Terraced | | 1 |
| 2 | 200002501505 | | 550-match5a | PP | candidate field dropped | equivalent | equivalent | equivalent | equivalent | | SW1A 2LW | Bridge Street | City Of Westminster | Property | Bridge Street,London,SW1A 2LW | Portcullis House | 51.5013476 | -.1243451 | 51.5013476 | 530284.00 | 179713.00 | 4D19E2EB66A2C12BD56B93D96CFBBE5B74525AEFC4C68329BE87B55C43EA4C36 | Property Shell | | 1 |
| 3 | 100023071949 | | 3200-match61A170 | CR08 | moved from Number | equivalent | moved to Building | equivalent | equivalent | | NW1 6XE | Baker Street | London | Property | 221b Baker St,Marylebone,London,NW1 6XE | 221B | 51.5237510 | -.1585550 | 51.5237510 | 527847.00 | 182144.00 | 7727B90C7C3A744AF6FD8D5A4FEB6767B1EACBBC721B85EED6AE86EDD2B0BA9C | Shop / Showroom | | 1 |
| 4 | 100023122909 | | 40-match1 | CR08 | moved from Street | moved from Number | moved from Flat | equivalent | moved from Building | 3 | NW8 9AY | Abbey Road | City Of Westminster | Property | 3 Abbey Rd,St Johns Wood,London,NW8 9AY | | 51.5321562 | -.1779541 | 51.5321562 | 526478.00 | 183045.00 | 6E479D3F8DA8A548C631622EA8640E1CE9030289C5ED4458B91A4F6C4F92C799 | Shop / Showroom | | 1 |
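Once a download has been loaded into rows keyed by the column names above, the `match_*` columns can be used to pick out records worth a second look. The sketch below assumes the rows are already plain dictionaries (the on-disk format of the download is not specified here), and the triage rule itself is a hypothetical example rather than guidance from ASSIGN.

```python
MATCH_COLUMNS = (
    "match_building",
    "match_flat",
    "match_number",
    "match_postcode",
    "match_street",
)


def non_equivalent_fields(row: dict) -> dict:
    """Return the match columns whose value is present and not 'equivalent'."""
    return {
        column: row[column]
        for column in MATCH_COLUMNS
        if row.get(column) and row[column] != "equivalent"
    }


# Two rows abbreviated from the example download content
rows = [
    {
        "id": "1",
        "uprn": "100023336956",
        "match_building": "equivalent",
        "match_flat": "equivalent",
        "match_number": "equivalent",
        "match_postcode": "equivalent",
        "match_street": "equivalent",
    },
    {
        "id": "3",
        "uprn": "100023071949",
        "match_building": "moved from Number",
        "match_flat": "equivalent",
        "match_number": "moved to Building",
        "match_postcode": "equivalent",
        "match_street": "equivalent",
    },
]

# Keep only the rows where at least one field was adjusted during matching
to_review = {row["id"]: non_equivalent_fields(row) for row in rows}
to_review = {row_id: fields for row_id, fields in to_review.items() if fields}
print(to_review)
```

A non-equivalent field does not mean the match is wrong; it only records how the candidate address was adjusted to fit the AddressBase Premium record.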
Uploading an encrypted salt (optional)
If you wish to de-identify the UPRNs, first obtain a .EncryptedSalt file
from the OpenPseudonymiser website, working under the guidance of the
Five Safes.
You can then use this package’s upload function to send the file to the
API.
Subsequent addresses sent to the API with the upload function will then
be UPRN matched and also given a RALF (see the ralf column in “Example
download file content” in this document).
Developer Guide
This project is made with nbdev, which uses notebooks to create and
publish the package, tests, and documentation (using Quarto). It also
makes git versioning cleaner by removing notebook metadata prior to
commits.
Working on the assign-uprn package
nbdev provides a number of commands supporting development work.
Some of the commands used during a typical development workflow are listed below:
# install assign_uprn as editable source code, with the dev extras
> pip install -e '.[dev]'
# make changes to notebooks in the nbs/ directory
# ...
# runs tests to ensure functions work as expected
> nbdev_test
# remove notebook metadata to make git history cleaner
> nbdev_clean
# compile to have changes apply to assign_uprn module (also runs tests)
> nbdev_prepare
# build the static website with quarto (https://quarto.org/)
> nbdev_docs
# local preview of the website with quarto
> nbdev_preview
# increment version for pypi
> nbdev_bump_version
# publish package on pypi
> nbdev_pypi