Python wrapper for ASSIGN API functions

Project description

assign-uprn

About

docs
https://joeldn.srht.site/assign-uprn

issue-tracker
https://todo.sr.ht/~joeldn/assign-uprn

mailing-list
https://lists.sr.ht/~joeldn/assign-uprn

source
https://git.sr.ht/~joeldn/assign-uprn

license
AGPLv3

Usage

Installation

Install from pypi

$ pip install assign-uprn

Background

Algorithmic address-to-UPRN matching

The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to records containing addresses.

Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.

ASSIGN compares free-text addresses with Ordnance Survey’s “AddressBase Premium” UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matched 98.6% of patient addresses at 38,000 records per minute.
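As a rough illustration of tolerant, element-by-element comparison, the sketch below expands a few common abbreviations and then scores similarity with Python's standard-library difflib. It is not ASSIGN's implementation: the abbreviation table, the 0.85 threshold, and the function names are assumptions chosen for this example.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; ASSIGN's own rules are not shown here.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise(element: str) -> str:
    """Lower-case an address element and expand known abbreviations."""
    words = element.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def elements_match(candidate: str, reference: str, threshold: float = 0.85) -> bool:
    """Return True when two address elements are similar enough.

    The similarity ratio tolerates small spelling mistakes and swapped
    characters; the threshold is an arbitrary value for illustration.
    """
    a, b = normalise(candidate), normalise(reference)
    return SequenceMatcher(None, a, b).ratio() >= threshold


abbreviation_ok = elements_match("Downing St", "Downing Street")
typo_ok = elements_match("Dowinng Street", "Downing Street")
different = elements_match("Baker Street", "Abbey Road")
```

Here the abbreviated and misspelled street names are accepted as matches, while the unrelated street is rejected.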

UPRN de-identification

ASSIGN can also de-identify UPRNs into Residential Anonymised Linkage Fields (RALFs).

RALFs are locations that are pseudo-anonymised by encrypting them using a salt which has itself been encrypted by a research governance function using the openpseudonymiser website:

https://www.openpseudonymiser.org

Different datasets containing UPRNs, such as datasets from across public services being combined as part of a research project, can be de-identified using a shared project salt, and then linked anonymously for research purposes by anonymising the UPRN of each datapoint into a RALF.
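The linkage idea can be sketched as follows. This assumes a salted SHA-256 digest, which is consistent with the 64-character hexadecimal RALFs shown in this document but is not confirmed as the exact derivation used by ASSIGN or openpseudonymiser. The function name and the salt value are hypothetical.

```python
import hashlib


def pseudonymise_uprn(uprn: str, project_salt: str) -> str:
    """Return an upper-case hex digest of the UPRN combined with the salt.

    Assumption: salted SHA-256; the real RALF derivation may differ.
    """
    payload = (uprn + project_salt).encode("utf-8")
    return hashlib.sha256(payload).hexdigest().upper()


# Two hypothetical datasets that share a property but no other identifiers.
salt = "example-project-salt"  # hypothetical; real salts come from governance
health_record = {"token": pseudonymise_uprn("100023336956", salt), "visits": 3}
housing_record = {"token": pseudonymise_uprn("100023336956", salt), "epc": "C"}

# The same UPRN and salt give the same token, so the records can be linked
# without either dataset carrying the UPRN itself.
linked = health_record["token"] == housing_record["token"]
```

A different project salt produces a different token for the same UPRN, so tokens from separate projects cannot be joined to one another.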

[!NOTE]

A note on re-identification

De-identified data protects information about individuals within a safe environment, such as the “safe settings” element of the five safes framework:

https://ukdataservice.ac.uk/help/secure-lab/what-is-the-five-safes-framework/

Still, should the information leave that environment, it can then be re-identified through links with other datasets. This possibility is explored in the following excerpt by Cory Doctorow:

https://pluralistic.net/2024/03/08/the-fire-of-orodruin/

…it is surprisingly easy to “re-identify” individuals in anonymous data-sets. To take an obvious example: we know which two dates former PM Tony Blair was given a specific treatment for a cardiac emergency, because this happened while he was in office. We also know Blair’s date of birth. Check any trove of NHS data that records a person who matches those three facts and you’ve found Tony Blair – and all the private data contained alongside those public facts is now in the public domain, forever.

Not everyone has Tony Blair’s reidentification hooks, but everyone has data in some kind of database, and those databases are continually being breached, leaked or intentionally released. A breach from a taxi service like Addison-Lee or Uber, or from Transport for London, will reveal the journeys that immediately preceded each prescription at each clinic or hospital in an “anonymous” NHS dataset, which can then be cross-referenced to databases of home addresses and workplaces. In an eyeblink, millions of Britons’ records of receiving treatment for STIs or cancer can be connected with named individuals – again, forever.

Using de-identified UPRNs in datasets does not eliminate the possibility of person-level records being re-identified if the data is not kept within the protection of the “five safes”.

Source materials for ASSIGN

docs
https://wiki.endeavourhealth.org

source
https://github.com/endeavourhealth-discovery/ASSIGN

Pre-requirements

License to use AddressBase Premium

AddressBase Premium is typically used by public service providers under the terms of the Public Services Geospatial Mapping Agreement (PSGA). You can check whether you are licensed to use this data with the following lookup:

https://www.ordnancesurvey.co.uk/customers/public-sector/psga-member-finder

API access and authentication

Endeavour Health manages access and provides the API endpoints, usernames, and passwords that support API usage:

https://endeavourhealth.org

Python packages used by this module

The following package dependencies need to be available in the Python environment used by this package:

# pip install requests, used to interact with the API
import requests
# pip install python-dotenv
from dotenv import load_dotenv

Working with python-dotenv

Create a .env file in the project root containing the following variables used by the package:

# ASSIGN_ENDPOINT may be something like:
# https://server-root-address/uprnapi/api2
ASSIGN_ENDPOINT=api_url
ASSIGN_USER=your_username
ASSIGN_PASS=your_password

The .env file is explicitly excluded from version control by .gitignore. This keeps your authentication credentials separate from the code, making it safer and easier to share your work.
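A minimal sketch of reading these variables is shown below. The helper names read_settings and load_settings_from_dotenv are hypothetical and are not part of this package; only load_dotenv comes from python-dotenv. The check for missing values is an added convenience, not documented behaviour.

```python
import os

# Variable names taken from the .env example above.
REQUIRED = ("ASSIGN_ENDPOINT", "ASSIGN_USER", "ASSIGN_PASS")


def read_settings(env=os.environ) -> dict:
    """Collect the ASSIGN settings from a mapping of environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


def load_settings_from_dotenv() -> dict:
    """Load .env into the process environment, then read the settings."""
    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # looks for a .env file and populates os.environ
    return read_settings()
```

Calling load_settings_from_dotenv() from the project root would return a dictionary with the three values, or raise an error naming any that are absent.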

De-identification salt (optional)

To obtain RALFs, your research governance function can support you to obtain a salt they have previously encrypted with the openpseudonymiser website using a salt phrase created for the research project being conducted:

https://www.openpseudonymiser.org

The salt is encrypted using a private key known only to the University of Nottingham (the maintainers of openpseudonymiser).

Using the API

Single address check

A single address can be sent for matching within a single HTTP request. A search for 10+Downing+St,Westminster,London,SW1A2AA would receive the following response:

{
   "Address_format":"good",
   "Postcode_quality":"good",
   "Matched":true,
   "BestMatch":{
      "UPRN":"100023336956",
      "Qualifier":"Property",
      "LogicalStatus":"1",
      "Classification":"RD04",
      "ClassTerm":"Terraced",
      "Algorithm":"10-match1",
      "ABPAddress":{
         "Number":"10",
         "Street":"Downing Street",
         "Town":"City Of Westminster",
         "Postcode":"SW1A 2AA"
      },
      "Match_pattern":{
         "Postcode":"equivalent",
         "Street":"equivalent",
         "Number":"equivalent",
         "Building":"equivalent",
         "Flat":"equivalent"
      }
   }
}
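Once a response like the one above has been decoded from JSON, the matched UPRN can be read from it. The sketch below uses only the field names visible in the example response; the helper name best_match_uprn is hypothetical, and it is an assumption that an unmatched address returns "Matched": false and may omit "BestMatch". The request itself is not shown because the request path is not documented here.

```python
import json


def best_match_uprn(response: dict):
    """Return the matched UPRN as a string, or None when nothing matched."""
    if not response.get("Matched"):
        return None
    return response.get("BestMatch", {}).get("UPRN")


# A shortened copy of the example response, for illustration only.
sample = json.loads("""
{
   "Address_format": "good",
   "Postcode_quality": "good",
   "Matched": true,
   "BestMatch": {"UPRN": "100023336956", "Qualifier": "Property"}
}
""")

uprn = best_match_uprn(sample)
no_match = best_match_uprn({"Matched": False})
```

The UPRN is kept as a string, as it appears in the response, rather than being converted to an integer.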

Uploading an encrypted salt

If you wish to de-identify the UPRNs, please ask your data governance function to provide you with a .EncryptedSalt file from the openpseudonymiser website. You can then use this package's upload function to send this to the API.

Subsequent addresses sent to the API with the upload function will not only be UPRN matched, but a RALF will be provided too (see the ralf column in Example download file content in this document).

Multiple address checking

Multiple addresses can be uploaded in a single text file. The file is processed immediately after it has been uploaded, and the results can be downloaded shortly afterwards.

Upload

The maximum number of address candidates that you can upload in a single file is 100,000.

The address file to be uploaded must:

  • have a .txt extension
  • include no header row
  • contain two columns separated by a single tab character
    • The first column is a unique numeric row id
    • The second column is the address (with commas between each address line)

Example upload file content:

1⭾10 Downing St,Westminster,London,SW1A2AA
3⭾Bridge Street,London,SW1A 2LW
4⭾221b Baker St,Marylebone,London,NW1 6XE
5⭾3 Abbey Rd,St John's Wood,London,NW8 9AY
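A file in this shape could be produced with a small helper such as the sketch below. The function name write_upload_file is hypothetical and is not part of this package; UTF-8 encoding and "\n" line endings are assumptions, as the required encoding is not stated. The helper only writes the file and does not upload it.

```python
from pathlib import Path

MAX_ROWS = 100_000  # upload limit stated above


def write_upload_file(path, addresses) -> int:
    """Write addresses as 'row id <tab> address' lines with no header row.

    Returns the number of rows written. Row ids are assigned from 1.
    """
    path = Path(path)
    if path.suffix != ".txt":
        raise ValueError("upload file must have a .txt extension")
    rows = list(enumerate(addresses, start=1))
    if len(rows) > MAX_ROWS:
        raise ValueError(f"at most {MAX_ROWS} addresses per file")
    with path.open("w", encoding="utf-8", newline="") as handle:
        for row_id, address in rows:
            # A tab or newline inside an address would break the two-column layout.
            if "\t" in address or "\n" in address:
                raise ValueError(f"row {row_id} contains a tab or newline")
            handle.write(f"{row_id}\t{address}\n")
    return len(rows)
```

For example, write_upload_file("addresses.txt", ["10 Downing St,Westminster,London,SW1A2AA"]) writes a single line containing the id 1, a tab, and the address.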

Download

Uploads are processed straight away, and the results can be downloaded by referencing the name of the upload file in the API call. The download includes data from AddressBase Premium (plus a RALF if you’ve previously uploaded a .EncryptedSalt file):

Example download file content:

id | uprn | address_fmt | algorithm | classification | match_building | match_flat | match_number | match_postcode | match_street | abp_number | abp_postcode | abp_street | abp_town | qualifier | adr_candiddate | abp_building | latitude | longitude | point | x | y | ralf | classification_term | abp_flat | logical_status
1 | 100023336956 | | 10-match1 | RD04 | equivalent | equivalent | equivalent | equivalent | equivalent | 10 | SW1A 2AA | Downing Street | City Of Westminster | Property | 10 Downing St,Westminster,London,SW1A2AA | | 51.5035410 | -.1276700 | 51.5035410 | 530047.00 | 179951.00 | C30921C8404087803C3687301351FF41CCB4A5E8F3691070723293C8BD654CBB | Terraced | | 1
2 | 200002501505 | | 550-match5a | PP | candidate field dropped | equivalent | equivalent | equivalent | equivalent | | SW1A 2LW | Bridge Street | City Of Westminster | Property | Bridge Street,London,SW1A 2LW | Portcullis House | 51.5013476 | -.1243451 | 51.5013476 | 530284.00 | 179713.00 | 4D19E2EB66A2C12BD56B93D96CFBBE5B74525AEFC4C68329BE87B55C43EA4C36 | Property Shell | | 1
3 | 100023071949 | | 3200-match61A170 | CR08 | moved from Number | equivalent | moved to Building | equivalent | equivalent | | NW1 6XE | Baker Street | London | Property | 221b Baker St,Marylebone,London,NW1 6XE | 221B | 51.5237510 | -.1585550 | 51.5237510 | 527847.00 | 182144.00 | 7727B90C7C3A744AF6FD8D5A4FEB6767B1EACBBC721B85EED6AE86EDD2B0BA9C | Shop / Showroom | | 1
4 | 100023122909 | | 40-match1 | CR08 | moved from Street | moved from Number | moved from Flat | equivalent | moved from Building | 3 | NW8 9AY | Abbey Road | City Of Westminster | Property | 3 Abbey Rd,St Johns Wood,London,NW8 9AY | | 51.5321562 | -.1779541 | 51.5321562 | 526478.00 | 183045.00 | 6E479D3F8DA8A548C631622EA8640E1CE9030289C5ED4458B91A4F6C4F92C799 | Shop / Showroom | | 1

Developer Guide

This project uses nbdev, which uses notebooks to create the package module, tests, and documentation (using Quarto), and makes git versioning cleaner by removing notebook metadata prior to commits:

https://nbdev.fast.ai/

Working on the assign-uprn package

nbdev provides a number of commands supporting development work on a version-controlled package, test, and documentation portfolio.

Some of the commands used during a typical development workflow are listed below:

# install assign_uprn as editable source code
$ pip install -e '.[dev]'

# make changes to notebooks in the nbs/ directory
# ...

# remove notebook metadata to make git history cleaner
$ nbdev_clean

# compile to have changes apply to assign_uprn module, and run tests
$ nbdev_prepare

# build the static website with quarto (https://quarto.org/)
$ nbdev_docs

# local preview of the website with quarto
$ nbdev_preview
