Python wrapper for the ASSIGN API functions

assign-uprn

About

docs
https://joeldn.srht.site/assign-uprn

code
https://git.sr.ht/~joeldn/assign-uprn

Usage

Installation

Install from PyPI:

$ pip install assign_uprn

Background

In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.

The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to patient records.

Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.

ASSIGN compares addresses in free-text form with Ordnance Survey’s AddressBase Premium UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the life span.

The address-matching algorithms use a human-mediated best-fit method to match a candidate address to one address from the set of all available ‘standard’ addresses.

The algorithms use human semantic pattern recognition: they apply rankings of matching judgements, following rules that manipulate the text, and are supported by a few machine-based algorithms such as the Levenshtein distance algorithm.

The rankings, which can be considered as a set of numbers from 1 to n, could be described as a plausibility measure, as opposed to a probability measure or a deterministic measure.
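As an illustration of the Levenshtein distance mentioned above, the following is a minimal sketch of the textbook dynamic-programming version. It is not taken from ASSIGN, and the function name `levenshtein` is chosen here purely for illustration; how ASSIGN combines the distance with its ranking rules is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("Downing Stret", "Downing Street"))  # 1 (one missing letter)
print(levenshtein("Abbey Rd", "Abbey Road"))           # 2 (two missing letters)
```

A small distance suggests a spelling mistake or abbreviation rather than a different street, which is the kind of tolerance the background text describes.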

How to use

API Access

Licence

You’ll need to be licensed to use AddressBase Premium, either commercially, or as non-commercial work covered by the Public Services Geospatial Mapping Agreement (PSGA). You can check whether your work is included with Ordnance Survey:

https://www.ordnancesurvey.co.uk/customers/public-sector/psga-member-finder

Access and Authentication

Endeavour Health manages access and provides the usernames and passwords used to authenticate API calls.

https://endeavourhealth.org

Python packages used by this module

The following package dependencies need to be available in the Python environment used by this package:

# pip install requests, used to interact with the API
import requests
# pip install python-dotenv, note that other dot env packages exist
from dotenv import load_dotenv

Working with python-dotenv

You will need to create a .env file in the project root containing your ASSIGN_ENDPOINT, ASSIGN_USER and ASSIGN_PASS authentication values. This file is explicitly ignored by .gitignore to keep your authentication credentials separate from the codebase.

The .env file holds the authentication credentials provided by Endeavour Health, and its contents resemble the following structure:

ASSIGN_ENDPOINT=endpoint
ASSIGN_USER=username
ASSIGN_PASS=password
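Once the values are loaded into the environment, it can help to check that all three are present before making any API call. The sketch below assumes `load_dotenv()` from python-dotenv has already been called (it copies the contents of .env into `os.environ`). The `AssignCredentials` class and `read_credentials` function are hypothetical helpers for illustration and are not part of assign_uprn.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_KEYS = ("ASSIGN_ENDPOINT", "ASSIGN_USER", "ASSIGN_PASS")


@dataclass(frozen=True)
class AssignCredentials:  # hypothetical container, not part of assign_uprn
    endpoint: str
    user: str
    password: str


def read_credentials(env: Mapping[str, str] = os.environ) -> AssignCredentials:
    """Collect the three ASSIGN values, failing early if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing values in .env: " + ", ".join(missing))
    return AssignCredentials(
        endpoint=env["ASSIGN_ENDPOINT"],
        user=env["ASSIGN_USER"],
        password=env["ASSIGN_PASS"],
    )


# In a real session: call load_dotenv() first, then read_credentials().
# Here a plain dict stands in for the environment, using the placeholder values.
demo = read_credentials(
    {"ASSIGN_ENDPOINT": "endpoint", "ASSIGN_USER": "username", "ASSIGN_PASS": "password"}
)
print(demo.user)  # username
```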

Single address check

A single address can be sent for matching within a single HTTP request. A search for 10+Downing+St,Westminster,London,SW1A2AA would receive the following response:

{
   "Address_format":"good",
   "Postcode_quality":"good",
   "Matched":true,
   "BestMatch":{
      "UPRN":"100023336956",
      "Qualifier":"Property",
      "LogicalStatus":"1",
      "Classification":"RD04",
      "ClassTerm":"Terraced",
      "Algorithm":"10-match1",
      "ABPAddress":{
         "Number":"10",
         "Street":"Downing Street",
         "Town":"City Of Westminster",
         "Postcode":"SW1A 2AA"
      },
      "Match_pattern":{
         "Postcode":"equivalent",
         "Street":"equivalent",
         "Number":"equivalent",
         "Building":"equivalent",
         "Flat":"equivalent"
      }
   }
}
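The search string and the response above can be handled with the standard library alone. This is a minimal sketch: `encode_address` and `best_match_uprn` are hypothetical helper names, and it is an assumption that the API expects spaces encoded as `+` with commas left intact, as in the example search. The request itself is omitted because the endpoint and authentication details come from Endeavour Health.

```python
import json
from typing import Optional
from urllib.parse import quote_plus


def encode_address(address: str) -> str:
    """Replace spaces with '+' while keeping the commas between address lines."""
    return quote_plus(address, safe=",")


def best_match_uprn(response: dict) -> Optional[str]:
    """Return the UPRN of the best match, or None when nothing was matched."""
    if not response.get("Matched"):
        return None
    return response.get("BestMatch", {}).get("UPRN")


# A cut-down copy of the example response, keeping only the fields used here.
sample = json.loads('{"Matched": true, "BestMatch": {"UPRN": "100023336956"}}')

print(encode_address("10 Downing St,Westminster,London,SW1A2AA"))
# 10+Downing+St,Westminster,London,SW1A2AA
print(best_match_uprn(sample))
# 100023336956
```

Checking `Matched` before reading `BestMatch` avoids a lookup error in the case where the response reports no match; the exact shape of an unmatched response is not shown in this document, so the `None` branch is a defensive guess.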

Uploading an encrypted salt

ASSIGN can de-identify UPRNs into Residential Anonymised Linkage Fields (RALFs). These are pseudonymised locations, produced with an encrypted salt so that records are pseudonymised in a replicable way. Different datasets can therefore be joined without identifying individuals, which keeps the analysis compatible with data protection.

To obtain RALFs, research governance for your work can provide you with an encrypted salt from the maintainers of the openpseudonymiser software:

https://www.openpseudonymiser.org

The salt is encrypted using a private key known only to The University of Nottingham (the maintainers of openpseudonymiser).

From then on, addresses uploaded within a file will not only be UPRN matched but a RALF provided alongside (see the Example download file content in this document).
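To see why a shared salt makes pseudonymised values joinable across datasets, consider the general idea of salted hashing. The sketch below is only an illustration of that idea: it is not the procedure used by ASSIGN or openpseudonymiser, the real salt is encrypted and never visible to the user, and the `pseudonymise` function and salt strings are hypothetical.

```python
import hashlib


def pseudonymise(uprn: str, salt: str) -> str:
    """Hash a UPRN together with a salt (illustrative only, not a real RALF)."""
    digest = hashlib.sha256((salt + uprn).encode("utf-8")).hexdigest()
    return digest.upper()


project_a = pseudonymise("100023336956", salt="shared-project-salt")
project_b = pseudonymise("100023336956", salt="shared-project-salt")
other = pseudonymise("100023336956", salt="a-different-salt")

print(project_a == project_b)  # True: same salt, so the two datasets can be joined
print(project_a == other)      # False: a different salt gives unlinkable values
```

The same property and the same salt always give the same value, so records can be linked without the UPRN itself being shared.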

Multiple address checking

Multiple addresses can be uploaded in a single text file. The file is processed immediately after it has been uploaded, and the results can be downloaded shortly afterwards.

Upload

The maximum number of address candidates that you can upload in a single file is 100,000.

The address file to be uploaded must:

  • have a .txt extension
  • include no header line
  • contain two columns separated by a single tab character
    • The first column is a unique numeric row id
    • The second column is the address (with commas between each address line)

Example upload file content:

1⭾10 Downing St,Westminster,London,SW1A2AA
3⭾Bridge Street,London,SW1A 2LW
4⭾221b Baker St,Marylebone,London,NW1 6XE
5⭾3 Abbey Rd,St John's Wood,London,NW8 9AY
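A file in this shape can be produced with the standard library. The following is a minimal sketch that enforces the rules listed above (the .txt extension, no header, unique numeric ids, one tab per row, and the 100,000 row limit); `write_upload_file` is a hypothetical helper for illustration and not a function of assign_uprn.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

MAX_ROWS = 100_000  # upload limit stated in this document


def write_upload_file(rows: Iterable[Tuple[int, str]], path: Path) -> int:
    """Write (row id, address) pairs as tab-separated lines and return the row count."""
    if path.suffix != ".txt":
        raise ValueError("upload file must have a .txt extension")
    seen = set()
    lines = []
    for row_id, address in rows:
        row_id = int(row_id)  # ids must be numeric
        if row_id in seen:
            raise ValueError(f"duplicate row id: {row_id}")
        if "\t" in address or "\n" in address:
            raise ValueError(f"row {row_id}: address contains a tab or line break")
        seen.add(row_id)
        lines.append(f"{row_id}\t{address}")
    if len(lines) > MAX_ROWS:
        raise ValueError(f"too many rows: {len(lines)} (maximum {MAX_ROWS})")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Demonstration in a temporary directory, using two of the example addresses.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "addresses.txt"
    count = write_upload_file(
        [(1, "10 Downing St,Westminster,London,SW1A2AA"),
         (3, "Bridge Street,London,SW1A 2LW")],
        target,
    )
    print(count)  # 2
```

All checks run before anything is written, so a rejected batch never leaves a partial file behind.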

Download

Uploads are processed straight away, and the results can be downloaded by referencing the name of the upload file in the API call. The download includes data from AddressBase Premium.

Example download file content:

id⭾uprn⭾address_fmt⭾algorithm⭾classification⭾match_building⭾match_flat⭾match_number⭾match_postcode⭾match_street⭾abp_number⭾abp_postcode⭾abp_street⭾abp_town⭾qualifier⭾adr_candiddate⭾abp_building⭾latitude⭾longitude⭾point⭾x⭾y⭾ralf⭾classification_term⭾abp_flat⭾logical_status
1⭾100023336956⭾⭾10-match1⭾RD04⭾equivalent⭾equivalent⭾equivalent⭾equivalent⭾equivalent⭾10⭾SW1A 2AA⭾Downing Street⭾City Of Westminster⭾Property⭾10 Downing St,Westminster,London,SW1A2AA⭾⭾51.5035410⭾-.1276700⭾51.5035410⭾530047.00⭾179951.00⭾C30921C8404087803C3687301351FF41CCB4A5E8F3691070723293C8BD654CBB⭾Terraced⭾⭾1
2⭾200002501505⭾⭾550-match5a⭾PP⭾candidate field dropped⭾equivalent⭾equivalent⭾equivalent⭾equivalent⭾⭾SW1A 2LW⭾Bridge Street⭾City Of Westminster⭾Property⭾Bridge Street,London,SW1A 2LW⭾Portcullis House⭾51.5013476⭾-.1243451⭾51.5013476⭾530284.00⭾179713.00⭾4D19E2EB66A2C12BD56B93D96CFBBE5B74525AEFC4C68329BE87B55C43EA4C36⭾Property Shell⭾⭾1
3⭾100023071949⭾⭾3200-match61A170⭾CR08⭾moved from Number⭾equivalent⭾moved to Building⭾equivalent⭾equivalent⭾⭾NW1 6XE⭾Baker Street⭾London⭾Property⭾221b Baker St,Marylebone,London,NW1 6XE⭾221B⭾51.5237510⭾-.1585550⭾51.5237510⭾527847.00⭾182144.00⭾7727B90C7C3A744AF6FD8D5A4FEB6767B1EACBBC721B85EED6AE86EDD2B0BA9C⭾Shop / Showroom⭾⭾1
4⭾100023122909⭾⭾40-match1⭾CR08⭾moved from Street⭾moved from Number⭾moved from Flat⭾equivalent⭾moved from Building⭾3⭾NW8 9AY⭾Abbey Road⭾City Of Westminster⭾Property⭾3 Abbey Rd,St Johns Wood,London,NW8 9AY⭾⭾51.5321562⭾-.1779541⭾51.5321562⭾526478.00⭾183045.00⭾6E479D3F8DA8A548C631622EA8640E1CE9030289C5ED4458B91A4F6C4F92C799⭾Shop / Showroom⭾⭾1
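The downloaded text can be read into dictionaries keyed by the column names in the first line. This sketch assumes the download is tab-separated with a header row, mirroring the upload format; `read_download` and `uprn_by_id` are hypothetical helpers for illustration, and the sample uses only three of the columns to stay short.

```python
import csv
import io
from typing import Dict, List


def read_download(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated download content into one dict per address row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def uprn_by_id(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each row id to its matched UPRN so results can be joined to the upload."""
    return {row["id"]: row["uprn"] for row in rows}


sample = (
    "id\tuprn\talgorithm\n"
    "1\t100023336956\t10-match1\n"
    "2\t200002501505\t550-match5a\n"
)
rows = read_download(sample)
print(uprn_by_id(rows))
# {'1': '100023336956', '2': '200002501505'}
```

`csv.QUOTE_NONE` is used so that quotation marks inside an address are kept as ordinary characters rather than being treated as field delimiters.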

Developer Guide

This project uses nbdev, which uses notebooks to create the package's module, tests and documentation (using quarto), and makes git versioning cleaner by removing notebook metadata prior to commits:

https://nbdev.fast.ai/

Working on assign_uprn in development mode

# install assign_uprn package as a developer
$ pip install -e '.[dev]'

# make changes to notebooks in the nbs/ directory
# ...

# clean the notebook metadata to make git history cleaner
$ nbdev_clean

# compile to have changes apply to assign_uprn module, and run tests
$ nbdev_prepare

# build the static website with quarto (https://quarto.org/)
$ nbdev_docs

# local preview of the website with quarto
$ nbdev_preview
