Detect and resolve ONS geography boundary changes and vintage mismatches in UK data

geolintr

Stop silently joining datasets with mismatched ONS geography boundaries.

pip install geolintr

Requires Python 3.9+ · Zero dependencies · Built for UK government and NHS data work


The problem

Most UK government datasets with a geographic breakdown use ONS geography codes. But those codes change, sometimes quietly and sometimes dramatically:

  • CCGs became ICBs in July 2022 (106 areas became 42)
  • LSOA and MSOA boundaries were redesigned after Census 2021
  • Local authority districts merge and split regularly
  • Terminated postcodes still appear in older datasets

When you join two datasets whose codes come from different years, the join runs without error, but rows are dropped or matched to the wrong areas and the results are wrong. geolintr catches this before it causes problems.
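
To make the failure mode concrete, here is a minimal sketch in plain Python (no geolintr calls). The figures are hypothetical and the dictionaries stand in for two real tables; the point is only that an inner join across vintages raises no error.

```python
# Hypothetical figures, for illustration only.
# Registered patients keyed by pre-2022 CCG codes (E38 prefix).
patients_by_area = {"E38000240": 1_500_000, "E38000244": 1_900_000}

# Spend keyed by post-2022 ICB codes (E54 prefix).
spend_by_area = {"E54000028": 3.1, "E54000029": 4.2}

# An inner join on the area code raises no error...
joined = {
    code: (patients_by_area[code], spend_by_area[code])
    for code in patients_by_area.keys() & spend_by_area.keys()
}

# ...but no code appears in both vintages, so every row is lost.
print(len(joined))  # 0

# A cheap guard: compare code prefixes on both sides before joining.
left_prefixes = {code[:3] for code in patients_by_area}
right_prefixes = {code[:3] for code in spend_by_area}
print(left_prefixes == right_prefixes)  # False
```

A prefix comparison like this only catches changes of geography type (CCG vs ICB). Boundary redesigns within one type, such as the LSOA changes, keep the same prefix and need a check against a known code list.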


Quick start

from geolintr import detect_vintages, format_report

# Check a list of codes from your dataset
codes = ["E38000240", "E38000244", "E54000028", "E54000029"]

warnings = detect_vintages(codes)
print(format_report(warnings))

Output:

============================================================
  geolintr report
============================================================
  2 error(s)  0 warning(s)  1 info(s)
============================================================

[ERROR] 2 CCG code(s) found (E38 prefix). CCGs were abolished on 1 July 2022 and replaced by ICBs (E54).
       Codes: E38000240, E38000244
       Tip:   Use geolintr.map_ccg_to_icb() to convert CCG codes to their ICB successors.

[ERROR] Both CCG (pre-2022) and ICB (post-2022) codes found. These represent different organisational boundaries.
       Codes: E38000240, E38000244, E54000028, E54000029
       Tip:   Standardise to ICB using geolintr.map_ccg_to_icb().

Detect geography type

from geolintr import detect_geo_type

info = detect_geo_type("E38000240")
info.geo_type     # "CCG"
info.description  # "CCG (England) - deprecated 2022"
info.warnings     # ["E38000240 appears to be a CCG code. CCGs were replaced by ICBs in July 2022."]

info = detect_geo_type("E54000028")
info.geo_type     # "ICB"
info.is_known     # True

Map CCG codes to ICB

from geolintr import map_ccg_to_icb

result = map_ccg_to_icb("E38000240")
result.found        # True
result.target_code  # "E54000029"
result.message      # "Mapped CCG E38000240 to ICB E54000029."

Map any codes using a custom lookup

from geolintr import map_codes

# e.g. after a LAD merger
old_to_new = {
    "E07000187": "E06000066",  # Mendip -> Somerset
    "E07000188": "E06000066",  # Sedgemoor -> Somerset
}

results = map_codes(my_lad_codes, old_to_new, on_missing="warn")
for r in results:
    if not r.found:
        print(f"No mapping for {r.source_code}")
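
After a many-to-one mapping like the one above, several old codes share one new code, so any values attached to them usually need aggregating. The sketch below is plain Python rather than part of geolintr; the counts and the extra code E06000001 are hypothetical, and summing is assumed to be the right aggregation (true for counts, not for rates or averages).

```python
from collections import defaultdict

# Lookup from the example above: two old LAD codes share one successor.
old_to_new = {
    "E07000187": "E06000066",
    "E07000188": "E06000066",
}

# Hypothetical counts keyed by the old codes, for illustration only.
counts_by_old_code = {"E07000187": 120, "E07000188": 80, "E06000001": 50}

counts_by_new_code = defaultdict(int)
for old_code, count in counts_by_old_code.items():
    # Codes without an entry in the lookup are kept unchanged.
    new_code = old_to_new.get(old_code, old_code)
    counts_by_new_code[new_code] += count

print(dict(counts_by_new_code))
# {'E06000066': 200, 'E06000001': 50}
```

Splits are harder than mergers: when one old area becomes several new ones, a plain code lookup cannot apportion the values, and a weighted lookup is needed instead.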

Validate codes against a known set

from geolintr import validate_codes

valid_icbs = {"E54000028", "E54000029", "E54000030"}
result = validate_codes(my_codes, valid_icbs)

result["coverage"]      # 0.94 — 94% of codes matched
result["invalid"]       # list of unrecognised codes
result["invalid_count"] # how many failed
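
For readers who want to see what a coverage figure of this kind means, here is a rough sketch. `coverage_summary` is a hypothetical helper, not geolintr's implementation, and it assumes coverage is the number of matched codes divided by the total, counting duplicates; the library may define it differently.

```python
def coverage_summary(codes, valid_codes):
    """Return the share of codes found in valid_codes, plus the misses."""
    codes = list(codes)
    invalid = [code for code in codes if code not in valid_codes]
    matched = len(codes) - len(invalid)
    return {
        # Guard against an empty input rather than dividing by zero.
        "coverage": matched / len(codes) if codes else 0.0,
        "invalid": invalid,
        "invalid_count": len(invalid),
    }

valid_icbs = {"E54000028", "E54000029", "E54000030"}
summary = coverage_summary(
    ["E54000028", "E54000029", "E38000240", "E54000030"], valid_icbs
)
print(summary["coverage"])  # 0.75
print(summary["invalid"])   # ['E38000240']
```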

What changed and when

from geolintr import boundary_changes_between

changes = boundary_changes_between(2021, 2023)
for c in changes:
    print(f"{c['year']}: {c['description']}")

# 2021: LAD restructure - several mergers in England
# 2022: CCGs abolished, replaced by ICBs
# 2023: LSOA/MSOA boundaries updated for Census 2021
# 2023: Further LAD restructure
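
A year-range lookup of this kind can be sketched as a filter over a list of records. The code below is an illustration, not geolintr's source: `changes_between` is a hypothetical stand-in, the records are copied from the output above, and the inclusive bounds are an assumption inferred from that output.

```python
# Change records copied from the example output above.
CHANGES = [
    {"year": 2021, "description": "LAD restructure - several mergers in England"},
    {"year": 2022, "description": "CCGs abolished, replaced by ICBs"},
    {"year": 2023, "description": "LSOA/MSOA boundaries updated for Census 2021"},
    {"year": 2023, "description": "Further LAD restructure"},
]

def changes_between(start_year, end_year, changes=CHANGES):
    """Return records whose year falls in [start_year, end_year], inclusive."""
    return [c for c in changes if start_year <= c["year"] <= end_year]

print(len(changes_between(2022, 2022)))  # 1
print(len(changes_between(2021, 2023)))  # 4
```

If two datasets were published in different years, any record returned for that range is a reason to check the codes before joining.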

Supported geography types

Prefix               Type                Notes
E01/W01/N01          LSOA                Redesigned after Census 2021
E02/W02              MSOA                Redesigned after Census 2021
E06-E09/W06/S12/N09  LAD                 Changes annually
E38                  CCG                 Abolished July 2022
E54                  ICB                 Replaced CCGs July 2022
E40                  NHS England Region  Stable
E12                  Region              Stable
E92/W92/S92/N92      Country             Stable
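
The prefixes listed above lend themselves to a simple lookup on the first three characters of a code. The sketch below is one way that could look; `PREFIX_TYPES` and `geo_type_from_prefix` are hypothetical names, and this is not a description of how geolintr.detect_geo_type() works internally.

```python
# Prefixes transcribed from the list above (LAD range E06-E09 expanded).
PREFIX_TYPES = {
    "E01": "LSOA", "W01": "LSOA", "N01": "LSOA",
    "E02": "MSOA", "W02": "MSOA",
    "E06": "LAD", "E07": "LAD", "E08": "LAD", "E09": "LAD",
    "W06": "LAD", "S12": "LAD", "N09": "LAD",
    "E38": "CCG",
    "E54": "ICB",
    "E40": "NHS England Region",
    "E12": "Region",
    "E92": "Country", "W92": "Country", "S92": "Country", "N92": "Country",
}

def geo_type_from_prefix(code):
    """Return the geography type for a 9-character ONS code, or None if unknown."""
    code = code.strip().upper()
    if len(code) != 9:
        return None
    return PREFIX_TYPES.get(code[:3])

print(geo_type_from_prefix("E38000240"))  # CCG
print(geo_type_from_prefix("E54000028"))  # ICB
print(geo_type_from_prefix("X99000001"))  # None
```

A prefix identifies the geography type only. It says nothing about whether a particular code is still live, which is why validation against a known code set is a separate step.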

Why geolintr

If you work with UK open data, you have almost certainly joined datasets with mismatched boundaries without realising. The results look plausible, but the numbers are wrong. geolintr makes this class of error visible and fixable.

Built by someone who does this every day at DHSC.


License

MIT
