
Detect and resolve ONS geography boundary changes and vintage mismatches in UK data


geolintr

Stop silently joining datasets with mismatched ONS geography boundaries.

pip install geolintr

Requires Python 3.9+ · Zero dependencies · Built for UK government and NHS data work


The problem

Most UK government datasets are keyed on ONS geography codes. But those codes change, sometimes quietly and sometimes dramatically:

  • CCGs became ICBs in July 2022 (106 areas became 42)
  • LSOA and MSOA boundaries were redesigned after Census 2021
  • Local authority districts merge and split regularly
  • Terminated postcodes still appear in older datasets

When you join two datasets with codes from different years, the join silently succeeds but the results are wrong. geolintr catches this before it causes problems.
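
To make the failure mode concrete, here is a minimal sketch that does not use geolintr at all. The figures and variable names are invented for illustration; only the code prefixes (E38 for CCGs, E54 for ICBs) come from the list above.

```python
# Hypothetical data: populations keyed by CCG codes (pre-July 2022 vintage)
population_by_area = {"E38000240": 1_500_000, "E38000244": 1_800_000}

# Hypothetical data: appointment counts keyed by ICB codes (post-July 2022 vintage)
appointments_by_area = {"E54000028": 120_000, "E54000029": 150_000}

# An inner join on the code raises no error ...
joined = {
    code: (population_by_area[code], appointments_by_area[code])
    for code in population_by_area.keys() & appointments_by_area.keys()
}

# ... but no code exists in both vintages, so every row is dropped.
print(len(joined))  # 0

# A cheap guard: compare the three-character prefixes on each side before joining.
left_prefixes = {code[:3] for code in population_by_area}
right_prefixes = {code[:3] for code in appointments_by_area}
if left_prefixes != right_prefixes:
    print(f"Prefix mismatch: {sorted(left_prefixes)} vs {sorted(right_prefixes)}")
```

A prefix check like this only catches a change of geography type. It cannot see redrawn boundaries that keep the same prefix, such as LSOA redesigns.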


Quick start

from geolintr import detect_vintages, format_report

# Check a list of codes from your dataset
codes = ["E38000240", "E38000244", "E54000028", "E54000029"]

warnings = detect_vintages(codes)
print(format_report(warnings))

Output:

============================================================
  geolintr report
============================================================
  2 error(s)  0 warning(s)  1 info(s)
============================================================

[ERROR] 2 CCG code(s) found (E38 prefix). CCGs were abolished on 1 July 2022 and replaced by ICBs (E54).
       Codes: E38000240, E38000244
       Tip:   Use geolintr.map_ccg_to_icb() to convert CCG codes to their ICB successors.

[ERROR] Both CCG (pre-2022) and ICB (post-2022) codes found. These represent different organisational boundaries.
       Codes: E38000240, E38000244, E54000028, E54000029
       Tip:   Standardise to ICB using geolintr.map_ccg_to_icb().

Detect geography type

from geolintr import detect_geo_type

info = detect_geo_type("E38000240")
info.geo_type     # "CCG"
info.description  # "CCG (England) - deprecated 2022"
info.warnings     # ["E38000240 appears to be a CCG code. CCGs were replaced by ICBs in July 2022."]

info = detect_geo_type("E54000028")
info.geo_type     # "ICB"
info.is_known     # True

Map CCG codes to ICB

from geolintr import map_ccg_to_icb

result = map_ccg_to_icb("E38000240")
result.found        # True
result.target_code  # "E54000029"
result.message      # "Mapped CCG E38000240 to ICB E54000029."

Map any codes using a custom lookup

from geolintr import map_codes

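# e.g. after a LAD merger
old_to_new = {
    "E07000153": "E06000061",  # Kettering -> North Northamptonshire
    "E07000150": "E06000061",  # Corby -> North Northamptonshire
}

# Example input; replace with the LAD codes from your own dataset
my_lad_codes = ["E07000150", "E07000153", "E07000154"]

results = map_codes(my_lad_codes, old_to_new, on_missing="warn")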
for r in results:
    if not r.found:
        print(f"No mapping for {r.source_code}")
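
Because several old codes can map to one successor, any values attached to the old codes usually need aggregating after recoding. The sketch below is independent of geolintr and uses placeholder codes and invented counts. Summing is an assumption that suits counts; rates or averages would need a weighted approach instead.

```python
from collections import defaultdict

# Hypothetical many-to-one lookup: two old areas merge into one new area
old_to_new = {"OLD_A": "NEW_X", "OLD_B": "NEW_X", "OLD_C": "NEW_Y"}

# Hypothetical counts recorded against the old codes
counts_by_old_code = {"OLD_A": 120, "OLD_B": 80, "OLD_C": 50, "OLD_D": 10}

counts_by_new_code = defaultdict(int)
unmapped = []

for old_code, count in counts_by_old_code.items():
    new_code = old_to_new.get(old_code)
    if new_code is None:
        # Keep track of codes with no successor rather than dropping them silently
        unmapped.append(old_code)
        continue
    counts_by_new_code[new_code] += count

print(dict(counts_by_new_code))  # {'NEW_X': 200, 'NEW_Y': 50}
print(unmapped)                  # ['OLD_D']
```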

Validate codes against a known set

from geolintr import validate_codes

# my_codes: the geography codes from your own dataset, e.g. a list of strings
valid_icbs = {"E54000028", "E54000029", "E54000030"}
result = validate_codes(my_codes, valid_icbs)

result["coverage"]      # e.g. 0.94 (94% of codes matched)
result["invalid"]       # list of unrecognised codes
result["invalid_count"] # how many failed
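
For intuition, a coverage figure of this kind can be computed in a few lines of plain Python. This is a sketch, not geolintr's implementation: the function name `coverage_summary` is hypothetical, and it assumes coverage is measured over distinct codes, whereas the library may count per row.

```python
def coverage_summary(codes, valid_codes):
    """Summarise how many distinct codes appear in a reference set."""
    unique_codes = set(codes)
    invalid = sorted(unique_codes - set(valid_codes))
    total = len(unique_codes)
    # Avoid dividing by zero when the input is empty
    coverage = (total - len(invalid)) / total if total else 0.0
    return {
        "coverage": coverage,
        "invalid": invalid,
        "invalid_count": len(invalid),
    }


summary = coverage_summary(
    ["E54000028", "E54000029", "E38000240", "E54000028"],
    {"E54000028", "E54000029", "E54000030"},
)
print(summary["invalid"])             # ['E38000240']
print(round(summary["coverage"], 2))  # 0.67
```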

What changed and when

from geolintr import boundary_changes_between

changes = boundary_changes_between(2021, 2023)
for c in changes:
    print(f"{c['year']}: {c['description']}")

# 2021: LAD restructure - several mergers in England
# 2022: CCGs abolished, replaced by ICBs
# 2023: LSOA/MSOA boundaries updated for Census 2021
# 2023: Further LAD restructure
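
One way to use a change log like this is to check whether two dataset vintages straddle a boundary change before joining them. The sketch below is standalone: `CHANGE_LOG` copies the four entries printed above, and `changes_between` is a hypothetical helper that treats both years as inclusive, matching the output shown.

```python
CHANGE_LOG = [
    {"year": 2021, "description": "LAD restructure - several mergers in England"},
    {"year": 2022, "description": "CCGs abolished, replaced by ICBs"},
    {"year": 2023, "description": "LSOA/MSOA boundaries updated for Census 2021"},
    {"year": 2023, "description": "Further LAD restructure"},
]


def changes_between(start_year, end_year, log=CHANGE_LOG):
    """Return the entries whose year falls in [start_year, end_year]."""
    return [entry for entry in log if start_year <= entry["year"] <= end_year]


# Dataset A uses 2021 boundaries, dataset B uses 2023 boundaries:
# anything dated after 2021, up to and including 2023, may separate them.
straddled = changes_between(2022, 2023)
print(len(straddled))  # 3
if straddled:
    print("Recode one dataset before joining.")
```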

Supported geography types

Prefix                 Type                 Notes
E01/W01/N01            LSOA                 Redesigned after Census 2021
E02/W02                MSOA                 Redesigned after Census 2021
E06-E09/W06/S12/N09    LAD                  Changes annually
E38                    CCG                  Abolished July 2022
E54                    ICB                  Replaced CCGs July 2022
E40                    NHS England Region   Stable
E12                    Region               Stable
E92/W92/S92/N92        Country              Stable
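
The table above translates directly into a prefix lookup. The following is a rough sketch of that idea in plain Python. It is not how geolintr is implemented, and `PREFIX_TYPES` and `geo_type_from_prefix` are hypothetical names.

```python
from collections import Counter

# Three-character prefix -> geography type, taken from the table above
PREFIX_TYPES = {
    "E01": "LSOA", "W01": "LSOA", "N01": "LSOA",
    "E02": "MSOA", "W02": "MSOA",
    "E06": "LAD", "E07": "LAD", "E08": "LAD", "E09": "LAD",
    "W06": "LAD", "S12": "LAD", "N09": "LAD",
    "E38": "CCG",
    "E54": "ICB",
    "E40": "NHS England Region",
    "E12": "Region",
    "E92": "Country", "W92": "Country", "S92": "Country", "N92": "Country",
}


def geo_type_from_prefix(code):
    """Classify a code by its first three characters; 'unknown' if unlisted."""
    return PREFIX_TYPES.get(code.strip().upper()[:3], "unknown")


codes = ["E38000240", "E38000244", "E54000028", "E54000029", "X99000001"]
type_counts = Counter(geo_type_from_prefix(c) for c in codes)
print(dict(type_counts))  # {'CCG': 2, 'ICB': 2, 'unknown': 1}
```

More than one geography type in a single column, as in this example, is usually a sign that vintages have been mixed.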

Why geolintr

If you work with UK open data, you have almost certainly joined datasets with mismatched boundaries without realising. The results look plausible, but the numbers are wrong. geolintr makes this class of error visible and fixable.

Built by someone who hits this issue every day.


License

MIT
