Detect and resolve ONS geography boundary changes and vintage mismatches in UK data
geolintr
Stop silently joining datasets with mismatched ONS geography boundaries.
pip install geolintr
Requires Python 3.9+ · Zero dependencies · Built for UK government and NHS data work
The problem
Every UK government dataset uses ONS geography codes. But those codes change — sometimes quietly and sometimes dramatically:
- CCGs became ICBs in July 2022 (106 areas became 42)
- LSOA and MSOA boundaries were redesigned after Census 2021
- Local authority districts merge and split regularly
- Terminated postcodes still appear in older datasets
When you join two datasets with codes from different years, the join silently succeeds but the results are wrong. geolintr catches this before it causes problems.
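To make the failure mode concrete, here is a minimal sketch in plain Python that does not call geolintr at all. The dataset names and figures are invented for illustration; only the code prefixes (E38 for CCGs, E54 for ICBs) come from this README.

```python
# Illustrative data only: the values are made up, not real statistics.
spend_by_area = {          # older dataset, keyed by CCG codes (E38)
    "E38000240": 120.0,
    "E38000244": 95.5,
}
population_by_area = {     # newer dataset, keyed by ICB codes (E54)
    "E54000028": 1_500_000,
    "E54000029": 2_000_000,
}

# A left join: keep every spend row and look up population by code.
joined = {
    code: (spend, population_by_area.get(code))
    for code, spend in spend_by_area.items()
}

# No exception is raised, yet every population lookup missed.
unmatched = [code for code, (_, pop) in joined.items() if pop is None]
print(f"{len(unmatched)} of {len(joined)} rows have no population match")
```

The join completes without any error, which is why this kind of mismatch tends to surface only when someone questions the totals.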
Quick start
from geolintr import detect_vintages, format_report
# Check a list of codes from your dataset
codes = ["E38000240", "E38000244", "E54000028", "E54000029"]
warnings = detect_vintages(codes)
print(format_report(warnings))
Output:
============================================================
geolintr report
============================================================
2 error(s) 0 warning(s) 1 info(s)
============================================================
[ERROR] 2 CCG code(s) found (E38 prefix). CCGs were abolished on 1 July 2022 and replaced by ICBs (E54).
Codes: E38000240, E38000244
Tip: Use geolintr.map_ccg_to_icb() to convert CCG codes to their ICB successors.
[ERROR] Both CCG (pre-2022) and ICB (post-2022) codes found. These represent different organisational boundaries.
Codes: E38000240, E38000244, E54000028, E54000029
Tip: Standardise to ICB using geolintr.map_ccg_to_icb().
Detect geography type
from geolintr import detect_geo_type
info = detect_geo_type("E38000240")
info.geo_type # "CCG"
info.description # "CCG (England) - deprecated 2022"
info.warnings # ["E38000240 appears to be a CCG code. CCGs were replaced by ICBs in July 2022."]
info = detect_geo_type("E54000028")
info.geo_type # "ICB"
info.is_known # True
Map CCG codes to ICB
from geolintr import map_ccg_to_icb
result = map_ccg_to_icb("E38000240")
result.found # True
result.target_code # "E54000029"
result.message # "Mapped CCG E38000240 to ICB E54000029."
Map any codes using a custom lookup
from geolintr import map_codes
# e.g. after a LAD merger
old_to_new = {
    "E07000187": "E06000066",  # Mendip -> Somerset
    "E07000188": "E06000066",  # Sedgemoor -> Somerset
}
results = map_codes(my_lad_codes, old_to_new, on_missing="warn")
for r in results:
    if not r.found:
        print(f"No mapping for {r.source_code}")
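A merger lookup is many-to-one, so after mapping codes you usually need to combine the rows that now share a successor code. The following is a rough sketch of that step in plain Python, not part of geolintr: `aggregate_to_successors` is a hypothetical helper, and the codes and counts are placeholders.

```python
from collections import defaultdict


def aggregate_to_successors(values, old_to_new):
    """Sum values keyed by old codes into their successor codes.

    Codes with no entry in old_to_new are kept unchanged, on the
    assumption that the boundary change did not affect them.
    """
    totals = defaultdict(float)
    for code, value in values.items():
        totals[old_to_new.get(code, code)] += value
    return dict(totals)


# Hypothetical codes and counts, for illustration only.
lookup = {"OLD001": "NEW001", "OLD002": "NEW001"}
counts = {"OLD001": 10.0, "OLD002": 5.0, "KEEP01": 7.0}

merged = aggregate_to_successors(counts, lookup)
print(merged)
```

Summing is only appropriate for additive measures such as counts or spend. Rates and averages would need re-deriving from their numerators and denominators after the merge.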
Validate codes against a known set
from geolintr import validate_codes
valid_icbs = {"E54000028", "E54000029", "E54000030"}
result = validate_codes(my_codes, valid_icbs)
result["coverage"] # 0.94 — 94% of codes matched
result["invalid"] # list of unrecognised codes
result["invalid_count"] # how many failed
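For readers who want to see what a coverage figure of this kind means, here is a small standalone sketch. It is not geolintr's implementation: `coverage_summary` is a hypothetical function, and it assumes coverage is the share of input codes (duplicates included) that appear in the valid set.

```python
def coverage_summary(codes, valid):
    """Return coverage statistics for codes against a set of valid codes."""
    codes = list(codes)
    invalid = [c for c in codes if c not in valid]
    # Guard against division by zero on an empty input.
    coverage = (len(codes) - len(invalid)) / len(codes) if codes else 0.0
    return {
        "coverage": coverage,
        "invalid": invalid,
        "invalid_count": len(invalid),
    }


known_icbs = {"E54000028", "E54000029", "E54000030"}
sample = ["E54000028", "E54000029", "E38000240", "E54000030"]

summary = coverage_summary(sample, known_icbs)
print(summary["coverage"], summary["invalid"])
```

A coverage value well below 1.0 against a current code list is often the first sign that a dataset was published on an older vintage.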
What changed and when
from geolintr import boundary_changes_between
changes = boundary_changes_between(2021, 2023)
for c in changes:
    print(f"{c['year']}: {c['description']}")
# 2021: LAD restructure - several mergers in England
# 2022: CCGs abolished, replaced by ICBs
# 2023: LSOA/MSOA boundaries updated for Census 2021
# 2023: Further LAD restructure
Supported geography types
| Prefix | Type | Notes |
|---|---|---|
| E01/W01/N01 | LSOA | Redesigned after Census 2021 |
| E02/W02 | MSOA | Redesigned after Census 2021 |
| E06-E09/W06/S12/N09 | LAD | Changes annually |
| E38 | CCG | Abolished July 2022 |
| E54 | ICB | Replaced CCGs July 2022 |
| E40 | NHS England Region | Stable |
| E12 | Region | Stable |
| E92/W92/S92/N92 | Country | Stable |
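The table above suggests that a geography type can be read off the first three characters of a code. As a sketch of that idea only (geolintr's internals may differ), the snippet below builds a prefix lookup from the table and flags a mix of CCG and ICB codes. `PREFIX_TYPES`, `classify`, and `types_present` are hypothetical names.

```python
# Prefix-to-type lookup transcribed from the table above.
PREFIX_TYPES = {
    "E01": "LSOA", "W01": "LSOA", "N01": "LSOA",
    "E02": "MSOA", "W02": "MSOA",
    "E06": "LAD", "E07": "LAD", "E08": "LAD", "E09": "LAD",
    "W06": "LAD", "S12": "LAD", "N09": "LAD",
    "E38": "CCG",
    "E54": "ICB",
    "E40": "NHS England Region",
    "E12": "Region",
    "E92": "Country", "W92": "Country", "S92": "Country", "N92": "Country",
}


def classify(code):
    """Return the geography type for a code, or 'unknown'."""
    return PREFIX_TYPES.get(code.strip().upper()[:3], "unknown")


def types_present(codes):
    """Return the set of geography types found in an iterable of codes."""
    return {classify(c) for c in codes}


found = types_present(["E38000240", "E54000028", "e54000029 "])
if {"CCG", "ICB"} <= found:
    print("Mixed CCG and ICB codes: standardise before joining")
```

A prefix check like this identifies the geography type but not the vintage within a type, so it cannot on its own tell a 2011 LSOA code from a 2021 one.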
Why geolintr
If you work with UK open data, you have almost certainly joined datasets with mismatched boundaries without realising. The results look plausible, but the numbers are wrong. geolintr makes this class of error visible and fixable.
Built by someone who does this every day at DHSC.
License
MIT