Administrative boundary semantic relation inference for geospatial datasets
adminbounds
Given any vector geometry, adminbounds answers: where does this geometry sit in the administrative hierarchy? It infers whether the geometry coincides with a known boundary, intersects multiple units, contains child units, or is contained by ancestor units — and stores the results as structured JSONB for downstream querying.
Bundled data covers China's four-level hierarchy (country → province → city → district). Any other country can be added on demand via GADM 4.1.
How It Works
The pipeline has three stages:
1. Boundary data         2. Inference               3. Annotation results
─────────────────        ──────────────────         ─────────────────────
admin_units table   →    infer_admin_semantic   →   thematic_admin_relations
(PostGIS polygons)       _relation(geom)            (JSONB per feature)
Stage 1 — Load boundary data into adminbounds.admin_units. Either use the bundled China data or download any country from GADM 4.1.
Stage 2 — Inference is a single PL/pgSQL function adminbounds.infer_admin_semantic_relation(geom) that classifies a geometry into four relationship types:
| Relationship | Meaning | Example |
|---|---|---|
| coincides_with | Substantially overlaps a known boundary (IoU ≥ 0.85) | A polygon matching Beijing municipality exactly |
| intersects_with | Partially overlaps units at the dominant level | A corridor crossing Nanjing and Suzhou |
| covers_children | The geometry contains child-level units | A province polygon covering its cities |
| contained_by | The ancestor chain above the matched unit | A city → its province → country |
Stage 3 — Batch annotation runs the inference function over every row of any PostGIS table and writes the results into adminbounds.thematic_admin_relations, keyed by (source_table, feature_uuid).
Prerequisites
- PostgreSQL 14+ with the PostGIS 3.x extension enabled
- Python 3.12+
Enable PostGIS on your target database if not already done:
CREATE EXTENSION IF NOT EXISTS postgis;
Installation
From PyPI (recommended for users):
pip install adminbounds
From source (for development):
git clone https://github.com/JohnnnyTang/admin-bounds.git
cd admin-bounds
uv sync # or: pip install -e .
Configuration
The package reads database credentials from environment variables or a .env file in the working directory.
cp .env.example .env
Edit .env:
ADMINBOUNDS_DB_HOST=localhost
ADMINBOUNDS_DB_PORT=5432
ADMINBOUNDS_DB_NAME=your_database
ADMINBOUNDS_DB_USER=your_username
ADMINBOUNDS_DB_PASSWORD=your_password
All CLI commands and the Python client fall back to these variables if no explicit connection arguments are given. You can also pass credentials directly:
adminbounds --host localhost --dbname mydb --user postgres --password secret init-db
from adminbounds import AdminBoundsClient
c = AdminBoundsClient(host="localhost", dbname="mydb", user="postgres", password="secret")
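The fallback order described above (explicit arguments first, then the ADMINBOUNDS_DB_* variables) can be sketched in plain Python. This is an illustration only, not the package's own config.py: the helper name `resolve_db_settings` is hypothetical, and parsing of the .env file is left out.

```python
import os

# Maps connection argument names to the environment variables listed above.
ENV_KEYS = {
    "host": "ADMINBOUNDS_DB_HOST",
    "port": "ADMINBOUNDS_DB_PORT",
    "dbname": "ADMINBOUNDS_DB_NAME",
    "user": "ADMINBOUNDS_DB_USER",
    "password": "ADMINBOUNDS_DB_PASSWORD",
}


def resolve_db_settings(env=None, **explicit):
    """Hypothetical helper: explicit keyword arguments win, env vars fill the gaps."""
    env = os.environ if env is None else env
    settings = {}
    for key, env_name in ENV_KEYS.items():
        value = explicit.get(key)
        if value is None:
            value = env.get(env_name)  # stays None if the variable is unset
        settings[key] = value
    return settings


if __name__ == "__main__":
    fake_env = {"ADMINBOUNDS_DB_HOST": "localhost", "ADMINBOUNDS_DB_NAME": "mydb"}
    print(resolve_db_settings(env=fake_env, dbname="override"))
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.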
Quick Start
A complete walk-through from zero to annotated results:
# 1. Create the adminbounds schema and tables in your database
adminbounds init-db
# 2. Load boundary data (choose one or both)
adminbounds import-boundaries # bundled China data
adminbounds download-gadm Germany # or any country via GADM
# 3. Upload your dataset (adds a uuid primary key automatically)
adminbounds upload my_data.geojson my_table
# 4. Annotate — infers admin relations for every row
adminbounds annotate --source-table my_table --geom-col geom
# 5. Query results in PostgreSQL
psql mydb -c "
SELECT src.name, tar.admin_level_match, tar.confidence, tar.coincides_with
FROM public.my_table src
JOIN adminbounds.thematic_admin_relations tar
ON tar.source_table = 'public.my_table'
AND tar.feature_uuid = src.uuid;
"
CLI Reference
init-db
Creates the adminbounds schema, the admin_units and thematic_admin_relations tables, and deploys the PL/pgSQL inference function. Safe to re-run: the DDL is idempotent, and any pending migrations are applied.
adminbounds init-db
Run this before any other command, and re-run it after upgrading the package to pick up schema changes.
import-boundaries
Loads bundled Chinese administrative boundaries into adminbounds.admin_units at four levels:
| Level | Coverage |
|---|---|
| 1 | Country (China) |
| 2 | Provinces (34) |
| 3 | Cities (~300) |
| 4 | Districts (~3000) |
adminbounds import-boundaries
Idempotent — re-running updates existing rows and skips already-computed derived fields. Run after init-db.
download-gadm
Downloads GADM 4.1 administrative boundaries for any country and imports them into adminbounds.admin_units. Accepts either an ISO3 code or a common English country name.
adminbounds download-gadm Germany
adminbounds download-gadm DEU # ISO3 code — same result
adminbounds download-gadm "United States"
adminbounds download-gadm USA --levels 0,1 # country + state only (level 2+ can be very large)
adminbounds download-gadm France --force # re-download even if already cached
adminbounds download-gadm Japan --cache-dir /data/gadm_cache
Downloaded zip files are cached in ~/.adminbounds/gadm_cache/ by default so repeated calls are fast. HTTP 404 for a level (not all countries have all 4 levels) is silently skipped.
GADM level → DB level mapping:
| GADM level | Meaning | DB level |
|---|---|---|
| 0 | Country | 1 |
| 1 | State / Province | 2 |
| 2 | County / City | 3 |
| 3 | Municipality / District | 4 |
GADM field → admin_units column mapping:
| admin_units column | GADM level 0 | GADM level 1 | GADM level 2 | GADM level 3 |
|---|---|---|---|---|
| adcode | GID_0 | GID_1 | GID_2 | GID_3 |
| name | NAME_0 | NAME_1 | NAME_2 | NAME_3 |
| level | 1 | 2 | 3 | 4 |
| parent_code | NULL | GID_0 | GID_1 | GID_2 |
| geom | geometry | geometry | geometry | geometry |
GADM GIDs look like DEU, DEU.1_1, DEU.1.2_1. The adcode column is TEXT to accommodate these (unlike the 6-digit numeric Chinese codes).
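The two mapping tables above translate into a small function. The sketch below is not the code in _gadm.py; `gadm_record_to_row` is a hypothetical name, the input is assumed to be a plain dict of GADM attributes, and the geometry column is omitted.

```python
def gadm_record_to_row(record, gadm_level):
    """Hypothetical helper: map one GADM attribute record to admin_units columns."""
    if gadm_level not in (0, 1, 2, 3):
        raise ValueError("only GADM levels 0-3 are mapped in the tables above")
    # The parent is the GID one level up; level 0 (country) has no parent.
    parent = record[f"GID_{gadm_level - 1}"] if gadm_level > 0 else None
    return {
        "adcode": record[f"GID_{gadm_level}"],
        "name": record[f"NAME_{gadm_level}"],
        "level": gadm_level + 1,  # DB levels are 1-based, GADM levels are 0-based
        "parent_code": parent,
    }


if __name__ == "__main__":
    rec = {
        "GID_0": "DEU", "NAME_0": "Germany",
        "GID_1": "DEU.1_1", "NAME_1": "Baden-Württemberg",
    }
    print(gadm_record_to_row(rec, 0))
    print(gadm_record_to_row(rec, 1))
```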
upload
Uploads a local GeoJSON file into a PostGIS table under the public schema. Automatically reprojects to EPSG:4326 if needed, and adds a uuid primary key column — which is required for the annotate command.
adminbounds upload path/to/data.geojson my_table
adminbounds upload path/to/data.geojson my_table --if-exists append # append to existing table
adminbounds upload path/to/data.geojson my_table --if-exists fail # error if table exists
The default --if-exists replace drops and recreates the table. Use append to add more features to an existing table without losing previous rows.
Tip: If you already have data in PostgreSQL and want to use annotate, your table needs a uuid column of type UUID with a primary key. You can add one with:
ALTER TABLE myschema.my_table ADD COLUMN uuid UUID DEFAULT gen_random_uuid() PRIMARY KEY;
annotate
Runs infer_admin_semantic_relation on every geometry in a source table and writes the results into adminbounds.thematic_admin_relations. The source table must have a uuid column (added automatically by upload).
adminbounds annotate --source-table my_table --geom-col geom
adminbounds annotate --source-table myschema.my_table --geom-col geom # schema-qualified
adminbounds annotate --source-table my_table --geom-col geom --batch-size 50
Before processing, a pre-flight report is always printed:
Source table: public.my_table
Total rows: 1234
Already annotated: 0
Unannotated: 1234
Mode: skip → will annotate 1234 new row(s)
Re-annotation modes (--mode):
| Mode | Behavior | When to use |
|---|---|---|
| skip | (default) Only annotate rows not yet in thematic_admin_relations. A second run does nothing if the table is fully annotated. | Normal incremental runs; adding new rows to the source table |
| update | Re-infer all rows, overwriting existing results. Rows that fail inference keep their old result. | After refreshing boundary data (e.g. re-running download-gadm) |
| replace | Delete all existing annotations for this table first, then annotate everything from scratch. Clean slate. | After major data changes; guaranteed fresh results |
# Only annotate new rows added since last run
adminbounds annotate --source-table my_table --geom-col geom
# Re-infer everything after importing new boundary data
adminbounds annotate --source-table my_table --geom-col geom --mode update
# Full reset and re-annotate
adminbounds annotate --source-table my_table --geom-col geom --mode replace
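The row selection implied by the three modes can be modelled with a few lines of Python. This is a sketch of the documented behaviour, not the logic in _annotate.py; `plan_annotation` is a hypothetical name, and the "failed rows keep their old result" detail of update mode is not modelled.

```python
def plan_annotation(all_uuids, annotated_uuids, mode="skip"):
    """Hypothetical helper: return (uuids_to_delete_first, uuids_to_process)."""
    all_uuids = list(all_uuids)
    annotated = set(annotated_uuids)
    if mode == "skip":
        # Only rows without an existing annotation are processed.
        return [], [u for u in all_uuids if u not in annotated]
    if mode == "update":
        # Every row is re-inferred; existing results are overwritten in place.
        return [], all_uuids
    if mode == "replace":
        # Existing annotations are removed first, then every row is processed.
        return sorted(annotated), all_uuids
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rows, done = ["a", "b", "c"], ["a"]
    for mode in ("skip", "update", "replace"):
        to_delete, to_process = plan_annotation(rows, done, mode)
        print(f"{mode}: delete {to_delete}, annotate {len(to_process)} row(s)")
```

In skip mode the "annotate" count corresponds to the "Unannotated" figure in the pre-flight report.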
The --schema flag sets the default schema when --source-table is not schema-qualified (default: public). If the table name already contains a dot (e.g. myschema.my_table), --schema is ignored.
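The qualification rule is simple enough to state as code. The function below is a hypothetical illustration of the rule as described, not the package's implementation; its result has the same form as the source_table value used in the join queries.

```python
def qualify_table(source_table, schema="public"):
    """Hypothetical helper: prepend the schema unless the name is already qualified."""
    if "." in source_table:
        return source_table  # already schema-qualified, the schema argument is ignored
    return f"{schema}.{source_table}"


if __name__ == "__main__":
    print(qualify_table("my_table"))                     # public.my_table
    print(qualify_table("my_table", schema="gis"))       # gis.my_table
    print(qualify_table("myschema.my_table", "ignored"))  # myschema.my_table
```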
Annotation is resume-safe in skip mode — you can interrupt and restart without reprocessing completed rows.
diagnose
Runs a series of diagnostic checks when annotation returns empty or unexpected results. Useful for debugging geometry CRS mismatches, missing derived fields, or spatial overlap issues.
adminbounds diagnose --source-table my_table --geom-col geom
adminbounds diagnose --source-table myschema.my_table --geom-col geom
Checks performed:
- admin_units row count and whether derived fields (geom_bbox etc.) have been computed
- Level distribution of loaded boundaries
- Source table geometry count and SRID
- Source extent bounding box and whether it falls within loaded boundary extents
- Three-layer spatial filter pass-through counts on the first geometry (bbox → hull → full geom)
- Full function call result on the first geometry
Python API
All CLI operations are available as methods on AdminBoundsClient.
Connecting
from adminbounds import AdminBoundsClient
# Credentials from keyword arguments
c = AdminBoundsClient(
host="localhost",
port=5432,
dbname="mydb",
user="postgres",
password="secret",
)
# Or rely entirely on ADMINBOUNDS_DB_* environment variables / .env file
c = AdminBoundsClient()
Database setup
# Create schema + tables + deploy inference function
c.init_db()
# Load bundled Chinese boundaries
c.import_boundaries()
# Download GADM boundaries for any country
c.download_gadm("Germany") # all 4 levels
c.download_gadm("DEU") # ISO3 code, same result
c.download_gadm("USA", levels=[0, 1]) # country + state only
c.download_gadm("France", force=True) # re-download even if cached
c.download_gadm("Japan", cache_dir="/tmp/g") # custom cache directory
Single-geometry inference
from shapely.geometry import box
# Infer for a single Shapely geometry; returns a dict.
# The values in the comments below are illustrative.
result = c.infer(box(116.3, 39.8, 116.5, 40.0))
print(result["admin_level_match"]) # e.g. 2 (province level)
print(result["confidence"]) # e.g. 0.94
print(result["coincides_with"]) # e.g. [{"code": "110000", "name": "北京市", ...}]
print(result["contained_by"]) # e.g. [{"code": "100000", "name": "中国", ...}]
print(result["covers_children"]) # e.g. [{"code": "110101", ...}, ...]
Uploading data
# Upload a GeoJSON file → public.my_table, adds uuid primary key
count = c.upload("path/to/data.geojson", "my_table")
print(f"Uploaded {count} features")
# Append to existing table
c.upload("more_data.geojson", "my_table", if_exists="append")
Batch annotation
# Annotate all rows — only new rows on subsequent calls (skip mode)
count = c.annotate("my_table", geom_col="geom")
count = c.annotate("myschema.my_table", geom_col="geom") # schema-qualified
# Re-infer all rows after refreshing boundary data
count = c.annotate("my_table", mode="update")
# Clean slate
count = c.annotate("my_table", mode="replace")
# Progress callback
def on_progress(processed, _):
    print(f"\r{processed} rows done", end="")
count = c.annotate("my_table", batch_size=50, on_progress=on_progress)
Diagnostics
results = c.diagnose("my_table", geom_col="geom")
# Prints a structured diagnostic report and returns a dict of check results
Database Schema
adminbounds.admin_units
Stores administrative boundaries at four levels. Supports both Chinese numeric adcodes (100000) and GADM GIDs (DEU.1_1).
| Column | Type | Description |
|---|---|---|
| id | SERIAL | Auto-incrementing primary key |
| adcode | TEXT | Unique admin code: 6-digit numeric (China) or GADM GID |
| name | TEXT | Place name |
| level | INTEGER | 1=country, 2=province/state, 3=city/county, 4=district |
| parent_code | TEXT | adcode of the parent unit (NULL for level 1) |
| geom | GEOMETRY | Full boundary polygon (EPSG:4326) |
| geom_bbox | GEOMETRY | Bounding box, used for the fast coarse spatial filter |
| geom_hull | GEOMETRY | Convex hull, used for the medium spatial filter |
| geom_simple | GEOMETRY | Simplified geometry for polygons with >500 vertices |
| centroid | GEOMETRY | Centroid point |
| area_m2 | FLOAT8 | Area in square metres |
| vertex_count | INTEGER | Vertex count of the original geometry |
adminbounds.thematic_admin_relations
Stores per-feature annotation results. One row per (source_table, feature_uuid) pair.
| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL | Auto-incrementing primary key |
| source_table | TEXT | Fully qualified source table name (e.g. public.my_table) |
| feature_uuid | UUID | UUID of the feature from the source table's uuid column |
| admin_level_match | INTEGER | Dominant admin level of the best match |
| confidence | FLOAT8 | Confidence score 0–1 |
| coincides_with | JSONB | Array of units that substantially overlap the geometry |
| intersects_with | JSONB | Array of units that partially overlap at the dominant level |
| covers_children | JSONB | Array of child units contained within the geometry |
| contained_by | JSONB | Ancestor chain of the best-matched unit |
| computed_at | TIMESTAMPTZ | Timestamp of when this annotation was computed |
Inference Function
Can be called directly in SQL for ad-hoc queries:
SELECT adminbounds.infer_admin_semantic_relation(
ST_GeomFromText('POLYGON((116.3 39.8, 116.5 39.8, 116.5 40.0, 116.3 40.0, 116.3 39.8))', 4326)
);
Example output (Chinese boundary):
{
"coincides_with": [{"code": "110000", "name": "北京市", "level": 2, "similarity": 0.9731}],
"intersects_with": [],
"covers_children": [{"code": "110101", "name": "东城区", "level": 4}],
"contained_by": [{"code": "100000", "name": "中国", "level": 1}],
"admin_level_match": 2,
"confidence": 0.9866
}
Example output (German boundary after download-gadm Germany):
{
"coincides_with": [{"code": "DEU.1_1", "name": "Baden-Württemberg", "level": 2, "similarity": 0.9812}],
"intersects_with": [],
"covers_children": [{"code": "DEU.1.1_1", "name": "Freiburg im Breisgau", "level": 3}],
"contained_by": [{"code": "DEU", "name": "Germany", "level": 1}],
"admin_level_match": 2,
"confidence": 0.9906
}
Three-layer spatial filter (performance — avoids full-table geometry intersection):
- Bounding box overlap — GIST index scan, eliminates most candidates immediately
- Convex hull intersection — narrows the remaining candidates
- Full geometry intersection — precise check; uses simplified geometry for polygons with >500 vertices
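The cascade above can be illustrated in plain Python for the simplest case, a point query against polygon units. This is a toy model under stated assumptions: the real filter runs inside the PL/pgSQL function with a GIST index and handles arbitrary geometries, while the function names, the toy units, and the point-only query here are all hypothetical. The `bbox`, `hull`, and `geom` keys stand in for the precomputed geom_bbox, geom_hull, and geom columns.

```python
def in_bbox(pt, bbox):
    """Layer 1: cheap rectangle test against (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    return minx <= pt[0] <= maxx and miny <= pt[1] <= maxy


def in_ring(pt, ring):
    """Ray-casting point-in-polygon test for a ring given as [(x, y), ...]."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def three_layer_filter(pt, units):
    """Return (matching unit names, pass-through count after each layer)."""
    layers = [
        ("bbox", lambda u: in_bbox(pt, u["bbox"])),
        ("hull", lambda u: in_ring(pt, u["hull"])),
        ("geom", lambda u: in_ring(pt, u["geom"])),
    ]
    survivors, counts = list(units), {}
    for name, keep in layers:
        survivors = [u for u in survivors if keep(u)]
        counts[name] = len(survivors)
    return [u["name"] for u in survivors], counts


# Toy data: an L-shaped unit and a distant square, with precomputed bbox and hull.
UNITS = [
    {"name": "L-shape", "bbox": (0, 0, 4, 4),
     "hull": [(0, 0), (4, 0), (4, 1), (1, 4), (0, 4)],
     "geom": [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]},
    {"name": "square", "bbox": (10, 10, 12, 12),
     "hull": [(10, 10), (12, 10), (12, 12), (10, 12)],
     "geom": [(10, 10), (12, 10), (12, 12), (10, 12)]},
]

if __name__ == "__main__":
    for pt in [(0.5, 3), (2, 2), (3, 3)]:
        print(pt, three_layer_filter(pt, UNITS))
```

The per-layer counts play the same role as the pass-through counts reported by the diagnose command: they show at which layer a candidate was eliminated.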
Similarity metric (for coincides_with, threshold IoU ≥ 0.85):
similarity = 0.5 × IoU + 0.3 × area_ratio + 0.2 × (1 − normalised_centroid_offset)
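To make the weighting concrete, here is a minimal sketch that evaluates the formula for axis-aligned rectangles. Only the weights and the 0.85 IoU threshold come from the text above. The definitions of `area_ratio` (smaller area over larger area) and of the normalised centroid offset (centroid distance divided by the diagonal of the combined bounding box, capped at 1) are assumptions made for illustration, and the actual SQL function may define them differently.

```python
import math

COINCIDE_IOU = 0.85  # threshold quoted above


def rect_area(r):
    """Area of a (minx, miny, maxx, maxy) rectangle."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def rect_iou(a, b):
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = rect_area(a) + rect_area(b) - inter
    return inter / union if union > 0 else 0.0


def similarity(a, b):
    """Weighted score from the formula above, under the assumptions in the lead-in."""
    iou = rect_iou(a, b)
    big = max(rect_area(a), rect_area(b))
    area_ratio = min(rect_area(a), rect_area(b)) / big if big > 0 else 0.0
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    diag = math.hypot(max(a[2], b[2]) - min(a[0], b[0]),
                      max(a[3], b[3]) - min(a[1], b[1]))
    offset = min(1.0, math.hypot(ca[0] - cb[0], ca[1] - cb[1]) / diag) if diag > 0 else 0.0
    score = 0.5 * iou + 0.3 * area_ratio + 0.2 * (1 - offset)
    return {"iou": iou, "similarity": score, "coincides": iou >= COINCIDE_IOU}


if __name__ == "__main__":
    print(similarity((0, 0, 10, 10), (0, 0, 10, 9)))  # close match
    print(similarity((0, 0, 2, 2), (0, 0, 2, 1)))     # half overlap
```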
Note on GADM and contained_by: The contained_by fallback in the PL/pgSQL function uses a substring-based ancestor lookup tuned for 6-digit Chinese codes. For GADM GIDs, the primary parent-chain walk-up (via the parent_code column) is used instead. Since parent_code is correctly populated for all GADM data, this works correctly for all countries.
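The parent-chain walk-up can be pictured as following parent_code links until a unit with no parent is reached. The sketch below models that idea in Python over an in-memory dict. It is not the PL/pgSQL implementation, and `ancestor_chain` and the `UNITS` lookup are hypothetical names. The sample rows reuse the German codes from the example output.

```python
def ancestor_chain(adcode, units):
    """Follow parent_code links upward; units maps adcode -> row dict."""
    chain = []
    seen = {adcode}  # guards against accidental cycles in parent_code
    parent = units[adcode]["parent_code"]
    while parent is not None and parent in units and parent not in seen:
        row = units[parent]
        chain.append({"code": parent, "name": row["name"], "level": row["level"]})
        seen.add(parent)
        parent = row["parent_code"]
    return chain


UNITS = {
    "DEU": {"name": "Germany", "level": 1, "parent_code": None},
    "DEU.1_1": {"name": "Baden-Württemberg", "level": 2, "parent_code": "DEU"},
    "DEU.1.1_1": {"name": "Freiburg im Breisgau", "level": 3, "parent_code": "DEU.1_1"},
}

if __name__ == "__main__":
    for unit in ancestor_chain("DEU.1.1_1", UNITS):
        print(unit)
```

Because the walk relies only on parent_code, it is independent of the code format, which is why it works for both numeric adcodes and GADM GIDs.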
Querying Results
Check what's been loaded:
-- Boundary data by level
SELECT level, COUNT(*) FROM adminbounds.admin_units GROUP BY level ORDER BY level;
-- GADM data for a specific country
SELECT adcode, name, level FROM adminbounds.admin_units WHERE adcode LIKE 'DEU%' LIMIT 10;
Check annotation coverage:
SELECT source_table, COUNT(*) AS annotated_rows
FROM adminbounds.thematic_admin_relations
GROUP BY source_table;
Join annotation results back to the source table:
SELECT
src.*,
tar.admin_level_match,
tar.confidence,
tar.coincides_with,
tar.contained_by
FROM public.my_table src
JOIN adminbounds.thematic_admin_relations tar
ON tar.source_table = 'public.my_table'
AND tar.feature_uuid = src.uuid;
Find features that coincide with a specific admin unit:
-- Features coinciding with Jiangsu province (adcode 320000)
SELECT source_table, feature_uuid
FROM adminbounds.thematic_admin_relations
WHERE coincides_with @> '[{"code": "320000"}]';
-- Features coinciding with Germany
SELECT source_table, feature_uuid
FROM adminbounds.thematic_admin_relations
WHERE coincides_with @> '[{"code": "DEU"}]';
Find features at city level with high confidence:
SELECT tar.feature_uuid, tar.confidence, tar.coincides_with
FROM adminbounds.thematic_admin_relations tar
WHERE admin_level_match = 3
AND confidence > 0.85
ORDER BY confidence DESC;
Extract the first coinciding unit name as a plain text column:
SELECT
feature_uuid,
coincides_with -> 0 ->> 'name' AS matched_unit,
confidence
FROM adminbounds.thematic_admin_relations
WHERE coincides_with IS NOT NULL
AND jsonb_array_length(coincides_with) > 0;
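The same lookups are easy to apply in Python to a result dict, for example one returned by c.infer. The helpers below are hypothetical conveniences, not part of the package. They mirror the containment query on coincides_with and the first-unit name extraction shown above, and the sample dict is copied from the example output in the Inference Function section.

```python
def coincides_with_code(result, code):
    """True if any unit in coincides_with carries the given admin code."""
    return any(u.get("code") == code for u in result.get("coincides_with") or [])


def first_match_name(result):
    """Name of the first coinciding unit, or None if the list is empty or missing."""
    units = result.get("coincides_with") or []
    return units[0]["name"] if units else None


EXAMPLE = {
    "coincides_with": [{"code": "110000", "name": "北京市", "level": 2, "similarity": 0.9731}],
    "intersects_with": [],
    "covers_children": [{"code": "110101", "name": "东城区", "level": 4}],
    "contained_by": [{"code": "100000", "name": "中国", "level": 1}],
    "admin_level_match": 2,
    "confidence": 0.9866,
}

if __name__ == "__main__":
    print(coincides_with_code(EXAMPLE, "110000"))
    print(coincides_with_code(EXAMPLE, "320000"))
    print(first_match_name(EXAMPLE))
```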
Project Structure
admin-bounds/
├── src/adminbounds/
│ ├── __init__.py # Package entry point, exports AdminBoundsClient
│ ├── client.py # AdminBoundsClient — high-level Python API
│ ├── config.py # Pydantic settings (ADMINBOUNDS_DB_* env vars)
│ ├── db.py # SQLAlchemy engine + raw psycopg2 connection
│ ├── cli/__init__.py # CLI entry point (adminbounds command)
│ ├── _import.py # DDL deploy + bundled boundary import pipeline
│ ├── _gadm.py # GADM 4.1 worldwide download + import
│ ├── _annotate.py # Batch annotation logic with mode support
│ ├── _upload.py # GeoJSON → PostGIS upload helper
│ ├── _diagnose.py # Annotation diagnostic checks
│ └── sql/
│ ├── schema/
│ │ ├── 01_admin_units.sql
│ │ └── 02_thematic_admin_relations.sql
│ └── functions/
│ └── infer_admin_semantic_relation.sql
├── sql/ # Source copies of the SQL files (mirrors src/adminbounds/sql/)
├── validation/
│ └── sample_queries.sql # Post-import validation and smoke tests
├── .env.example
└── pyproject.toml