Given a geopoint, find the nearest city using PostGIS (reverse geocode).

Simple PostGIS Reverse Geocoder


📖 Documentation: https://hotosm.github.io/pg-nearest-city/

🖥️ Source Code: https://github.com/hotosm/pg-nearest-city


Why do we need this?

This package was developed primarily as a basic reverse geocoder for use within web frameworks (APIs) that have an existing PostGIS connection to utilise.

Simple alternatives:

  • The reverse geocoding package in Python is probably the original and canonical implementation, using a K-D tree.
    • However, it is somewhat outdated now: it has numerous unattended pull requests and uses an unfavourable multiprocessing-based approach.
    • It needs approximately 260MB of memory to hold the K-D tree (see benchmarks), and that memory stays allocated: an unacceptable compromise for a web server, given how little functionality it provides.
  • This package is an excellent revamp of the package above, and possibly the best choice in many scenarios, particularly if PostGIS is not available.

The pg-nearest-city approach:

  • Is approximately 450x faster (45ms --> 0.1ms).
  • Has a small ~8MB memory footprint, compared to ~260MB.
    • This depends on the selected baseline dataset and compression algorithm: GADM when compressed is ~25MB.
  • However, it has a one-time initialisation penalty of approximately 30s-4m to load the data into the database (which could be handled at web server startup).
    • As with the memory footprint, this depends on the baseline dataset and compression algorithm selected.

[!NOTE] We don't discuss web-based geocoding services here, such as Nominatim, as simple offline reverse geocoding serves three purposes:

  • Reduced latency, when very precise locations are not required.
  • Reduced load on free services such as Nominatim (particularly when running in automated tests frequently).
  • Reduced reliance on externally managed tools.

Priorities

  • Lightweight package size, kept as small as possible.
  • Minimal memory footprint.
  • High performance.

How This Package Works

  • Ingest GeoNames cities500 data (cities with population > 500).
  • Ingest country boundary polygons from either GADM or Natural Earth.
  • Apply GeoBoundaries corrections for inaccurate upstream geometry (e.g. disputed territories, overseas regions).
  • Clean and normalise country data (spelling fixes, ISO 3166-1 alignment, etc.).
  • Simplify and subdivide geometry (PostGIS ST_Subdivide) to shrink size and speed up queries.
  • Query: use ST_Covers to identify the covering country polygon, then a KNN lateral join to find the nearest city within that country.
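
The two-stage lookup in the final step can be illustrated without a database. The sketch below is a pure-Python analogue, not the package's implementation: rectangles stand in for the subdivided country polygons, a containment test stands in for ST_Covers, and a brute-force haversine minimum stands in for the KNN join. All names and data here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    country: str
    lon: float
    lat: float


# Hypothetical toy data. Each box is (min_lon, min_lat, max_lon, max_lat)
# and is a deliberately crude stand-in for a real country polygon.
COUNTRY_BOXES = {
    "USA": (-125.0, 24.0, -66.0, 49.0),
    "CAN": (-141.0, 49.0, -52.0, 83.0),
}
CITIES = [
    City("New York City", "USA", -74.0060, 40.7128),
    City("Buffalo", "USA", -78.8784, 42.8864),
    City("Toronto", "CAN", -79.3832, 43.6532),
]


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def covering_country(lon, lat):
    """Stage 1: find the country whose (toy) boundary covers the point."""
    for code, (min_lon, min_lat, max_lon, max_lat) in COUNTRY_BOXES.items():
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return code
    return None


def nearest_city(lon, lat):
    """Stage 2: nearest city, restricted to the covering country."""
    country = covering_country(lon, lat)
    if country is None:
        return None
    candidates = [c for c in CITIES if c.country == country]
    return min(candidates, key=lambda c: haversine_km(lon, lat, c.lon, c.lat), default=None)
```

With this toy data, the point (-79.3, 43.3) falls inside the "USA" box, so `nearest_city` returns Buffalo even though Toronto is closer in a straight line. That is the effect of filtering by country before searching for the nearest city.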

Usage

Install

Distributed as a pip package on PyPI:

pip install pg-nearest-city
# or use your dependency manager of choice

Run The Code

[!NOTE] Coordinates use (lon, lat) order throughout, matching the GIS / PostGIS convention where longitude is the X axis and latitude is Y.
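
Because swapped coordinates are an easy mistake to make, a small guard in front of the query call can help. The helper below is a hypothetical sketch and not part of pg-nearest-city. It only checks value ranges, so it catches a swap only when the longitude lies outside the range -90 to 90.

```python
def check_lon_lat(lon: float, lat: float) -> tuple[float, float]:
    """Return (lon, lat) unchanged, or raise ValueError if either is out of range."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        # A latitude beyond +/-90 usually means the arguments were swapped.
        raise ValueError(f"latitude out of range: {lat} (were lon and lat swapped?)")
    return lon, lat
```

For example, `check_lon_lat(139.6917, 35.6895)` passes, while the swapped call `check_lon_lat(35.6895, 139.6917)` raises a ValueError. A swapped New York point (40.7128, -74.0060) would not be caught, since both values are still within range.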

Async

from pg_nearest_city import AsyncNearestCity

# Existing code to get db connection, say from API endpoint.
# Note: this snippet must run inside an async function.
db = await get_db_connection()

async with AsyncNearestCity(db) as geocoder:
    location = await geocoder.query(-74.0060, 40.7128)

print(location.city)
# "New York City"
print(location.country)
# "USA"

Sync

from pg_nearest_city import NearestCity

# Existing code to get db connection, say from API endpoint
db = get_db_connection()

with NearestCity(db) as geocoder:
    location = geocoder.query(-74.0060, 40.7128)

print(location.city)
# "New York City"
print(location.country)
# "USA"

Create A New DB Connection

  • If your app already has a psycopg connection upstream, it can be passed through.
  • If you require a new database connection, pass the connection parameters as a DbConfig object:
from pg_nearest_city import DbConfig, AsyncNearestCity

db_config = DbConfig(
    dbname="db1",
    user="user1",
    password="pass1",
    host="localhost",
    port="5432",
)

async with AsyncNearestCity(db_config) as geocoder:
    location = await geocoder.query(-74.0060, 40.7128)
  • Alternatively, set them as environment variables:
PGNEAREST_DB_NAME=cities
PGNEAREST_DB_USER=cities
PGNEAREST_DB_PASSWORD=somepassword
PGNEAREST_DB_HOST=localhost
PGNEAREST_DB_PORT=5432

Then create the geocoder without arguments:

from pg_nearest_city import AsyncNearestCity

async with AsyncNearestCity() as geocoder:
    location = await geocoder.query(-74.0060, 40.7128)
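
When relying on environment variables, it can be useful to check at startup which of them are actually set. The following is a minimal sketch of such a preflight check. The helper names are hypothetical and not part of pg-nearest-city; the package reads these variables itself and may apply its own defaults, which this sketch does not model.

```python
import os

# Variable names are taken from the README above; the mapped field names
# mirror the DbConfig arguments shown earlier.
ENV_TO_FIELD = {
    "PGNEAREST_DB_NAME": "dbname",
    "PGNEAREST_DB_USER": "user",
    "PGNEAREST_DB_PASSWORD": "password",
    "PGNEAREST_DB_HOST": "host",
    "PGNEAREST_DB_PORT": "port",
}


def read_pgnearest_env(environ=None):
    """Return the PGNEAREST_DB_* settings that are set, keyed by field name."""
    env = os.environ if environ is None else environ
    return {field: env[name] for name, field in ENV_TO_FIELD.items() if name in env}


def missing_pgnearest_vars(environ=None):
    """Return the PGNEAREST_DB_* variable names that are not set."""
    env = os.environ if environ is None else environ
    return [name for name in ENV_TO_FIELD if name not in env]
```

Calling `missing_pgnearest_vars()` before creating the geocoder gives a clear list of unset variables to log or report.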

Testing

Via Docker:

docker compose run --rm code pytest

Or locally (requires a running PostgreSQL instance with PostGIS and loaded data):

PGNEAREST_DB_USER=myuser PGNEAREST_DB_NAME=postgres uv run pytest tests/ -v

Data Pipeline

The pgnearest-load CLI command runs a multi-step pipeline that downloads source data, imports it into PostGIS, applies corrections, simplifies geometry, and exports compressed CSV files for distribution.

Running the Pipeline

# Using Natural Earth boundaries
uv run pgnearest-load --boundary-source naturalearth \
    --db-name postgres --db-user myuser --output-dir ./output

# Using GADM boundaries
uv run pgnearest-load --boundary-source gadm \
    --db-name postgres --db-user myuser --output-dir ./output

# Full rebuild (drops all tables first)
uv run pgnearest-load ... --clean

Key Flags

| Flag | Description |
| --- | --- |
| --boundary-source {gadm,naturalearth} | Which boundary dataset to use |
| --compression {auto,gzip,bz2,xz,zstd} | Compression for exported files |
| --no-cache | Download to a temporary directory without persisting |
| --clean | Drop all project tables first |
| --skip-steps / --only-steps | Step prefixes to skip or isolate |
| --list-steps | Print pipeline steps and exit |
| --country | Filter to a country (e.g. IT) |
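
If the pipeline is driven from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below uses only the flags and values listed in this README. The function name `build_load_command` is hypothetical, and the validation is an illustration rather than a mirror of the CLI's own checks.

```python
import shlex

BOUNDARY_SOURCES = {"gadm", "naturalearth"}
COMPRESSIONS = {"auto", "gzip", "bz2", "xz", "zstd"}


def build_load_command(boundary_source, db_name, db_user, output_dir,
                       compression=None, country=None, clean=False):
    """Build the argument list for a `uv run pgnearest-load` invocation."""
    if boundary_source not in BOUNDARY_SOURCES:
        raise ValueError(f"unknown boundary source: {boundary_source}")
    cmd = [
        "uv", "run", "pgnearest-load",
        "--boundary-source", boundary_source,
        "--db-name", db_name,
        "--db-user", db_user,
        "--output-dir", output_dir,
    ]
    if compression is not None:
        if compression not in COMPRESSIONS:
            raise ValueError(f"unknown compression: {compression}")
        cmd += ["--compression", compression]
    if country is not None:
        cmd += ["--country", country]
    if clean:
        cmd.append("--clean")
    return cmd


# Print the command for review before running it.
print(shlex.join(build_load_command("gadm", "postgres", "myuser", "./output", country="IT")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems.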

Output Files

The pipeline exports three compressed CSV files:

  • country.csv.<ext> — subdivided country polygons (alpha2, alpha3, name, WKB geometry).
  • geocoding.csv.<ext> — city-to-country mapping (city, country, lat, lon).
  • cities_500_simple.txt.<ext> — simplified city data.
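
To inspect an exported file, the standard library is enough for most of the compression options. The sketch below rests on several assumptions that this README does not confirm: that the file extensions are .gz, .bz2, and .xz for the gzip, bz2, and xz options, that the files are UTF-8 encoded, and that they parse as ordinary CSV. It makes no assumption about a header row, and it does not handle zstd. The function name `read_rows` is hypothetical.

```python
import bz2
import csv
import gzip
import lzma
from itertools import islice
from pathlib import Path

# Assumed mapping from file extension to a text-capable opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def read_rows(path, limit=5):
    """Return up to `limit` raw CSV rows from a compressed export file."""
    path = Path(path)
    opener = OPENERS.get(path.suffix)
    if opener is None:
        raise ValueError(f"unsupported extension: {path.suffix}")
    with opener(path, "rt", encoding="utf-8", newline="") as fh:
        return list(islice(csv.reader(fh), limit))
```

For example, `read_rows("output/geocoding.csv.gz")` would return the first few rows as lists of strings, which is a quick way to confirm the column order before loading the data elsewhere.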

When to Regenerate

Manual regeneration is only necessary when:

  • New GeoNames data becomes available and you want to update.
  • Upstream boundary datasets have been corrected.
  • You need to filter for a specific geographic region.
