Simple PostGIS Reverse Geocoder
Given a geopoint, find the nearest city using PostGIS (reverse geocode).
📖 Documentation: https://hotosm.github.io/pg-nearest-city/
🖥️ Source Code: https://github.com/hotosm/pg-nearest-city
Why do we need this?
This package was developed primarily as a basic reverse geocoder for use within web frameworks (APIs) that have an existing PostGIS connection to utilise.
Simple alternatives:
- The reverse geocoding package in Python is probably the original and canonical implementation, using a K-D tree.
  - However, it is somewhat outdated, with numerous pull requests left unattended, and it uses an unfavourable multiprocessing-based approach.
  - Loading the K-D tree leaves a large memory footprint of approximately 260MB (see benchmarks), and that memory stays allocated. This is an unacceptable compromise for a web server, given how little functionality it provides.
- Another package offers an excellent revamp of the package above, and is possibly the best choice in many scenarios, particularly if PostGIS is not available.
The pg-nearest-city approach:
- Is approximately 450x faster (45ms → 0.1ms).
- Has a small ~8MB memory footprint, compared to ~260MB.
  - This depends on the selected baseline dataset and compression algorithm: the compressed GADM data is ~25MB.
- However, it has a one-time initialisation penalty of approximately 30 seconds to 4 minutes to load the data into the database (which could be handled at web server startup).
  - As with the memory footprint, this depends on the selected baseline dataset and compression algorithm.
[!NOTE] We don't discuss web-based geocoding services such as Nominatim here, as simple offline reverse geocoding serves three purposes:
- Reduced latency, when very precise locations are not required.
- Reduced load on free services such as Nominatim (particularly when running in automated tests frequently).
- Reduced reliance on externally managed tools.
Priorities
- Lightweight package size, kept as small as possible.
- Minimal memory footprint.
- High performance.
How This Package Works
- Ingest GeoNames cities500 data (cities with population > 500).
- Ingest country boundary polygons from either GADM or Natural Earth.
- Apply GeoBoundaries corrections for inaccurate upstream geometry (e.g. disputed territories, overseas regions).
- Clean and normalise country data (spelling fixes, ISO 3166-1 alignment, etc.).
- Simplify geometry, using PostGIS `ST_Subdivide` to split polygons into smaller pieces, shrinking their size and speeding up queries.
- Query: use `ST_Covers` to identify the covering country polygon, then a KNN lateral join to find the nearest city within that country.
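The two-stage query can be illustrated in plain Python. This is a minimal in-memory sketch, not the package's SQL: it swaps `ST_Covers` for a ray-casting point-in-polygon test and the KNN lateral join for a linear scan ranked by great-circle (haversine) distance. The country and city records are hypothetical toy data, and `nearest_city`, `covers` and `haversine_km` are hypothetical helper names.

```python
import math
from dataclasses import dataclass

# (lon, lat) pairs, matching the package's coordinate order.
Point = tuple[float, float]


@dataclass
class Country:
    alpha3: str
    polygon: list[Point]  # outer ring only; holes and multipolygons are ignored


@dataclass
class City:
    name: str
    country: str  # alpha3 code of the country the city belongs to
    lon: float
    lat: float


def covers(polygon: list[Point], lon: float, lat: float) -> bool:
    """Ray-casting point-in-polygon test, a stand-in for ST_Covers.

    Unlike ST_Covers, points exactly on the boundary are not handled consistently.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_city(lon, lat, countries, cities):
    # Stage 1: find the country polygon that covers the point.
    country = next((c for c in countries if covers(c.polygon, lon, lat)), None)
    if country is None:
        return None
    # Stage 2: nearest city, restricted to cities in that country.
    candidates = [c for c in cities if c.country == country.alpha3]
    return min(candidates, key=lambda c: haversine_km(lon, lat, c.lon, c.lat), default=None)


countries = [
    Country("AAA", [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]),
    Country("BBB", [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)]),
]
cities = [
    City("Alpha", "AAA", 8.0, 5.0),
    City("Beta", "BBB", 10.5, 5.0),
    City("Gamma", "AAA", 1.0, 5.0),
]
print(nearest_city(9.9, 5.0, countries, cities).name)  # Alpha
```

Beta is geographically closer to (9.9, 5.0), but it lies in a different country, so the country filter in the first stage excludes it. In PostGIS, both stages run inside the database and are served by spatial indexes rather than linear scans.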
Usage
Install
Distributed as a pip package on PyPI:
```bash
pip install pg-nearest-city
# or use your dependency manager of choice
```
Run The Code
[!NOTE] Coordinates use (lon, lat) order throughout, matching the GIS / PostGIS convention where longitude is the X axis and latitude is Y.
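Because swapping the two values is an easy mistake, a small guard can catch some cases before a query is made. This is a minimal sketch of a hypothetical helper, `check_lon_lat`, which is not part of pg-nearest-city; it only checks the valid ranges for longitude and latitude.

```python
def check_lon_lat(lon: float, lat: float) -> tuple[float, float]:
    """Validate a (lon, lat) pair before passing it to a geocoder.

    Hypothetical helper for illustration; not part of pg-nearest-city.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        # A longitude passed in the latitude slot often ends up here.
        raise ValueError(f"latitude {lat} outside [-90, 90]; were lon/lat swapped?")
    return lon, lat


check_lon_lat(139.6917, 35.6895)  # Tokyo in (lon, lat) order: accepted
# check_lon_lat(35.6895, 139.6917) would raise ValueError (latitude out of range)
```

The check cannot catch every swap: New York's coordinates are a valid pair in either order, so keeping to (lon, lat) consistently is still what matters.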
Async
```python
import asyncio

from pg_nearest_city import AsyncNearestCity


async def main():
    # Existing code to get db connection, say from API endpoint
    db = await get_db_connection()

    async with AsyncNearestCity(db) as geocoder:
        location = await geocoder.query(-74.0060, 40.7128)  # (lon, lat)
        print(location.city)
        # "New York City"
        print(location.country)
        # "USA"


asyncio.run(main())
```
Sync
```python
from pg_nearest_city import NearestCity

# Existing code to get db connection, say from API endpoint
db = get_db_connection()

with NearestCity(db) as geocoder:
    location = geocoder.query(-74.0060, 40.7128)  # (lon, lat)
    print(location.city)
    # "New York City"
    print(location.country)
    # "USA"
```
Create A New DB Connection
- If your app upstream already has a psycopg connection, this can be passed through.
- If you require a new database connection, the connection parameters can be passed as a DbConfig object:
```python
import asyncio

from pg_nearest_city import AsyncNearestCity, DbConfig

db_config = DbConfig(
    dbname="db1",
    user="user1",
    password="pass1",
    host="localhost",
    port="5432",
)


async def main():
    async with AsyncNearestCity(db_config) as geocoder:
        location = await geocoder.query(-74.0060, 40.7128)


asyncio.run(main())
```
- Or alternatively as variables from your system environment:
```dotenv
PGNEAREST_DB_NAME=cities
PGNEAREST_DB_USER=cities
PGNEAREST_DB_PASSWORD=somepassword
PGNEAREST_DB_HOST=localhost
PGNEAREST_DB_PORT=5432
```
Then:
```python
import asyncio

from pg_nearest_city import AsyncNearestCity


async def main():
    async with AsyncNearestCity() as geocoder:
        location = await geocoder.query(-74.0060, 40.7128)


asyncio.run(main())
```
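If your app manages its own psycopg connection but you want to reuse the same environment variables, they map directly onto standard libpq connection keywords. The sketch below is a hypothetical helper, `conninfo_from_env`, and not how pg-nearest-city reads these variables internally. It builds a libpq conninfo string, quoting each value with single quotes and escaping backslashes and quotes as libpq requires.

```python
import os

# Environment variable names documented above, mapped to libpq keywords.
ENV_TO_LIBPQ = {
    "PGNEAREST_DB_NAME": "dbname",
    "PGNEAREST_DB_USER": "user",
    "PGNEAREST_DB_PASSWORD": "password",
    "PGNEAREST_DB_HOST": "host",
    "PGNEAREST_DB_PORT": "port",
}


def quote_libpq(value: str) -> str:
    # libpq conninfo: wrap in single quotes, backslash-escape \ and '.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def conninfo_from_env(env=os.environ) -> str:
    """Build a libpq connection string from PGNEAREST_* variables (hypothetical helper)."""
    parts = []
    for env_name, keyword in ENV_TO_LIBPQ.items():
        value = env.get(env_name)
        if value is not None:  # unset variables are left to libpq's own defaults
            parts.append(f"{keyword}={quote_libpq(value)}")
    return " ".join(parts)
```

The resulting string can be passed to `psycopg.connect()`, and the connection handed to `NearestCity` as described above.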
Testing
Via Docker:
```bash
docker compose run --rm code pytest
```
Or locally (requires a running PostgreSQL instance with PostGIS and loaded data):
```bash
PGNEAREST_DB_USER=myuser PGNEAREST_DB_NAME=postgres uv run pytest tests/ -v
```
Data Pipeline
The pgnearest-load CLI command runs a multi-step pipeline that downloads source
data, imports it into PostGIS, applies corrections, simplifies geometry, and
exports compressed CSV files for distribution.
Running the Pipeline
```bash
# Using Natural Earth boundaries
uv run pgnearest-load --boundary-source naturalearth \
  --db-name postgres --db-user myuser --output-dir ./output

# Using GADM boundaries
uv run pgnearest-load --boundary-source gadm \
  --db-name postgres --db-user myuser --output-dir ./output

# Full rebuild (drops all tables first)
uv run pgnearest-load ... --clean
```
Key Flags
| Flag | Description |
|---|---|
| `--boundary-source {gadm,naturalearth}` | Which boundary dataset to use |
| `--compression {auto,gzip,bz2,xz,zstd}` | Compression for exported files |
| `--no-cache` | Download to a temporary directory without persisting files |
| `--clean` | Drop all project tables first |
| `--skip-steps` / `--only-steps` | Step prefixes to skip or isolate |
| `--list-steps` | Print pipeline steps and exit |
| `--country` | Filter to a country (e.g. `IT`) |
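The `--skip-steps` / `--only-steps` flags select steps by name prefix. The sketch below illustrates prefix matching only; the step names are hypothetical placeholders (use `--list-steps` for the real ones), and the assumed behaviour when both flags are given (isolate first, then skip) is not confirmed by the CLI documentation.

```python
# Hypothetical step names, for illustration only.
STEPS = [
    "01_download",
    "02_import_cities",
    "03_import_boundaries",
    "04_corrections",
    "05_simplify",
    "06_export",
]


def select_steps(steps, skip=(), only=()):
    """Keep steps matching any `only` prefix (if given), then drop any `skip` prefix.

    Assumed semantics for illustration; not the pgnearest-load implementation.
    """
    selected = [s for s in steps if not only or any(s.startswith(p) for p in only)]
    return [s for s in selected if not any(s.startswith(p) for p in skip)]


print(select_steps(STEPS, skip=["01"]))  # everything except the download step
print(select_steps(STEPS, only=["02", "03"]))  # just the two import steps
```

Matching on prefixes lets a short token such as `02` select a step without typing its full name.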
Output Files
The pipeline exports three compressed CSV files:
- `country.csv.<ext>`: subdivided country polygons (alpha2, alpha3, name, WKB geometry).
- `geocoding.csv.<ext>`: city-to-country mapping (city, country, lat, lon).
- `cities_500_simple.txt.<ext>`: simplified city data.
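To inspect an exported file outside the database, Python's standard library can decompress three of the four formats. This is a minimal sketch assuming the conventional `.gz`, `.bz2` and `.xz` suffixes for `<ext>`; zstd is left out because it needs Python 3.14's `compression.zstd` module or a third-party package. `read_compressed_csv` is a hypothetical helper, and it yields raw rows without assuming whether the files have a header row.

```python
import bz2
import csv
import gzip
import lzma
from pathlib import Path

# Assumed mapping from file suffix to a stdlib opener (zstd omitted).
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def read_compressed_csv(path):
    """Yield raw rows from a compressed CSV export (hypothetical helper)."""
    path = Path(path)
    opener = OPENERS.get(path.suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {path.suffix!r}")
    # Text mode ("rt") so the csv module receives str rather than bytes;
    # newline="" lets csv handle line endings itself.
    with opener(path, "rt", encoding="utf-8", newline="") as fh:
        yield from csv.reader(fh)


# Example (path is illustrative):
# for row in read_compressed_csv("output/geocoding.csv.gz"):
#     print(row)
```

Note that `geocoding.csv` lists latitude before longitude, the reverse of the (lon, lat) order used by `query`.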
When to Regenerate
Manual regeneration is only necessary when:
- New GeoNames data becomes available and you want to update.
- Upstream boundary datasets have been corrected.
- You need to filter for a specific geographic region.
File details
Details for the file pg_nearest_city-2.0.0.tar.gz.
File metadata
- Download URL: pg_nearest_city-2.0.0.tar.gz
- Size: 6.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.9.3 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `43b9a729ab1e230073734f2034d151593ae3aa075128852fd33755c2f01999f8` |
| MD5 | `5917ae083b6c29e14e6e28f0ce9eaeb3` |
| BLAKE2b-256 | `aa43e01d67392300de0650a305b553f6cb2bb842fb89996d214a4709d982e704` |
File details
Details for the file pg_nearest_city-2.0.0-py3-none-any.whl.
File metadata
- Download URL: pg_nearest_city-2.0.0-py3-none-any.whl
- Size: 6.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.9.3 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `51df615e5ffe080e0fce2ec2b251b852e9e08855852c2e29796a65aa0346b21a` |
| MD5 | `6c3ae68ec3c3a99f3dc68fe973b53df8` |
| BLAKE2b-256 | `c48488f907e44be18048d77080ed1a67fb678edcf24fe93cefa094bd2cce974e` |