
Exact-fractional-area zonal statistics over weather grids


geohalo


Exact-fractional-area zonal statistics over regular lat/lon grids.



Given a regular lat/lon mesh of gridded values (temperature, precipitation, population density, a land-cover fraction, a satellite band, …) loaded with xarray from GRIB, NetCDF, Zarr, …, and an arbitrary set of polygons, geohalo reduces the spatial dimensions of the mesh to one value per polygon. The reduction has sub-cell precision, and aggregation in the hot path runs at millisecond scale.

The expensive geometric work happens once; every subsequent grid collapses to a single sparse · dense matmul.

📖 Full documentation: https://campiohe.github.io/geohalo/

How it works

Aggregation is a linear operator:

aggregates = W @ flat_grid_values

where W ∈ ℝ^(N_polygons × N_cells) is a sparse matrix whose entries are the exact fractional area of cell ∩ polygon, weighted by each cell's true surface area on a sphere. W (the Stencil) depends only on the grid topology and the polygon set, not on the grid values, so it is built once, can be cached, and is reused for every slice. See "Aggregation as a linear operator" in the full documentation.
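As a rough illustration of the operator above (not geohalo's own code), the sketch below builds a tiny dense stand-in for W on a 2 × 2 grid with axis-aligned boxes as polygons. The helper names `overlap_1d` and `toy_stencil` are hypothetical, rows are normalised so the product is an area-weighted mean, and a real Stencil would be sparse and handle arbitrary polygons.

```python
import numpy as np

def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def toy_stencil(lat_edges, lon_edges, boxes):
    """Dense stand-in for W: one row per box, one column per cell.

    Each entry is (fraction of the cell covered by the box) times a weight
    proportional to the cell's area on a sphere. Boxes are given as
    (lon_min, lat_min, lon_max, lat_max) and must overlap the grid.
    """
    n_lat, n_lon = len(lat_edges) - 1, len(lon_edges) - 1
    W = np.zeros((len(boxes), n_lat * n_lon))
    for p, (lon0, lat0, lon1, lat1) in enumerate(boxes):
        for i in range(n_lat):
            south, north = lat_edges[i], lat_edges[i + 1]
            frac_lat = overlap_1d(south, north, lat0, lat1) / (north - south)
            # on a sphere, cell area is proportional to dlon * (sin(north) - sin(south))
            band = np.sin(np.deg2rad(north)) - np.sin(np.deg2rad(south))
            for j in range(n_lon):
                west, east = lon_edges[j], lon_edges[j + 1]
                frac_lon = overlap_1d(west, east, lon0, lon1) / (east - west)
                cell_weight = band * np.deg2rad(east - west)
                W[p, i * n_lon + j] = frac_lat * frac_lon * cell_weight
    # normalise each row so that W @ values is an area-weighted mean
    return W / W.sum(axis=1, keepdims=True)

lat_edges = np.array([0.0, 1.0, 2.0])
lon_edges = np.array([0.0, 1.0, 2.0])
boxes = [(0.0, 0.0, 2.0, 2.0),    # covers all four cells
         (0.5, 0.0, 1.5, 1.0)]    # covers half of each cell in the southern row

W = toy_stencil(lat_edges, lon_edges, boxes)      # built once
values = np.array([[10.0, 20.0], [30.0, 40.0]])   # (lat, lon), row-major
aggregates = W @ values.ravel()                   # reused for every new grid
```

The second box averages the two southern cells equally. The first box lands slightly below the plain mean of 25 because the southern row has marginally more area on a sphere.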

Install

geohalo targets Python ≥ 3.12.

uv add geohalo            # or: pip install geohalo

Optional extras: redis (the RedisCache backend) and matplotlib (the helpers in geohalo.plot).

Quickstart

import numpy as np
import geopandas as gpd
import xarray as xr
from shapely.geometry import box
import geohalo as ghl

# any regular lat/lon DataArray works; a synthetic field so this runs as-is
lats = np.arange(-25.0, -19.0, 0.25)
lons = np.arange(-50.0, -42.0, 0.25)
lon2d, lat2d = np.meshgrid(lons, lats)
field = 290.0 + 5.0 * np.cos(np.deg2rad(4 * lat2d)) + 0.1 * lon2d

da = xr.DataArray(
    field, dims=("latitude", "longitude"),
    coords={"latitude": lats, "longitude": lons}, name="value",
)

geoms = gpd.GeoSeries(
    [box(-49, -24, -47, -22), box(-47, -24, -45, -22), box(-46, -22, -44, -20)],
    index=["SP", "RJ", "MG"],            # the index holds the keys
)

out = ghl.reduce(da, geoms)                  # hot path; ms-scale
out_fine = ghl.reduce(da, geoms, target_resolution=0.05)   # refine the grid first
# out: xr.DataArray over (..., geom)

The output preserves every non-spatial dim of da (time, ensemble member, band, vertical level, …) and replaces (latitude, longitude) with a single geom dim indexed by the GeoSeries keys.
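The dimension handling described above can be pictured with plain NumPy. This is only a sketch of the idea, assuming a hypothetical precomputed weight matrix `W` with one row per geometry; it is not how geohalo is implemented internally. The spatial axes are flattened into one cell axis and every leading axis is carried through the product unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

n_time, n_member, n_lat, n_lon, n_geom = 3, 5, 4, 6, 2
grid = rng.normal(size=(n_time, n_member, n_lat, n_lon))

# hypothetical stand-in for a precomputed weight matrix (rows sum to 1)
W = rng.random((n_geom, n_lat * n_lon))
W /= W.sum(axis=1, keepdims=True)

# collapse (lat, lon) into a single cell axis; leading dims are untouched
flat = grid.reshape(n_time, n_member, n_lat * n_lon)
out = flat @ W.T                      # shape: (time, member, geom)

# the same numbers, computed for one (time, member) slice at a time
single = W @ grid[1, 2].ravel()
```

Because the product is linear in the grid values, batching over time or ensemble members changes only the shape of the right-hand side and leaves the weights alone.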

reduce also accepts an xr.Dataset (every spatial data var is reduced), how={"mean", "sum"}, a weight_key naming a per-cell weight variable, and spherical_correction=False to disable the latitude-area correction.
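To see what a latitude-area correction changes, the sketch below computes the area of regular lat/lon cells on a sphere and compares an area-weighted mean with an unweighted one. The function name `cell_areas_km2` and the radius value are assumptions made for this example; the text does not state which radius geohalo uses.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def cell_areas_km2(lat_centers, dlat, dlon, radius=EARTH_RADIUS_KM):
    """Area of dlat x dlon degree cells on a sphere, centred at lat_centers."""
    south = np.deg2rad(lat_centers - dlat / 2)
    north = np.deg2rad(lat_centers + dlat / 2)
    # A = R^2 * dlon * (sin(north) - sin(south)), with dlon in radians
    return radius**2 * np.deg2rad(dlon) * (np.sin(north) - np.sin(south))

lats = np.array([0.0, 30.0, 60.0])
areas = cell_areas_km2(lats, dlat=0.25, dlon=0.25)

values = np.array([300.0, 290.0, 270.0])
spherical_mean = np.average(values, weights=areas)  # cells weighted by true area
planar_mean = values.mean()                         # every cell counts equally
```

A cell at 60° latitude has half the area of an equatorial cell of the same angular size, so the planar mean over-weights high-latitude values.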

Documentation

Everything is covered in depth at https://campiohe.github.io/geohalo/.

Performance

The hot path is a single sparse · dense matmul: a 50-member batch over the ~5,570 GADM Brazil L2 municipalities reduces in single-digit milliseconds, and the one-time Stencil precompute takes seconds and is cacheable. Methodology, full tables, and the fused-operator size win are on the Performance page; re-run the suite with uv run python -m benchmarks.run.

Non-goals

  • No reprojection — EPSG:4326 throughout (grids and polygons).
  • No per-variable cache — the Stencil depends on grid + polygons only.
  • No WGS84-ellipsoidal cell areas — spherical is within ~0.3 % (spherical_correction=False gives planar/equal-area weights).
  • No DAG hierarchies — each child has exactly one parent (tree only).
  • No how={"min", "max"}: mean and sum only.
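The "no per-variable cache" point in the list above follows from the Stencil depending on grid + polygons only. One way to picture that is a cache key derived from nothing but the grid coordinates and the polygon vertices. The function `stencil_key` below is hypothetical and is not geohalo's caching scheme; it only illustrates that two variables on the same grid would share one key.

```python
import hashlib
import numpy as np

def stencil_key(lats, lons, polygons):
    """Hypothetical cache key built from grid coordinates and polygon vertices.

    Grid values are deliberately not an input, so every variable on the
    same grid and polygon set maps to the same key.
    """
    h = hashlib.sha256()
    for arr in (lats, lons):
        a = np.ascontiguousarray(arr, dtype=np.float64)
        h.update(str(a.shape).encode())
        h.update(a.tobytes())
    # polygon order is kept, since it would determine the row order of W
    for name, ring in polygons.items():
        r = np.ascontiguousarray(ring, dtype=np.float64)
        h.update(str(name).encode())
        h.update(str(r.shape).encode())
        h.update(r.tobytes())
    return h.hexdigest()

lats = np.arange(-25.0, -19.0, 0.25)
lons = np.arange(-50.0, -42.0, 0.25)
polygons = {"SP": [(-49, -24), (-47, -24), (-47, -22), (-49, -22)]}

key_temperature = stencil_key(lats, lons, polygons)   # e.g. for a temperature field
key_precip = stencil_key(lats, lons, polygons)        # same grid, another variable
key_shifted = stencil_key(lats + 0.25, lons, polygons)  # a different grid
```

The two same-grid keys are identical, while shifting the grid produces a new key and would therefore require a new Stencil.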

Development

uv sync                                          # install deps
uv run pytest                                    # tests
uv run ruff check .                              # lint
uv run --group docs mkdocs serve                 # preview the docs locally
uv run --group docs python docs/gen_figures.py   # regenerate the doc figures

Docs are built with Material for MkDocs and deployed to https://campiohe.github.io/geohalo/ on every push to main.
