Python bindings for the shed watershed delineation engine

pyshed

Python bindings for the shed watershed delineation engine. pyshed loads HFX-format datasets and returns watershed polygons from a (lat, lon) outlet. The full native stack (GDAL, PROJ, GEOS, libtiff, SQLite, and more) is bundled inside the wheel — no system install required.

Install

pip install pyshed

Platform support (v0.1): Apple Silicon macOS only (macosx_11_0_arm64). Linux, Intel macOS, and Windows wheels are not yet built — community contributions are welcome. See CONTRIBUTING.md if you want to help port the build.
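
Before running pip on a new machine, it can help to confirm that the interpreter matches the only published wheel tag. The following is a minimal sketch using the standard library; `wheel_supported` is a hypothetical helper, not part of pyshed, and it does not check the macOS 11.0 minimum version.

```python
import platform


def wheel_supported(system=None, machine=None):
    """Return True when the interpreter reports Apple Silicon macOS.

    Hypothetical helper for illustration; pyshed does not ship it.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    # macosx_11_0_arm64 wheels need a Darwin host and a native arm64 Python.
    return system == "Darwin" and machine == "arm64"


if not wheel_supported():
    print("pyshed v0.1 only publishes macosx_11_0_arm64 wheels; "
          "pip is unlikely to find a compatible file on this interpreter.")
```

An x86_64 Python running under Rosetta reports `x86_64` here, which is the desired outcome: that interpreter cannot load an arm64 wheel either.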

Quickstart

import pyshed

engine = pyshed.Engine("/path/to/hfx/dataset")
result = engine.delineate(lat=47.3769, lon=8.5417)
print(result.area_km2)

Snapping options belong on the constructor, not on delineate:

# Correct — snap_radius is an Engine constructor kwarg
engine = pyshed.Engine("/path/to/hfx/dataset", snap_radius=5000)
result = engine.delineate(lat=47.3769, lon=8.5417)

Geometry repair defaults to the pure-Rust topology cleaner. Pass repair_geometry="gdal" to opt into the GDAL repairer; repair_geometry="auto", "clean", False, and None all use the default cleaner.
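
The accepted values collapse to two outcomes, which can be summarized as a small lookup. This is only an illustration of the mapping described above: `resolve_repairer` and its return labels are hypothetical, and raising on unknown values is an assumption, since the README does not say how pyshed treats them.

```python
def resolve_repairer(repair_geometry="auto"):
    """Map a repair_geometry value to the repairer it selects.

    Hypothetical helper mirroring the documented behavior; not pyshed code.
    """
    if repair_geometry == "gdal":
        return "gdal"  # explicit opt-in to the GDAL repairer
    if (repair_geometry is None or repair_geometry is False
            or repair_geometry in ("auto", "clean")):
        return "clean"  # default pure-Rust topology cleaner
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"unsupported repair_geometry: {repair_geometry!r}")
```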

Engine also accepts dataset root URLs backed by the object-store integration:

local_engine = pyshed.Engine("/data/hfx/rhine")
file_url_engine = pyshed.Engine("file:///data/hfx/rhine")
s3_engine = pyshed.Engine("s3://bucket/path/to/hfx/rhine")
r2_engine = pyshed.Engine(
    "https://<account>.r2.cloudflarestorage.com/<bucket>/path/to/hfx/rhine"
)
public_r2_engine = pyshed.Engine(
    "https://basin-delineations-public.upstream.tech/global/hfx"
)

Remote dataset sessions cache manifest.json and graph.arrow under ~/.cache/hfx/<fabric_name>/<adapter_version>/ by default. Set HFX_CACHE_DIR=/path/to/cache before constructing pyshed.Engine(...) to use a different cache root. Parquet artifacts are read with object-store range reads; they are not copied into the cache wholesale.
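
To locate the cached manifest.json and graph.arrow on disk, the documented layout can be reproduced with a few lines of standard-library code. This is a sketch, not pyshed's own resolution logic: `hfx_cache_dir` is a hypothetical helper, and it assumes that HFX_CACHE_DIR replaces the ~/.cache/hfx root while keeping the same <fabric_name>/<adapter_version> subdirectories.

```python
import os
from pathlib import Path


def hfx_cache_dir(fabric_name, adapter_version, env=None):
    """Return the directory where a remote session's cached files would live.

    Hypothetical helper; mirrors the layout described in this README.
    """
    env = os.environ if env is None else env
    override = env.get("HFX_CACHE_DIR")
    # Assumption: the override swaps only the root, not the sub-layout.
    root = Path(override) if override else Path.home() / ".cache" / "hfx"
    return root / fabric_name / adapter_version
```

The fabric name and adapter version come from the dataset itself, so any values passed to this helper need to be read from the dataset's manifest rather than guessed.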

The Python engine passes GDAL raster URIs and configuration options through to GDAL, but whether rasters on public Cloudflare R2 can actually be read still depends on the target bucket, credentials, and GDAL driver behavior. Verify the specific remote raster dataset you plan to use.

Verbose mode

Enable structured log output from both the Python and Rust layers:

import pyshed

pyshed.set_log_level("info")
engine = pyshed.Engine("https://basin-delineations-public.upstream.tech/global/hfx")
# INFO lines stream during manifest/graph/catchment loading
result = engine.delineate(lat=47.3769, lon=8.5417)

Valid levels: "trace", "debug", "info", "warn"/"warning", and "error"/"critical". To opt in at import time instead, set the PYSHED_LOG environment variable to one of those values.
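
When the level comes from user input or configuration, validating it before it reaches PYSHED_LOG avoids a silent typo. The sketch below assumes PYSHED_LOG is read from the process environment when pyshed is imported; `request_pyshed_log` is a hypothetical helper, and lowercasing the value is a convenience added here, not documented pyshed behavior.

```python
import os

# Level names listed in this README, including the two alias pairs.
VALID_LEVELS = {"trace", "debug", "info", "warn", "warning", "error", "critical"}


def request_pyshed_log(level, env=os.environ):
    """Validate a level name and store it in PYSHED_LOG (hypothetical helper)."""
    normalized = level.strip().lower()
    if normalized not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    env["PYSHED_LOG"] = normalized
    return normalized


# Call this before the first `import pyshed` so the import sees the variable.
request_pyshed_log("debug")
```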

Speeding up repeated delineations

Enable the in-memory Parquet column-chunk cache to avoid redundant range reads across overlapping watersheds:

engine = pyshed.Engine(
    "https://basin-delineations-public.upstream.tech/global/hfx",
    parquet_cache=True,
    parquet_cache_max_mb=512,
)

The cache is enabled by default for remote dataset URLs and disabled by default for local paths, so the explicit parquet_cache=True above matters mainly when the dataset is a local path. parquet_cache_max_mb defaults to 512 when caching is enabled. Cache state is per-Engine instance and is not persisted to disk.
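
Overriding those defaults might look like the sketch below. It takes the engine class as a parameter so the snippet runs without pyshed installed; in real use you would pass `pyshed.Engine`. The function name and the 256 MB budget are illustrative, and `parquet_cache=False` is assumed to be an accepted way to switch off the default remote cache.

```python
def open_with_cache_overrides(engine_cls, local_path, remote_url):
    """Open two engines with non-default cache settings (illustrative only)."""
    # Local paths do not cache by default, so opt in explicitly.
    local = engine_cls(local_path, parquet_cache=True, parquet_cache_max_mb=256)
    # Assumption: False disables the cache that remote URLs get by default.
    remote = engine_cls(remote_url, parquet_cache=False)
    return local, remote


# Usage, with pyshed installed:
#   import pyshed
#   local, remote = open_with_cache_overrides(
#       pyshed.Engine,
#       "/data/hfx/rhine",
#       "https://basin-delineations-public.upstream.tech/global/hfx",
#   )
```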

Batch delineation with progress

import pyshed

# tqdm is a user dependency — not bundled with pyshed
from tqdm.auto import tqdm

url = "https://basin-delineations-public.upstream.tech/global/hfx"
engine = pyshed.Engine(url)

outlets = [
    {"lat": 47.3769, "lon": 8.5417},
    {"lat": 46.9480, "lon": 7.4474},
    {"lat": 48.1351, "lon": 11.5820},
]

bar = tqdm(total=len(outlets), unit="outlet")

def on_progress(event):
    bar.update(1)
    bar.set_postfix(status=event.get("status"), ms=event.get("duration_ms"))

results = engine.delineate_batch(outlets, progress=on_progress)
bar.close()

The progress callback receives a dict with keys index, total, lat, lon, duration_ms, status ("ok" or "error"), plus n_catchments on success and error on failure. Exceptions raised inside the callback are swallowed and logged; they do not interrupt the batch.
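
Because the callback sees every outlet, it is also a convenient place to collect failures for a later retry. The sketch below relies only on the event keys listed above; `make_failure_collector` is a hypothetical helper, and the `error` value is stored as received because its type is not specified here.

```python
def make_failure_collector():
    """Return (callback, failures); the callback records failed outlets."""
    failures = []

    def on_progress(event):
        # Only "error" events are kept; successful outlets are ignored.
        if event.get("status") == "error":
            failures.append({
                "index": event.get("index"),
                "lat": event.get("lat"),
                "lon": event.get("lon"),
                "error": event.get("error"),
            })

    return on_progress, failures


# Usage, with an engine and outlets defined as in the batch example:
#   on_progress, failures = make_failure_collector()
#   results = engine.delineate_batch(outlets, progress=on_progress)
#   retry = [{"lat": f["lat"], "lon": f["lon"]} for f in failures]
```

The callback uses `.get` throughout, since an exception raised inside it would be swallowed and the failure would go unrecorded.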

API Reference

For the full developer-oriented API surface, including argument types, return types, and the exception hierarchy, see API.md.

What it does

  • Resolves the outlet coordinate to a terminal HFX atom (via snap.parquet or point-in-polygon on catchments.parquet).
  • Walks the upstream graph in graph.arrow collecting all contributing atoms.
  • Optionally refines the terminal atom geometry using flow_dir.tif / flow_acc.tif rasters when present.
  • Returns a dissolved MultiPolygon and its geodesic area in km².
  • Bundles GDAL / PROJ / GEOS / libtiff / SQLite — no system GDAL install needed.
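
The upstream walk in the second step is, conceptually, a graph traversal from the terminal atom. The sketch below shows the idea as a breadth-first search over a plain dict; it is not the engine's implementation, and the dict is a stand-in, since the actual schema of graph.arrow is not described here.

```python
from collections import deque


def upstream_atoms(upstream_of, terminal):
    """Collect the terminal atom and every atom that drains into it.

    `upstream_of` maps an atom id to the ids directly upstream of it
    (a hypothetical in-memory stand-in for graph.arrow).
    """
    seen = {terminal}
    queue = deque([terminal])
    while queue:
        atom = queue.popleft()
        for parent in upstream_of.get(atom, ()):
            if parent not in seen:  # guard against revisiting shared branches
                seen.add(parent)
                queue.append(parent)
    return seen


# Toy network: atoms 2 and 3 drain into 1, and 4 drains into both 2 and 3.
toy_graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
contributing = upstream_atoms(toy_graph, 1)
```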

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

pyshed-0.1.11-cp39-abi3-macosx_11_0_arm64.whl (6.2 MB)

CPython 3.9+, macOS 11.0+, ARM64

Hashes for pyshed-0.1.11-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d1bf1ce448a3143b898698350b05b6cfd7fe4b063efee8b4300dba7ebb867e2e
MD5 5eef92cdc8bc8be77afbf7a268c73992
BLAKE2b-256 ccdc9ee37eb07ec337481ee934ab599d420bbfb866ec8e28ce45f1a1e3ec97f6
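
A manually downloaded wheel can be checked against the SHA256 digest above with the standard library. This is a minimal sketch: `sha256_of` is a helper defined here, and the file path in the usage comment assumes the wheel sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for pyshed-0.1.11-cp39-abi3-macosx_11_0_arm64.whl
EXPECTED_SHA256 = "d1bf1ce448a3143b898698350b05b6cfd7fe4b063efee8b4300dba7ebb867e2e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage:
#   wheel = "pyshed-0.1.11-cp39-abi3-macosx_11_0_arm64.whl"
#   if sha256_of(wheel) != EXPECTED_SHA256:
#       raise SystemExit("hash mismatch: do not install this file")
```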
