Skip to main content

Python bindings for the shed watershed delineation engine

Project description

pyshed

Python bindings for the shed watershed delineation engine. pyshed loads HFX-format v0.2.1 datasets and returns watershed polygons from a (lat, lon) outlet. HFX v0.1 datasets no longer load. The full native stack (GDAL, PROJ, GEOS, libtiff, SQLite, and more) is bundled inside the wheel — no system install required.

Install

pip install pyshed

Platform support (v0.2.0): Apple Silicon macOS only (macosx_11_0_arm64). Linux, Intel macOS, and Windows wheels are not yet built — community contributions are welcome. See CONTRIBUTING.md if you want to help port the build.

Quickstart

import pyshed

engine = pyshed.Engine("/path/to/hfx/dataset")
result = engine.delineate(lat=47.3769, lon=8.5417)
print(result.area_km2)

Snapping options belong on the constructor, not on delineate:

# Correct — snap_radius is an Engine constructor kwarg
engine = pyshed.Engine("/path/to/hfx/dataset", snap_radius=5000)
result = engine.delineate(lat=47.3769, lon=8.5417)

Geometry repair defaults to the pure-Rust topology cleaner. Pass repair_geometry="gdal" to opt into the GDAL repairer; repair_geometry="auto", "clean", False, and None all use the default cleaner.

Engine also accepts dataset root URLs backed by the object-store integration:

local_engine = pyshed.Engine("/data/hfx/rhine")
file_url_engine = pyshed.Engine("file:///data/hfx/rhine")
s3_engine = pyshed.Engine("s3://bucket/path/to/hfx/rhine")
r2_engine = pyshed.Engine(
    "https://<account>.r2.cloudflarestorage.com/<bucket>/path/to/hfx/rhine"
)
public_r2_engine = pyshed.Engine(
    "https://basin-delineations-public.upstream.tech/global/hfx"
)

Remote dataset sessions cache manifest.json and graph.arrow under ~/.cache/hfx/<fabric_name>/<adapter_version>/ by default. Set HFX_CACHE_DIR=/path/to/cache before constructing pyshed.Engine(...) to use a different cache root. Parquet artifacts are read with object-store range reads; they are not copied into the cache wholesale.

GDAL raster URI and configuration plumbing is wired through the Python engine, but public Cloudflare R2 raster access still depends on the target bucket, credentials, and GDAL driver behavior. Verify the specific remote raster dataset you plan to use.

Verbose mode

Enable structured log output from both the Python and Rust layers:

import pyshed

pyshed.set_log_level("info")
engine = pyshed.Engine("https://basin-delineations-public.upstream.tech/global/hfx")
# INFO lines stream during manifest/graph/catchment loading
result = engine.delineate(lat=47.3769, lon=8.5417)

Valid levels: "trace", "debug", "info", "warn"/"warning", and "error"/"critical". Set PYSHED_LOG to one of those values to opt in at import time.

Speeding up repeated delineations

Enable the in-memory Parquet column-chunk cache to avoid redundant range reads across overlapping watersheds:

engine = pyshed.Engine(
    "https://basin-delineations-public.upstream.tech/global/hfx",
    parquet_cache=True,
    parquet_cache_max_mb=512,
)

The cache is enabled by default for remote dataset URLs and disabled by default for local paths. parquet_cache_max_mb defaults to 512 when caching is enabled. Cache state is per-Engine instance and is not persisted to disk.

Batch delineation with progress

import pyshed

# tqdm is a user dependency — not bundled with pyshed
from tqdm.auto import tqdm

url = "https://basin-delineations-public.upstream.tech/global/hfx"
engine = pyshed.Engine(url)

outlets = [
    {"lat": 47.3769, "lon": 8.5417},
    {"lat": 46.9480, "lon": 7.4474},
    {"lat": 48.1351, "lon": 11.5820},
]

bar = tqdm(total=len(outlets), unit="outlet")

def on_progress(event):
    bar.update(1)
    bar.set_postfix(status=event.get("status"), ms=event.get("duration_ms"))

results = engine.delineate_batch(outlets, progress=on_progress)
bar.close()

The progress callback receives a dict with keys index, total, lat, lon, duration_ms, status ("ok" or "error"), plus n_catchments on success and error on failure. Exceptions raised inside the callback are swallowed and logged; they do not interrupt the batch.

Staged delineation

delineate() is the convenience composition of the staged API:

level = engine.select_level(selection=pyshed.LevelSelection.FINEST)
outlet = engine.resolve_outlet(level, lat=47.3769, lon=8.5417)
upstream = engine.traverse(outlet)
units = engine.pre_merge_units(upstream)
refinement = engine.refine(outlet, units)
dissolved = engine.dissolve(units, refinement)
result = engine.compose_result(outlet, upstream, units, refinement, dissolved)

LevelSelection.FINEST is the only level selection in 0.2.0; multi-level selection is on the roadmap.

result matches engine.delineate(lat=47.3769, lon=8.5417). The merged result exposes final geometry_wkb, final area_km2, and light per-unit metadata (id, level, area_km2, up_area_km2, outlet). Whole per-unit geometry is available only on PreMergeDrainageUnits.unit_geometry_wkb.

R3 note: pre-merge units are whole source drainage units, including the whole terminal unit. If terminal refinement is applied, summing or unioning those whole units is not the same as the merged area_km2 or geometry_wkb.

GeoParquet export

Exports are explicit writer-object calls and write complete batches:

basin_writer = pyshed.BasinGeoParquetWriter()
basin_writer.write(engine, "basins.parquet", [result], basin_ids=["rhine-basel"])

bundle_writer = pyshed.UnitBundleGeoParquetWriter()
bundle_writer.write(engine, "units.parquet", [units], [refinement])

BasinGeoParquetWriter writes one merged basin row per result. basin_ids are caller-owned, filesystem-safe identifiers. Omitting basin_ids is allowed only with allow_default_basin_id=True and exactly one result, where the terminal unit ID becomes the basin ID.

UnitBundleGeoParquetWriter writes one row per pre-merge drainage unit. Unit rows use dataset-local unit_id, include terminal_unit_id and delineation grouping columns, and store whole-unit geometry.

Default delineation labels are {fabric_name}/{fabric_version}/{method}. The default method is d8-best-effort when refinement is enabled and no-refine when refine=False. The actual outcome is stored separately in refinement_status.

API Reference

For the full developer-oriented API surface, including argument types, return types, and the exception hierarchy, see API.md.

What it does

  • Resolves the outlet coordinate to a terminal HFX unit (via snap.parquet or point-in-polygon on catchments.parquet).
  • Walks the upstream graph in graph.parquet collecting all contributing units.
  • Optionally refines the terminal unit geometry using flow_dir.tif / flow_acc.tif rasters when present.
  • Returns a dissolved MultiPolygon + geodesic area in km².
  • Bundles GDAL / PROJ / GEOS / libtiff / SQLite — no system GDAL install needed.

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyshed-0.2.1-cp39-abi3-macosx_11_0_arm64.whl (22.6 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file pyshed-0.2.1-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyshed-0.2.1-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c3ef9cda95509eba96615dc625a9c5870d876c5faf718bc64f9a4caeef31e9d0
MD5 aa36f17812c3934096a551e1c84431c0
BLAKE2b-256 7919c5279700b18a230adac01744463f55b85087bc3c41a1597cd88db8e52b3a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page