Global watershed delineation with MERIT-Hydro and MERIT-Basins data

delineator: Global Watershed Delineation with Python

Fast, accurate watershed delineation for any point on Earth's land surface, using a hybrid of vector- and raster-based methods with data from MERIT-Hydro and MERIT-Basins.

  • Near-global coverage (excludes Greenland, Antarctica, and some small islands)
  • Bundled sample data for Iceland; other regions download automatically on first use
  • Returns watershed polygon, river network, and outlet points as GeoPandas GeoDataFrames

Installation

Requires Python ≥ 3.10; Python 3.11+ is recommended for speed. Installing in a fresh virtual environment is also recommended, to avoid dependency conflicts.

macOS/Linux:

python3 -m venv venv
source venv/bin/activate
pip install delineator

Windows:

python -m venv venv
venv\Scripts\activate
pip install delineator

Quick start

The bundled Iceland data lets you run immediately after installation; no separate download required.

Command line usage

delineate --point 63.938 -21.004

This creates the watershed for the Ölfusá River at Route 1 in Iceland. Output is written to ./output/watershed.gpkg in your current directory. To create geodata for the river network and outlet points, run:

delineate --point 63.938 -21.004 --rivers --outlets

Python script usage

Alternatively, you can use the delineate() function in your own Python scripts or notebooks.

from delineator import delineate, write_outputs

# The delineate function returns three GeoDataFrames
# Note the order of latitude, longitude!
watershed_gdf, rivers_gdf, outlets_gdf = delineate(63.938, -21.004)

# Do whatever you wish with the resulting GeoDataFrames.
# This utility function will write them to disk in one line. 
write_outputs(watershed_gdf, rivers_gdf, outlets_gdf, id="olfusa")

Here is an example of the output displayed in QGIS:

Example output

Command line reference

# Single point
delineate --point 63.938 -21.004

# Include rivers and outlet points
delineate --point 63.938 -21.004 --rivers --outlets

# Output different file formats
delineate --point 63.938 -21.004 --output-format geojson
delineate --point 63.938 -21.004 --output-format shp
delineate --point 63.938 -21.004 --output-format kml
delineate --point 63.938 -21.004 --output-format parquet

# Batch delineation of multiple outlet points in a CSV file
delineate --csv outlets.csv

# Custom output directory
delineate --csv outlets.csv --output-dir /path/to/output/

# List all the command line options
delineate --help

For batch delineation, the CSV file must contain at minimum id, lat, and lon columns. Other columns are OK but will be ignored by the script. Example CSV file:

id,lat,lon,name
6401070,64.71072,-21.60337,Nordhura River at Stekkur
6401080,64.69229,-21.41046,Hvita River at Kljafoss
6401090,63.93796,-21.00666,Olfusa River at Selfoss
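
Before a long batch run it can help to check the CSV yourself. The following is a minimal sketch using only the Python standard library; read_outlets is a hypothetical helper written for this illustration, not part of the delineator package, and the range check on coordinates is an added assumption rather than documented behavior.

```python
import csv
import io

REQUIRED_COLUMNS = {"id", "lat", "lon"}


def read_outlets(csv_file):
    """Read outlet rows from an open CSV file, keeping only id, lat, and lon."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

    outlets = []
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        lat, lon = float(row["lat"]), float(row["lon"])
        # A swapped lat/lon pair often shows up as an out-of-range latitude.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"Line {line_number}: coordinates out of range")
        outlets.append({"id": row["id"], "lat": lat, "lon": lon})
    return outlets


# The example file from above, held in memory for the demonstration.
sample = io.StringIO(
    "id,lat,lon,name\n"
    "6401070,64.71072,-21.60337,Nordhura River at Stekkur\n"
    "6401080,64.69229,-21.41046,Hvita River at Kljafoss\n"
    "6401090,63.93796,-21.00666,Olfusa River at Selfoss\n"
)
outlets = read_outlets(sample)
print(len(outlets), "outlets read; first:", outlets[0])
```

Extra columns such as name are simply dropped here, which mirrors the statement that they are ignored.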

Output files

When --output-format is gpkg (the default), all layers are written to a single file (watershed_<id>.gpkg) with three layers: watershed, rivers, and outlets.

For other formats like shp, each layer is written to a separate file, for example rivers.shp, outlets.shp, and watershed.shp.

Environment variables

Instead of passing options on the command line, you can set environment variables for the default data directory, the output directory, and automatic downloads. There are three environment variables:

  • DELINEATOR_DATA_DIR: directory where input data files are saved
  • DELINEATOR_OUTPUT_DIR: directory where output files will be saved
  • DELINEATOR_AUTO_DOWNLOAD: whether to automatically download data files as they are needed

Environment variables are useful when you want configuration that is global, repeatable, automatable, or sensitive, without forcing every CLI call or Python function call to spell everything out.

Environment variables work with the command-line interface or with the Python functions (delineate(), downloader()). Note that command line arguments will override environment variables, as will the DelineatorConfig object passed to delineate().

Set the three available environment variables as follows:

macOS/Linux:

export DELINEATOR_DATA_DIR=/mnt/data/delineator
export DELINEATOR_OUTPUT_DIR=/home/user/documents/watersheds
export DELINEATOR_AUTO_DOWNLOAD=false
delineate --csv outlets.csv

Windows CMD:

set DELINEATOR_DATA_DIR=D:\Data\delineator
set DELINEATOR_OUTPUT_DIR=C:\Users\user\Documents\watersheds
set DELINEATOR_AUTO_DOWNLOAD=false
delineate --csv outlets.csv

Windows PowerShell:

$env:DELINEATOR_DATA_DIR = "D:\Data\delineator"
$env:DELINEATOR_OUTPUT_DIR = "C:\Users\user\Documents\watersheds"
$env:DELINEATOR_AUTO_DOWNLOAD = "false"
delineate --csv outlets.csv
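
The precedence described above (an explicit argument wins over an environment variable, which wins over the built-in default) can be sketched as follows. This is an illustration of the general pattern, not the package's code: resolve_option and parse_bool are hypothetical helpers, and the set of strings treated as "true" is an assumption.

```python
import os


def resolve_option(cli_value, env_name, default, environ=None):
    """Return the explicit value if given, else the environment variable, else the default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    return environ.get(env_name, default)


def parse_bool(text):
    """Interpret common true/false spellings (an assumed convention)."""
    return str(text).strip().lower() in {"1", "true", "yes", "on"}


# A stand-in for os.environ so the example does not depend on your shell.
env = {
    "DELINEATOR_OUTPUT_DIR": "/home/user/documents/watersheds",
    "DELINEATOR_AUTO_DOWNLOAD": "false",
}

# No explicit value: the environment variable is used.
output_dir = resolve_option(None, "DELINEATOR_OUTPUT_DIR", "./output/", env)
# An explicit value overrides the environment variable.
overridden = resolve_option("/tmp/run1", "DELINEATOR_OUTPUT_DIR", "./output/", env)
# Variable not set: fall back to the default.
data_dir = resolve_option(None, "DELINEATOR_DATA_DIR", "system default", env)
auto_download = parse_bool(resolve_option(None, "DELINEATOR_AUTO_DOWNLOAD", "true", env))

print(output_dir, overridden, data_dir, auto_download)
```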

Configuration reference

When using the Python function delineate(), options are passed via a DelineatorConfig object:

from delineator import delineate, DelineatorConfig

config = DelineatorConfig(
    high_res=True,
    rivers=True,
    fill=True,
    output_format="gpkg",
    output_dir="/path/to/output",
)

watershed_gdf, rivers_gdf, outlets_gdf = delineate(63.938, -21.004, config)

# Config objects are mutable - update and reuse
config.rivers = False
config.outlets = False
config.output_format = "geojson"
watershed_gdf, _, _ = delineate(63.938, -21.59, config)

All options with their defaults:

Option | Default | Description
auto_download | True | Automatically download missing data files on first use.
clean | False | Apply a small buffer/unbuffer to repair seam artifacts in the watershed polygon.
data_dir | system default | Override the data cache location.
fill | True | Fill small interior holes caused by topological gaps in MERIT-Hydro data.
fill_threshold | 100 | Maximum hole size to fill, in pixels on the 3″ grid (~90 m/pixel near the equator). Set 0 to fill all holes.
high_res | True | Refine the watershed boundary at the outlet using raster methods. More accurate but slower. Set False to skip (watershed will include some area downstream of the outlet).
low_res_threshold | 6e6 | Area in km² above which the script automatically falls back to low-res mode. The Amazon is ~5.9×10⁶ km².
rivers | True | Include the upstream river network in output.
num_stream_orders | 4 | The number of Strahler stream orders to include in river network output. Set ≥ 9 for all available reaches.
outlets | True | Include requested and snapped outlet points in output.
output_format | gpkg | Output format: gpkg, geojson, shp, kml, parquet, or any GeoPandas-supported driver.
output_dir | ./output/ | Directory for output files.
search_dist | 0.1 | Search radius in decimal degrees when the outlet falls outside all unit catchments (~10 km at the equator). Set 0 to require an exact hit.
simplify | False | Simplify output geometry using Douglas-Peucker. Reduces file size and removes staircase artifacts from raster-origin boundaries.
threshold_single | 3000 | Number of upstream pixels that defines a stream for snapping the outlet, when the outlet is in a unit catchment with no upstream contributing catchments.
threshold_multiple | 5000 | Number of upstream pixels that defines a stream for snapping the outlet, when the outlet is in a unit catchment with upstream contributing catchments.

Notes on select options

Filling holes

Setting fill=True removes small interior gaps or "donut holes" in the watershed polygon. These arise from slivers between unit catchments in the source data and are usually unwanted. The fill_threshold parameter (in pixels) sets the largest hole that is filled, so larger holes that represent genuine endorheic (internally draining) basins are preserved.

For example, the Rio Grande watershed contains a large endorheic basin between the main stem and the Pecos River that should probably not be filled, at least for studies of surface drainage:

Rio Grande Watershed
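
Because fill_threshold is given in pixels, it helps to convert it to an area when choosing a value. The sketch below assumes a spherical Earth and square 3 arc-second pixels; pixel_area_km2 and threshold_area_km2 are hypothetical helpers for this illustration, not functions in the package. Under these assumptions the default of 100 pixels is a little under 1 km² at the equator and less at high latitudes, where pixels are narrower.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius (spherical approximation)
PIXEL_DEG = 3 / 3600          # 3 arc-seconds, about 0.000833 degrees


def pixel_area_km2(lat_deg):
    """Approximate area of one 3 arc-second pixel at the given latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    height_km = PIXEL_DEG * km_per_degree
    # East-west width shrinks with the cosine of latitude.
    width_km = height_km * math.cos(math.radians(lat_deg))
    return height_km * width_km


def threshold_area_km2(fill_threshold, lat_deg):
    """Area in km² of the largest hole that a given pixel threshold would fill."""
    return fill_threshold * pixel_area_km2(lat_deg)


equator_km2 = threshold_area_km2(100, 0.0)
iceland_km2 = threshold_area_km2(100, 64.0)
print(f"100 pixels at the equator: {equator_km2:.2f} km²")
print(f"100 pixels at 64°N:        {iceland_km2:.2f} km²")
```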

Search distance

If the outlet point falls just offshore, in an estuary, or in a gap between unit catchments, search_dist controls how far (in decimal degrees) the script searches for the nearest catchment. A value of at least 0.005 is recommended for coastal outlets.
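
Since search_dist is expressed in decimal degrees, the distance it covers on the ground depends on latitude. A rough conversion, assuming a spherical Earth, is sketched below; search_dist_km is a hypothetical helper for illustration only.

```python
import math

KM_PER_DEGREE = 111.195  # approximate length of one degree of latitude on a spherical Earth


def search_dist_km(search_dist_deg, lat_deg):
    """Return (north-south km, east-west km) spanned by a distance given in degrees."""
    north_south = search_dist_deg * KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns_equator, ew_equator = search_dist_km(0.1, 0.0)
ns_iceland, ew_iceland = search_dist_km(0.1, 64.0)
print(f"0.1° at the equator: {ns_equator:.1f} km N-S, {ew_equator:.1f} km E-W")
print(f"0.1° at 64°N:        {ns_iceland:.1f} km N-S, {ew_iceland:.1f} km E-W")
```

At high latitudes the same value in degrees reaches much less far east-west than north-south, which is worth keeping in mind for coastal outlets in places like Iceland.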

Simplify

The watershed boundary inherits the staircase pattern of the underlying raster grid (pixel edge length ≈ 0.000833°). Setting simplify=True with simplify_tolerance ≈ 0.0004 or higher removes this artifact and reduces file size. The simplify_tolerance parameter is equivalent to the threshold for Douglas-Peucker simplification.
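
To show what Douglas-Peucker simplification does to a raster staircase, here is a minimal pure-Python version of the algorithm applied to a toy diagonal boundary. It is a sketch of the general technique, not the package's implementation; the tolerance of 0.001 and the five-step staircase are illustrative values chosen for this example.

```python
import math


def point_line_distance(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (px, py), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - x1, py - y1)
    return abs(dy * (px - x1) - dx * (py - y1)) / length


def douglas_peucker(points, tolerance):
    """Simplify an open polyline, keeping vertices farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    distances = [point_line_distance(p, start, end) for p in points[1:-1]]
    max_dist = max(distances)
    if max_dist <= tolerance:
        return [start, end]
    # Keep the farthest vertex and simplify the two halves recursively.
    split = distances.index(max_dist) + 1
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right


# Build a staircase: one pixel east, then one pixel north, five times.
PIXEL = 0.000833
staircase = [(0.0, 0.0)]
for _ in range(5):
    x, y = staircase[-1]
    staircase.append((x + PIXEL, y))
    staircase.append((x + PIXEL, y + PIXEL))

simplified = douglas_peucker(staircase, tolerance=0.001)
print(len(staircase), "vertices before,", len(simplified), "after")
```

How many vertices survive depends on the tolerance relative to how far each corner sits from the simplified line, so it is worth inspecting the result at the tolerance you choose.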

Thresholds for snapping

The process of "snapping" the outlet point to a river centerline is where watershed delineation becomes both an art and a science. The threshold_single and threshold_multiple parameters set how many upstream pixels a cell must have to count as a stream when snapping the outlet point. threshold_single applies when the outlet's unit catchment has no upstream contributing catchments; threshold_multiple applies when it does.

Accumulation raster
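
The effect of the snapping threshold can be illustrated on a toy flow-accumulation grid. The sketch below snaps a point to the nearest cell whose accumulation meets the threshold; it illustrates the general idea only and is not the package's algorithm. The function snap_to_stream, the grid values, and the max_radius search window are all made up for this example.

```python
import math


def snap_to_stream(accumulation, row, col, threshold, max_radius=5):
    """Return the (row, col) of the nearest cell whose accumulation meets the threshold."""
    best, best_dist = None, math.inf
    n_rows, n_cols = len(accumulation), len(accumulation[0])
    for r in range(max(0, row - max_radius), min(n_rows, row + max_radius + 1)):
        for c in range(max(0, col - max_radius), min(n_cols, col + max_radius + 1)):
            if accumulation[r][c] >= threshold:
                dist = math.hypot(r - row, c - col)
                if dist < best_dist:
                    best, best_dist = (r, c), dist
    return best


# Upstream pixel counts: a tributary runs down column 1, the main stem down column 4.
accumulation = [
    [1, 3200, 1,    1,    7000],
    [1, 3500, 1,    1,    7100],
    [1, 3800, 1,    1,    7200],
    [1, 4100, 1,    1,    7300],
    [1, 1,    4200, 4300, 11600],
]

outlet = (2, 2)  # requested point, between the two channels
snapped_low = snap_to_stream(accumulation, *outlet, threshold=3000)
snapped_high = snap_to_stream(accumulation, *outlet, threshold=5000)
snapped_none = snap_to_stream(accumulation, *outlet, threshold=20000)
print(snapped_low, snapped_high, snapped_none)
```

With the lower threshold the point snaps to the nearby tributary; with the higher threshold the tributary no longer counts as a stream and the point snaps to the main stem. This is the same kind of difference that can put a delineated watershed on the wrong branch of the network.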

Data files

The delineator package comes bundled with data for Iceland. For other regions, you will need additional data files. The globe is divided into 59 megabasins (integer IDs 11–86; data for Greenland, megabasin 91, has been omitted):

Megabasins map

Each megabasin requires four data files (vector catchments, vector rivers, flow-direction raster, accumulation raster). These download automatically on first use and are saved in your system's default data directory:

  • Windows: C:\Users\<username>\AppData\Local\delineator
  • Linux: ~/.local/share/delineator
  • macOS: ~/Library/Application Support/delineator
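
The per-platform locations listed above follow the usual operating-system conventions. As a rough sketch of how such a lookup might work (this is not the package's code; default_data_dir is a hypothetical function, and its handling of LOCALAPPDATA and of the DELINEATOR_DATA_DIR override is an assumption), consider:

```python
import os
import sys
from pathlib import Path


def default_data_dir(platform=sys.platform, home=None, environ=None):
    """Return the data directory for a platform, honoring an environment override."""
    home = Path.home() if home is None else Path(home)
    environ = os.environ if environ is None else environ

    override = environ.get("DELINEATOR_DATA_DIR")
    if override:
        return Path(override).expanduser()
    if platform.startswith("win"):
        local = environ.get("LOCALAPPDATA")
        base = Path(local) if local else home / "AppData" / "Local"
        return base / "delineator"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "delineator"
    return home / ".local" / "share" / "delineator"


linux_dir = default_data_dir("linux", "/home/user", {})
mac_dir = default_data_dir("darwin", "/Users/user", {})
custom_dir = default_data_dir("linux", "/home/user",
                              {"DELINEATOR_DATA_DIR": "/mnt/data/delineator"})
print(linux_dir, mac_dir, custom_dir, sep="\n")
```

To see the location actually in use on your machine, the delineator_dir command is the authoritative source.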

To pre-download data for a region:

delineator_download --basin 62    # e.g. basin 62 = Amazon
delineator_dir                    # show the cache location

You can also download these datasets manually by visiting: https://mghydro.com/watersheds/delineator-data.html.

Some regional datasets are up to 3 GB, so pre-downloading is recommended for large basins.

Override the default data directory with an environment variable:

# macOS/Linux
export DELINEATOR_DATA_DIR=~/gis/delineator_data

# Windows
set DELINEATOR_DATA_DIR=D:\GIS\delineator_data

⚠️ Always review your results

No automated watershed delineation software can replace human judgment. Always visually inspect every watershed you create with this package — there is no guarantee the output is correct.

Errors are common and often easy to miss without inspection. The good news is that many mistakes can be fixed by slightly adjusting the outlet coordinates and re-running. An experienced analyst can usually identify and resolve problems quickly, especially with an interactive map display.

Where delineation is most likely to fail

Certain landscapes are inherently challenging for any automated tool:

  • Flat terrain — where flow direction is ambiguous. Examples: Florida, the Netherlands, the Ganges-Brahmaputra Delta.
  • Arid and semi-arid areas — where channels are sparse or ephemeral. Examples: North Africa, Central China, the American Southwest.
  • Frozen environments — glaciers, tundra, and permafrost. Examples: Iceland, Greenland, northern Canada, northern Russia.
  • Karst and highly permeable terrain — where surface drainage boundaries are poorly defined because water moves through the subsurface. Examples: the Yucatán Peninsula, parts of the Deschutes basin in Oregon, the Karst Plateau along the Italy–Slovenia border.
  • Urban areas — where impervious surfaces, curbs, storm sewers, and drains alter or override natural flow paths.
  • Heavily engineered basins — irrigation canals, inter-basin transfers, and pipelines can reroute water in ways that no terrain-based algorithm can detect.

The most common error: incorrect pour point snapping

Even in well-behaved terrain, the most frequent source of error is pour point snapping — the outlet being snapped to the wrong river reach, often a nearby tributary. This produces a watershed on a completely different branch of the river network. Such errors are not correlated with watershed size or geography and can be subtle if you are not looking carefully.

If the result looks wrong, try nudging the outlet coordinates toward the river centerline and re-running. Overlaying the MERIT-Basins river network on your map makes this much easier. The examples/demo_webapp.py interactive map is useful for this kind of iterative review.

Areas with no data

MERIT-Hydro does not cover Greenland, Antarctica, or some small islands (e.g., Hawaii, the Azores). Delineation will fail silently for outlet points in these areas.

Usage examples

The examples/ directory on the project's GitHub page contains ready-to-run scripts. The example scripts show how to use delineator, and even how to set up a local, web-based point-and-click watershed delineation service similar to Global Watersheds.

Algorithm

The delineator combines three techniques to achieve speed and low memory use compared to traditional raster watershed delineation methods:

  1. Hybrid raster/vector approach: vector unit catchments handle the bulk of the upstream area; raster methods refine only the home catchment around the outlet.
  2. Hierarchical Spatial Aggregation: pre-computed nested catchments at five size levels (L0–L4) minimize the number of polygons that must be dissolved at runtime.
  3. SQLite-backed geodata: vector data is stored in relational SQLite databases with spatial indexes, enabling fast SQL lookups rather than loading entire datasets into memory.
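
The vector half of the hybrid approach amounts to collecting every unit catchment that drains to the outlet's catchment. A minimal sketch of that traversal follows, assuming each unit catchment records the ID of its downstream neighbor. The upstream_catchments function and the toy network are hypothetical, and the sketch leaves out the raster refinement of the home catchment, the pre-aggregated levels, and the SQLite lookups described above.

```python
from collections import defaultdict, deque


def upstream_catchments(next_down, outlet_id):
    """Return the IDs of all unit catchments draining to outlet_id, including itself."""
    # Invert the downstream links so each catchment lists its direct upstream neighbors.
    upstream_of = defaultdict(list)
    for catchment_id, downstream_id in next_down.items():
        upstream_of[downstream_id].append(catchment_id)

    found = []
    queue = deque([outlet_id])
    while queue:  # breadth-first walk up the river network
        current = queue.popleft()
        found.append(current)
        queue.extend(upstream_of[current])
    return found


# Toy network: catchment ID -> downstream catchment ID (0 means the ocean).
next_down = {1: 3, 2: 3, 3: 5, 4: 5, 5: 0, 6: 0}

basin_of_3 = upstream_catchments(next_down, 3)
basin_of_5 = upstream_catchments(next_down, 5)
print(sorted(basin_of_3), sorted(basin_of_5))
```

In a full workflow, the polygons for the returned IDs would then be dissolved into a single watershed boundary. Fewer and larger pre-aggregated polygons mean less work at that step.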

Method diagram

The nested catchments at the southern end of Madagascar illustrate the aggregation levels:

Nested basins

For a more detailed description, see the manuscript "Fast, accurate watershed delineation with a hybrid of raster and vector methods": https://mghydro.com/pages/Heberger_delineation_2025.pdf

Citation

If you use delineator in your research, please cite the project homepage (the GitHub repository). Here's a BibTeX entry:

@software{delineator,
  author    = {Matthew Heberger},
  title     = {delineator: Global Watershed Delineation with Python},
  year      = {2026},
  publisher = {GitHub},
  version   = {2.0.4},
  url       = {https://github.com/mheberger/delineator}
}

Contributing

This project is open source and welcomes contributions. If you have comments or suggestions, please open an issue or drop the author an email.
