Global watershed delineation with MERIT-Hydro and MERIT-Basins data

delineator: Global Watershed Delineation with Python

Fast, accurate watershed delineation for any point on Earth's land surface, using a hybrid of vector- and raster-based methods with data from MERIT-Hydro and MERIT-Basins.

  • Near-global coverage (excludes Greenland, Antarctica, and some small islands)
  • Bundled sample data for Iceland; other regions download automatically on first use
  • Returns watershed polygon, river network, and outlet points as GeoPandas GeoDataFrames

Installation

Requires Python ≥ 3.10; Python 3.11+ is recommended for speed. Installing in a fresh virtual environment is also recommended to avoid dependency conflicts.

macOS/Linux:

python3 -m venv venv
source venv/bin/activate
pip install delineator

Windows:

python -m venv venv
venv\Scripts\activate
pip install delineator

Quick start

The bundled Iceland data lets you run immediately after install; no separate download required.

Command line:

delineate --point 63.938 -21.004

This creates the watershed for the Ölfusá River at Route 1 in Iceland. Output is written to ./output/watershed.gpkg in your current directory. To create geodata for the river network and outlet points, run:

delineate --point 63.938 -21.004 --rivers --outlets

Python API:

Alternatively, you can use the delineate() function in your own Python scripts or notebooks.

from delineator import delineate, write_outputs

# The delineate function returns three GeoDataFrames
# Note the order of latitude, longitude!
watershed_gdf, rivers_gdf, outlets_gdf = delineate(63.938, -21.004)

# Do whatever you wish with the resulting GeoDataFrames.
# This utility function will write them to disk in one line. 
write_outputs(watershed_gdf, rivers_gdf, outlets_gdf, id="olfusa")
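
Because delineate() takes latitude first, a swapped pair is an easy mistake to make. The following is a minimal sketch of a pre-check you could run before calling it. The helper name check_lat_lon is hypothetical and is not part of delineator; it can only catch swaps where the longitude falls outside the valid latitude range.

```python
def check_lat_lon(lat: float, lon: float) -> tuple[float, float]:
    """Validate a (lat, lon) pair and return it unchanged.

    Hypothetical helper, not part of the delineator package.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(
            f"latitude {lat} is outside [-90, 90]; were lat and lon swapped?"
        )
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    return lat, lon


# The Olfusa outlet from the quick start passes the check.
lat, lon = check_lat_lon(63.938, -21.004)
```

A pair such as (-21.004, 63.938) would still pass, since both values are valid latitudes, so plotting the outlet on a map remains the more reliable check.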

Here is an example of the output displayed in QGIS:

Example output

Command line reference

# Single point
delineate --point 63.938 -21.004

# Include rivers and outlet points
delineate --point 63.938 -21.004 --rivers --outlets

# Output different file formats
delineate --point 63.938 -21.004 --output-format geojson
delineate --point 63.938 -21.004 --output-format shp
delineate --point 63.938 -21.004 --output-format kml
delineate --point 63.938 -21.004 --output-format parquet

# Batch delineation of multiple outlet points in a CSV file
delineate --csv outlets.csv

# Custom output directory
delineate --csv outlets.csv --output-dir /path/to/output/

# List all the command line options
delineate --help

The CSV file must contain at minimum id, lat, and lon columns. Other columns are OK but will be ignored by the script.

id,lat,lon,name
6401070,64.71072,-21.60337,Nordhura River at Stekkur
6401080,64.69229,-21.41046,Hvita River at Kljafoss
6401090,63.93796,-21.00666,Olfusa River at Selfoss
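
If you build the CSV programmatically, it can help to verify the required columns before starting a long batch run. This is a minimal sketch using only the standard library; the function name read_outlets is an assumption for illustration and is not part of delineator.

```python
import csv
import io

REQUIRED_COLUMNS = {"id", "lat", "lon"}


def read_outlets(handle):
    """Return a list of (id, lat, lon) tuples from an open CSV file.

    Extra columns (such as name) are ignored, mirroring the rule above.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    return [(row["id"], float(row["lat"]), float(row["lon"])) for row in reader]


# In-memory copy of the sample file shown above.
sample = io.StringIO(
    "id,lat,lon,name\n"
    "6401070,64.71072,-21.60337,Nordhura River at Stekkur\n"
    "6401080,64.69229,-21.41046,Hvita River at Kljafoss\n"
    "6401090,63.93796,-21.00666,Olfusa River at Selfoss\n"
)
outlets = read_outlets(sample)
```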

Environment variables

Instead of passing options on the command line, you can set environment variables for the default data directory, the output directory, and automatic downloads. There are three environment variables:

  • DELINEATOR_DATA_DIR: directory where input data files are saved
  • DELINEATOR_OUTPUT_DIR: directory where output files will be saved
  • DELINEATOR_AUTO_DOWNLOAD: whether to automatically download data files as they are needed

Environment variables are useful when you want configuration that is global, repeatable, automatable, or sensitive, without spelling everything out in every CLI call or Python function call.

Environment variables work with the command-line interface or with the Python functions (delineate(), downloader()). Note that command line arguments will override environment variables, as will the DelineatorConfig object passed to delineate().
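
The precedence described above can be sketched as follows. This is an illustration of the rule only, not the package's actual code: the function resolve_output_dir is hypothetical, and the ./output/ fallback is taken from the defaults table in the configuration reference.

```python
import os


def resolve_output_dir(cli_value=None, config_value=None, environ=None):
    """Pick an output directory: explicit value, then env var, then default.

    Hypothetical helper illustrating the precedence rule. An explicit
    command-line argument or DelineatorConfig value wins over the
    DELINEATOR_OUTPUT_DIR environment variable.
    """
    environ = os.environ if environ is None else environ
    for explicit in (cli_value, config_value):
        if explicit is not None:
            return explicit
    return environ.get("DELINEATOR_OUTPUT_DIR", "./output/")


env = {"DELINEATOR_OUTPUT_DIR": "/home/user/documents/watersheds"}
from_env = resolve_output_dir(environ=env)
from_cli = resolve_output_dir(cli_value="/path/to/output/", environ=env)
```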

Set the three available environment variables as follows:

Mac/Linux:

export DELINEATOR_DATA_DIR=/mnt/data/delineator
export DELINEATOR_OUTPUT_DIR=/home/user/documents/watersheds
export DELINEATOR_AUTO_DOWNLOAD=false
delineate --csv outlets.csv

Windows CMD:

set DELINEATOR_DATA_DIR=D:\Data\delineator
set DELINEATOR_OUTPUT_DIR=C:\Users\user\Documents\watersheds
set DELINEATOR_AUTO_DOWNLOAD=false
delineate --csv outlets.csv

Windows PowerShell:

$env:DELINEATOR_DATA_DIR = "D:\Data\delineator"
$env:DELINEATOR_OUTPUT_DIR = "C:\Users\user\Documents\watersheds"
$env:DELINEATOR_AUTO_DOWNLOAD = "false"
delineate --csv outlets.csv

Data

The globe is divided into 59 megabasins (integer IDs 11–86; data for Greenland, megabasin 91, has been omitted):

Megabasins map

Each megabasin requires four data files (vector catchments, vector rivers, flow-direction raster, accumulation raster). These download automatically on first use and are cached in your system's default data directory:

  • Windows: C:\Users\<username>\AppData\Local\delineator
  • Linux: ~/.local/share/delineator
  • macOS: ~/Library/Application Support/delineator
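
For scripts that need to locate the cache without calling the command-line tools, the three locations listed above can be reproduced with a small function. This is a sketch under the assumption that the paths are exactly as listed; the function default_data_dir is hypothetical, and the delineator_dir command remains the authoritative way to find the real location.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_data_dir(platform: str, home: str):
    """Return the documented default cache path for a platform.

    `platform` uses the values of sys.platform ("win32", "darwin", "linux").
    Hypothetical helper that mirrors the list above; it does not query
    the delineator package itself.
    """
    if platform.startswith("win"):
        return PureWindowsPath(home) / "AppData" / "Local" / "delineator"
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / "delineator"
    return PurePosixPath(home) / ".local" / "share" / "delineator"


linux_dir = default_data_dir("linux", "/home/user")
windows_dir = default_data_dir("win32", r"C:\Users\user")
```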

To pre-download data for a region:

delineator_download --basin 62    # e.g. basin 62 = Amazon
delineator_dir                    # show the cache location

Some regional datasets are up to 3 GB, so pre-downloading is recommended for large basins.

Override the cache location with an environment variable:

# macOS/Linux
export DELINEATOR_DATA_DIR=~/gis/delineator_data

# Windows
set DELINEATOR_DATA_DIR=D:\GIS\delineator_data

Configuration reference

When using the Python API, options are passed via a DelineatorConfig object:

from delineator import delineate, DelineatorConfig

config = DelineatorConfig(
    high_res=True,
    rivers=True,
    fill=True,
    output_format="gpkg",
    output_dir="/path/to/output",
)

watershed_gdf, rivers_gdf, outlets_gdf = delineate(63.938, -21.004, config)

# Config objects are mutable - update and reuse
config.rivers = False
watershed_gdf, _, _ = delineate(63.938, -21.59, config)

All options with their defaults:

| Option | Default | Description |
|---|---|---|
| high_res | True | Refine the watershed boundary at the outlet using raster methods. More accurate but slower. Set False to skip (watershed will include some area downstream of the outlet). |
| low_res_threshold | 6e6 | Area in km² above which the script automatically falls back to low-res mode. The Amazon is ~5.9×10⁶ km². |
| fill | True | Fill small interior holes caused by topological gaps in MERIT-Hydro data. |
| fill_threshold | 100 | Maximum hole size to fill, in pixels on the 3″ grid (~90 m/pixel near the equator). Set 0 to fill all holes. |
| rivers | True | Include the upstream river network in output. |
| num_stream_orders | 4 | Strahler stream orders to include in river output. Set ≥ 9 for all available reaches. |
| outlets | True | Include requested and snapped outlet points in output. |
| output_format | "gpkg" | Output format: gpkg, geojson, shp, kml, parquet, or any GeoPandas-supported driver. |
| output_dir | ./output/ | Directory for output files. |
| data_dir | system default | Override the data cache location. |
| search_dist | 0.1 | Search radius in decimal degrees when the outlet falls outside all unit catchments (~10 km at the equator). Set 0 to require an exact hit. |
| simplify | False | Simplify output geometry using Douglas-Peucker. Reduces file size and removes staircase artifacts from raster-origin boundaries. |
| simplify_tolerance | 0.0008 | Tolerance in decimal degrees for simplification (~half a pixel edge length). |
| clean | False | Apply a small buffer/unbuffer to repair seam artifacts in the watershed polygon. |
| auto_download | True | Automatically download missing data files on first use. |

Notes on select options

Filling holes

Setting fill=True removes small interior gaps in the watershed polygon. These arise from slivers between unit catchments in the source data and are usually unwanted. The fill_threshold parameter (in pixels) sets the largest hole that is filled; larger holes, which may represent genuine endorheic (internally draining) basins, are preserved.

For example, the Rio Grande watershed contains a large endorheic basin between the main stem and the Pecos River that should not be filled:

Rio Grande Watershed
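
Since fill_threshold is given in pixels, it can be useful to translate it into a ground area when choosing a value. The sketch below assumes a spherical Earth and square 3-arc-second pixels; the function names are illustrative and not part of delineator.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
PIXEL_DEG = 3 / 3600         # 3 arc-seconds expressed in decimal degrees


def pixel_area_km2(lat_deg: float) -> float:
    """Approximate ground area of one 3-arc-second pixel at a latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    height_km = PIXEL_DEG * km_per_degree
    # Pixels narrow in the east-west direction away from the equator.
    width_km = height_km * math.cos(math.radians(lat_deg))
    return width_km * height_km


def threshold_area_km2(fill_threshold: int, lat_deg: float) -> float:
    """Largest hole area (km²) implied by a pixel-count threshold."""
    return fill_threshold * pixel_area_km2(lat_deg)


area_equator = threshold_area_km2(100, 0.0)   # default threshold at the equator
area_iceland = threshold_area_km2(100, 64.0)  # same threshold at 64° N
```

Under these assumptions, the default of 100 pixels corresponds to a little under 1 km² at the equator and less than half of that at Iceland's latitude.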

Search distance

If the outlet point falls just offshore, in an estuary, or in a gap between unit catchments, search_dist controls how far (in decimal degrees) the script searches for the nearest catchment. A value of at least 0.005 is recommended for coastal outlets.
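
Because search_dist is expressed in decimal degrees, the distance it covers on the ground depends on latitude. A rough conversion, assuming a spherical Earth (the function name search_extent_km is hypothetical):

```python
import math

KM_PER_DEGREE = 111.195  # approximate length of one degree of latitude


def search_extent_km(search_dist_deg: float, lat_deg: float):
    """Return (north_south_km, east_west_km) covered by a search distance."""
    north_south = search_dist_deg * KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Default search_dist of 0.1 at the latitude of the Olfusa outlet.
ns_km, ew_km = search_extent_km(0.1, 63.938)
```

At this latitude the east-west reach is less than half the north-south reach, which is worth keeping in mind when choosing a value for high-latitude outlets.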

Simplify

The watershed boundary inherits the staircase pattern of the underlying raster grid (pixel edge length ≈ 0.000833°). Setting simplify=True with simplify_tolerance ≈ 0.0004 or higher removes this artifact and reduces file size.
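
To see why a tolerance near the pixel size removes the staircase, here is a small pure-Python version of the Douglas-Peucker algorithm applied to a synthetic diagonal staircase. It is an illustration of the technique only and makes no claim about how delineator or GeoPandas implement it internally; distances are measured to the infinite line through the endpoints, a common simplification.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points farther than tolerance."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [start, end]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


PIXEL = 1 / 1200  # 3 arc-seconds, about 0.000833 degrees

# Synthetic staircase: five pixel-sized steps east, then north.
staircase = [(0.0, 0.0)]
for _ in range(5):
    x, y = staircase[-1]
    staircase.append((x + PIXEL, y))
    staircase.append((x + PIXEL, y + PIXEL))

smooth = douglas_peucker(staircase, 0.0008)
detailed = douglas_peucker(staircase, 0.0001)
```

On this diagonal staircase each corner lies about 0.00059° from the straight line between the endpoints, so a tolerance of 0.0008 collapses it to a single segment while a much smaller tolerance keeps the steps.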

Examples

The examples/ directory on the project's GitHub page contains ready-to-run scripts.

Output files

In GeoPackage mode (default), all layers are written to a single file (watershed_<id>.gpkg) with three layers: watershed, rivers, and outlets. For other formats, each layer is written to a separate file.

⚠️ Always review your results

No automated watershed delineation software can replace human judgment. Always visually inspect every watershed you create with this package — there is no guarantee the output is correct.

Errors are common and often easy to miss without inspection. The good news is that many mistakes can be fixed by slightly adjusting the outlet coordinates and re-running. An experienced analyst can usually identify and resolve problems quickly, especially with an interactive map display.

Where delineation is most likely to fail

Certain landscapes are inherently difficult for any automated tool:

  • Flat terrain — where flow direction is ambiguous. Examples: Florida, the Netherlands, the Ganges-Brahmaputra Delta.
  • Arid and semi-arid areas — where channels are sparse or ephemeral. Examples: North Africa, Central China, the American Southwest.
  • Frozen environments — glaciers, tundra, and permafrost. Examples: Iceland, Greenland, northern Canada, northern Russia.
  • Karst and highly permeable terrain — where surface drainage boundaries are poorly defined because water moves through the subsurface. Examples: the Yucatán Peninsula, parts of the Deschutes basin in Oregon, the Karst Plateau along the Italy–Slovenia border.
  • Urban areas — where impervious surfaces, curbs, storm sewers, and drains alter or override natural flow paths.
  • Heavily engineered basins — irrigation canals, inter-basin transfers, and pipelines can reroute water in ways that no terrain-based algorithm can detect.

The most common error: incorrect pour point snapping

Even in well-behaved terrain, the most frequent source of error is pour point snapping — the outlet being snapped to the wrong river reach, often a nearby tributary. This produces a watershed on a completely different branch of the river network. Such errors are not correlated with watershed size or geography and can be subtle if you are not looking carefully.

If a result looks wrong, try nudging the outlet coordinates toward the river centerline and re-running. Overlaying the MERIT-Basins river network on your map makes this much easier. The examples/webapp.py interactive map is useful for this kind of iterative review.
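
One way to prioritize which results to inspect first is to measure how far each outlet moved when it was snapped. The sketch below uses the haversine formula on plain coordinate tuples; the function names, the 1 km threshold, and the snapped coordinates are all made up for illustration. In practice you would read the requested and snapped points from the outlets output.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_large_snaps(outlets, max_km=1.0):
    """Return ids of outlets whose snapped point moved more than max_km.

    `outlets` holds (id, requested_lat, requested_lon, snapped_lat, snapped_lon).
    """
    return [
        oid
        for oid, rlat, rlon, slat, slon in outlets
        if haversine_km(rlat, rlon, slat, slon) > max_km
    ]


# Hypothetical snapped positions, invented for this example.
outlets = [
    ("a", 63.938, -21.004, 63.9385, -21.0045),  # moved tens of metres
    ("b", 63.938, -21.004, 63.960, -21.050),    # moved a few kilometres
]
suspect = flag_large_snaps(outlets)
```

A large snap distance does not prove an error, and a small one does not rule it out, so this is a triage aid rather than a substitute for visual inspection.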

Areas with no data

MERIT-Hydro does not cover Greenland, Antarctica, or some small islands (e.g., Hawaii, the Azores). Delineation will fail silently for outlet points in these areas.

How it works

delineator combines three techniques to achieve speed and low memory use:

  1. Hybrid raster/vector approach: vector unit catchments handle the bulk of the upstream area; raster flow-direction grids refine only the home catchment around the outlet.
  2. Hierarchical Spatial Aggregation: pre-computed nested catchments at five size levels (L0–L4) minimize the number of polygons that must be dissolved at runtime.
  3. SQLite-backed geodata: vector data is stored in relational SQLite databases with spatial indexes, enabling fast SQL lookups rather than loading entire datasets into memory.
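
To illustrate the third point, the sketch below finds every catchment upstream of an outlet with a recursive SQL query on a toy table, without loading the whole network into Python. The schema (a catchments table with id and next_down columns) and the tiny network are assumptions made for this example and are not the package's actual database layout. It uses an ordinary index rather than a spatial index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catchments (id INTEGER PRIMARY KEY, next_down INTEGER)"
)
# Index the downstream link so each recursive step is a fast lookup.
conn.execute("CREATE INDEX idx_next_down ON catchments (next_down)")

# Toy network: 4 and 5 drain to 2; 2 and 3 drain to 1; 1 and 6 reach the sea (0).
conn.executemany(
    "INSERT INTO catchments VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 0)],
)


def upstream_ids(conn, outlet_id):
    """Return the outlet catchment and all catchments draining into it."""
    query = """
        WITH RECURSIVE upstream(id) AS (
            SELECT ?
            UNION
            SELECT c.id
            FROM catchments AS c
            JOIN upstream AS u ON c.next_down = u.id
        )
        SELECT id FROM upstream ORDER BY id
    """
    return [row[0] for row in conn.execute(query, (outlet_id,))]


basin_of_2 = upstream_ids(conn, 2)  # [2, 4, 5]
basin_of_1 = upstream_ids(conn, 1)  # [1, 2, 3, 4, 5]
```

The polygons for the returned ids would then be dissolved into a single watershed; pre-aggregated levels reduce how many such polygons a query returns.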

Method diagram

The nested catchments at the southern end of Madagascar illustrate the aggregation levels:

Nested basins

For a full description, see the manuscript "Fast, accurate watershed delineation with a hybrid of raster and vector methods": https://mghydro.com/pages/Heberger_delineation_2025.pdf

Citation

If you use delineator in your research, please cite:

@software{delineator,
  author    = {Matthew Heberger},
  title     = {delineator: Global Watershed Delineation with Python},
  year      = {2026},
  publisher = {GitHub},
  version   = {2.0.0},
  url       = {https://github.com/mheberger/delineator}
}

Contributing

This project is open source and welcomes contributions. If you have comments or suggestions, please open an issue or pull request, or drop the author an email.

License

MIT License.
