
AI-Hydro: hydrological research tools as an MCP server for AI agents


aihydro-tools

AI-Hydro — Intelligent Hydrological Computing

Stop writing plumbing. Give AI agents real hydrological superpowers.



What is aihydro-tools?

aihydro-tools is the Python backbone of the AI-Hydro platform. It turns a conversation with an AI agent into real hydrological computation — watershed delineation, streamflow retrieval, signature extraction, terrain analysis, and model calibration — with full structured provenance recorded automatically at every step.

Tools are exposed via the Model Context Protocol (MCP), the open standard for agent-tool communication. Any MCP-capable agent or client, whether it runs on Claude, GPT, or Gemini, can call these tools directly without the user writing a single line of processing code. And because aihydro-tools is built as a community platform, any researcher can register domain-specific tools (flood frequency, sediment transport, groundwater, remote sensing) via Python entry points, extending the ecosystem without touching the core.


Quick Start

# Install
pip install "aihydro-tools[all]"   # quotes stop zsh from expanding the brackets

# Verify
aihydro-mcp --diagnose

# Run the server
aihydro-mcp

The AI-Hydro VS Code extension auto-detects aihydro-mcp on startup — no manual configuration needed.


Built-in Tools

Analysis Tools

| Category | Tool | Description |
|---|---|---|
| Watershed | delineate_watershed | NHDPlus watershed delineation from USGS NLDI given a gauge ID |
| Streamflow | fetch_streamflow_data | Daily discharge time series from USGS NWIS |
| Signatures | extract_hydrological_signatures | 15+ flow statistics: BFI, runoff ratio, FDC percentiles, recession constants |
| Geomorphic | extract_geomorphic_parameters | 28 basin morphometry metrics (area, slope, elevation, shape factors) |
| Terrain | compute_twi | Topographic Wetness Index from the 3DEP 10 m DEM |
| Curve Number | create_cn_grid | NRCS Curve Number grid from NLCD land cover + POLARIS soils |
| Forcing | fetch_forcing_data | Basin-averaged GridMET climate data (prcp, tmax, tmin, PET, srad, wind) |
| CAMELS | extract_camels_attributes | Full CAMELS-US attribute set (671 gauges) via pygeohydro |
| Modelling | train_hydro_model | Differentiable HBV-light (PyTorch) or NeuralHydrology LSTM |
| Modelling | get_model_results | Retrieve cached model performance (NSE, KGE, RMSE) |
| Session | start_session | Initialize or resume a per-gauge research session |
| Session | get_session_summary | Overview of computed and pending analysis slots |
| Session | clear_session | Reset cached results to force re-computation |
| Session | add_note | Attach research notes to the session |
| Session | export_session | Export a reproducible research capsule with data, figures, methods, and environment |
| Session | get_session_raw_state | Retrieve raw computed results for LLM interpretation (phase 1 of the two-phase interpretation workflow) |
| Session | write_research_interpretation | Store the LLM-authored scientific interpretation (phase 2 of the two-phase interpretation workflow) |
| Session | archive_session | Archive a completed session to a timestamped ZIP |
| Session | merge_session_shards | Merge parallel sub-agent shards into the main session |
| Ledger | add_claim | Add a scoped scientific claim to the session ledger |
| Ledger | update_claim_status | Update the status/confidence of an existing claim |
| Ledger | list_claims | List all claims in the session (filterable by status) |
| Ledger | add_assumption | Record a scientific assumption or caveat |
| Ledger | list_assumptions | List all assumptions in the session |
| Ledger | promote_claim_to_registry | Promote a validated claim to the global knowledge registry |
| Ledger | draft_claim_from_run | Auto-draft a claim pre-filled with evidence from a Tier 1 tool run |
| Validators | check_water_balance_consistency | Flag mass-balance violations in signature + streamflow data |
| Validators | check_temporal_alignment | Verify that forcing and streamflow cover the same time window |
| Validators | check_unit_consistency | Confirm that a session slot carries the expected physical units |
| Visualization | show_on_map | Push any GeoJSON geometry onto the AI-Hydro map panel |
| Discover | list_available_tools | Enumerate all installed tools, including community plugins |
| Discover | list_skills | List available workflow playbooks by domain |
| Discover | load_skill | Load a workflow playbook for a multi-step analysis |
| Discover | get_library_reference | Retrieve an API idiom card for a Python library |
| Discover | list_relevant_clis | List relevant external CLI tools |
| Discover | get_variable_definition | Look up a hydrology variable by ID (units, aliases, notes) |
| Discover | get_metric_definition | Look up a performance metric by ID |
| Discover | get_dataset_definition | Look up a dataset by ID (provider, resolution, variables) |
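To make two of the signatures concrete, the sketch below computes a baseflow index (BFI) with a single forward pass of the Lyne-Hollick digital filter and flow-duration-curve exceedance flows such as Q95. This is one common textbook approach, assumed here for illustration: the filter parameter alpha = 0.925, the single pass, and the linear-interpolation percentile are conventional choices, and extract_hydrological_signatures may use a different filter, number of passes, or parameters.

```python
import math
from typing import Sequence


def lyne_hollick_bfi(q: Sequence[float], alpha: float = 0.925) -> float:
    """Baseflow index from one forward pass of the Lyne-Hollick filter.

    Quickflow is filtered as
        qf[t] = alpha * qf[t-1] + (1 + alpha) / 2 * (q[t] - q[t-1])
    and constrained to 0 <= qf[t] <= q[t]; baseflow is q - qf.
    """
    if len(q) < 2:
        raise ValueError("need at least two daily values")
    quick_prev = 0.0
    baseflow_total = q[0]  # first day: no quickflow by assumption
    for t in range(1, len(q)):
        quick = alpha * quick_prev + (1 + alpha) / 2 * (q[t] - q[t - 1])
        quick = min(max(quick, 0.0), q[t])  # keep quickflow physically bounded
        baseflow_total += q[t] - quick
        quick_prev = quick
    total = sum(q)
    return baseflow_total / total if total > 0 else float("nan")


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (same convention as numpy's default)."""
    s = sorted(values)
    if not s:
        raise ValueError("empty series")
    k = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def exceedance_flow(q: Sequence[float], p: float) -> float:
    """Flow equalled or exceeded p percent of the time (p=95 gives Q95)."""
    return percentile(q, 100.0 - p)


daily_q = [12.0, 11.5, 30.0, 22.0, 15.0, 13.0, 12.5, 12.0]  # made-up m³/s values
print(f"BFI = {lyne_hollick_bfi(daily_q):.2f}, Q95 = {exceedance_flow(daily_q, 95):.2f}")
```

A constant hydrograph has no quickflow, so its BFI is exactly 1; flashier records push the index toward 0.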

Project, Literature & Researcher Memory

| Category | Tool | Description |
|---|---|---|
| Project | start_project | Create or resume a named research project spanning multiple gauges or topics |
| Project | get_project_summary | Overview of all gauges, journal entries, literature, and metrics in a project |
| Project | add_gauge_to_project | Associate a gauge session with the active project |
| Project | search_experiments | Full-text search across all gauge sessions in a project |
| Literature | index_literature | Scan a folder of PDF, txt, or md files and build a searchable text index |
| Literature | search_literature | Query the index and return excerpts for the agent to synthesise |
| Literature | add_journal_entry | Log a timestamped experiment note to the project journal |
| Persona | get_researcher_profile | Retrieve the persistent researcher profile (expertise, focus, preferences) |
| Persona | update_researcher_profile | Update profile fields (driven by the agent or the researcher) |
| Persona | log_researcher_observation | Record an observation about the researcher's evolving interests and methods |

Community plugins can add further tools via the entry-point system (see below).


Data Sources

All data is fetched from authoritative public sources:

  • USGS NWIS — daily streamflow via dataretrieval (official USGS Python client)
  • NHDPlus / NLDI — watershed delineation via pynhd
  • GridMET — climate forcing via pygridmet
  • 3DEP — DEM and terrain analysis via py3dep
  • NLCD — land cover classification
  • POLARIS — soil properties
  • CAMELS-US — catchment attributes via pygeohydro

Memory & Provenance

AI-Hydro maintains a three-tier memory hierarchy so research context survives between conversations, sessions, and projects.

HydroSession — per-gauge state at ~/.aihydro/sessions/<gauge_id>.json. Expensive computations (watershed delineation, multi-year streamflow downloads) are done once and reused across days or weeks. Every result carries structured provenance metadata — data source, parameters, timestamp — making reproducibility a natural byproduct rather than a documentation chore.
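The compute-once behaviour can be sketched as a small JSON cache keyed by gauge ID. This is a minimal sketch under stated assumptions: the cached_slot helper, the slot names, and the {"data", "meta"} layout are hypothetical and are not the actual HydroSession schema. Only the one-file-per-gauge idea (~/.aihydro/sessions/<gauge_id>.json) comes from this README, and the example writes to a temporary directory instead of your home folder.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict


def cached_slot(
    sessions_dir: Path,
    gauge_id: str,
    slot: str,
    compute: Callable[..., Dict[str, Any]],
    source: str,
    **params: Any,
) -> Dict[str, Any]:
    """Return a cached result for (gauge_id, slot), computing it only once.

    Hypothetical layout: one JSON file per gauge, holding a mapping of
    slot name -> {"data": ..., "meta": {source, parameters, timestamp}}.
    """
    path = sessions_dir / f"{gauge_id}.json"
    state = json.loads(path.read_text()) if path.exists() else {}
    if slot in state:
        return state[slot]  # reuse the earlier (expensive) computation
    result = {
        "data": compute(gauge_id, **params),
        "meta": {  # provenance travels with the result
            "source": source,
            "parameters": params,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }
    state[slot] = result
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
    return result


# Usage: the second call hits the cache, so the "expensive" function runs once.
calls = []


def fake_delineation(gauge_id: str) -> Dict[str, Any]:
    calls.append(gauge_id)
    return {"area_km2": 123.4}  # placeholder value, not real data


with tempfile.TemporaryDirectory() as tmp:
    for _ in range(2):
        ws = cached_slot(Path(tmp), "01031500", "watershed", fake_delineation, source="demo")
print(len(calls))
```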

ProjectSession — project-scoped state at ~/.aihydro/projects/<name>/project.json. Organises research across multiple gauges, topics, or datasets. Supports cross-session experiment search, a timestamped journal, and literature indexing.

ResearcherProfile — a persistent persona at ~/.aihydro/researcher.json. Built up from agent-researcher interactions over time: expertise areas, preferred models, active projects, and accumulated observations. Injected into every conversation automatically so the agent knows who it is working with.


Scientific Trust

AI-Hydro goes beyond computation — it records why results should be believed and what remains uncertain.

Tool Tier System

Every tool is assigned to one of three evidence tiers (machine-readable via get_tool_tier(name)):

| Tier | Label | Automatic enforcement |
|---|---|---|
| 1 | Scientific output | quality_flags injected into every result; _run_id minted for evidence binding |
| 2 | Workflow / data | No automatic enforcement; best-effort provenance |
| 3 | Infrastructure | No validation; session plumbing only |

Tier 1 tools (watershed delineation, signatures, TWI, CN, model training) automatically fire any post-run validators registered for them. Every Tier 1 result carries:

  • quality_flags — list of validator outcomes (pass/warning/fail with severity)
  • _run_id — stable evidence-binding key for linking claims to specific runs
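A minimal sketch of what attaching these two fields to a Tier 1 result could look like. The flag fields (validator, outcome, severity, message), the severity labels, the best-to-worst ordering of outcomes, and the use of a random UUID for _run_id are assumptions for illustration; the README only specifies that quality_flags lists pass/warning/fail outcomes with a severity and that _run_id is a stable evidence-binding key.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Dict, List

# Assumed ordering of outcomes from best to worst.
OUTCOME_RANK = {"pass": 0, "warning": 1, "fail": 2}


@dataclass
class QualityFlag:
    validator: str
    outcome: str   # "pass", "warning", or "fail"
    severity: str  # e.g. "info", "medium", "high" (hypothetical scale)
    message: str = ""


def annotate_tier1(result: Dict[str, Any], flags: List[QualityFlag]) -> Dict[str, Any]:
    """Return a copy of result with quality_flags and a freshly minted _run_id."""
    annotated = dict(result)
    annotated["quality_flags"] = [asdict(f) for f in flags]
    annotated["_run_id"] = uuid.uuid4().hex  # minted once, then reused when citing this run
    return annotated


def worst_outcome(flags: List[QualityFlag]) -> str:
    """Summarise a list of flags by its most severe outcome."""
    return max((f.outcome for f in flags), key=OUTCOME_RANK.__getitem__, default="pass")
```

Copying the result instead of mutating it keeps the raw tool output untouched, which makes it easier to re-run validators later.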

Scientific Ledger

Results become actionable knowledge through the ledger:

  1. draft_claim_from_run(session_id, run_id, metric_ref): reads the run log for any Tier 1 tool call and returns a claim template with evidence_spans pre-populated. The agent authors only the scientific interpretation.
  2. add_claim(..., evidence_spans=[...]): records the claim. Each EvidenceSpan ties the claim to a run, paper, or dataset with typed attribution (source_type, source_id, metric_ref).
  3. promote_claim_to_registry(..., researcher_approved=True): promotes the claim only if it passes a gate requiring at least one evidence_span, at least one limitation, and a status of supported or weakly_supported. Researcher approval is also required.
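The promotion gate in step 3 reduces to a handful of checks. The sketch below encodes them with hypothetical Claim and EvidenceSpan dataclasses whose field names follow this README (source_type, source_id, metric_ref, limitations, status); the real ledger schema, and how promote_claim_to_registry reports a rejection, may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROMOTABLE_STATUSES = {"supported", "weakly_supported"}


@dataclass
class EvidenceSpan:
    source_type: str                  # e.g. "run", "paper", or "dataset"
    source_id: str                    # e.g. a _run_id, DOI, or dataset ID
    metric_ref: Optional[str] = None  # which metric the claim relies on (format assumed)


@dataclass
class Claim:
    text: str
    status: str
    evidence_spans: List[EvidenceSpan] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)


def promotion_problems(claim: Claim, researcher_approved: bool) -> List[str]:
    """Return the reasons a claim cannot be promoted (an empty list means it passes)."""
    problems = []
    if not claim.evidence_spans:
        problems.append("needs at least one evidence_span")
    if not claim.limitations:
        problems.append("needs at least one limitation")
    if claim.status not in PROMOTABLE_STATUSES:
        problems.append(f"status {claim.status!r} is not supported/weakly_supported")
    if not researcher_approved:
        problems.append("researcher approval is required")
    return problems
```

Returning every failed condition, rather than stopping at the first, lets an agent fix all gaps in one pass.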

Verified Knowledge

Built-in knowledge entries can be marked verified: true in the YAML registry (e.g., metric.kge, variable.streamflow, dataset.usgs_nwis). A workspace override file that overrides a verified entry must supply a scientific_justification in addition to the usual overrides and override_reason fields, so that departures from peer-reviewed conventions are deliberate and documented.

Retrieve all verified entries programmatically:

from ai_hydro.knowledge.loader import get_verified_knowledge
verified = get_verified_knowledge()  # {"variables": [...], "metrics": [...], "datasets": [...]}
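The override rule for verified entries can be expressed as a small check. This sketch assumes an override is a dict carrying the keys named above (overrides, override_reason, scientific_justification) and that an entry exposes its verified flag as a "verified" key; the actual workspace override file format and where aihydro-tools performs the check are not shown here.

```python
from typing import Any, Dict, List


def override_errors(entry: Dict[str, Any], override: Dict[str, Any]) -> List[str]:
    """List what is missing from a workspace override of a knowledge entry."""
    errors = []
    for key in ("overrides", "override_reason"):
        if not override.get(key):
            errors.append(f"missing {key}")
    # Verified (peer-reviewed) entries additionally need an explicit justification.
    if entry.get("verified") and not override.get("scientific_justification"):
        errors.append("missing scientific_justification for a verified entry")
    return errors
```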

Quality Assurance

Automatic Post-Run Validation

Tier 1 tools fire registered validators automatically (no agent action needed). Currently active wiring:

  • extract_hydrological_signatures → check_water_balance_consistency
  • fetch_streamflow_data → check_unit_consistency (expected: m³/s)
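To make these two checks concrete, here is a sketch of the arithmetic behind them. USGS NWIS reports daily discharge in cubic feet per second (parameter code 00060), so a unit check typically involves converting to m³/s (1 ft³ = 0.028316846592 m³ exactly), and a basic mass-balance test compares runoff depth with precipitation over the basin. The helper names and the runoff-ratio threshold of 1.0 are assumptions for illustration; the actual validators may apply different or additional rules.

```python
from typing import Dict, List, Sequence

CFS_TO_CMS = 0.028316846592  # exact: (0.3048 m/ft) ** 3


def cfs_to_cms(q_cfs: Sequence[float]) -> List[float]:
    """Convert discharge from ft³/s (as reported by NWIS) to m³/s."""
    return [q * CFS_TO_CMS for q in q_cfs]


def runoff_depth_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Spread a discharge (m³/s) over the basin area as a depth in mm/day."""
    # m³/s * 86 400 s/day / (area_km2 * 1e6 m²) * 1000 mm/m == q * 86.4 / area_km2
    return q_cms * 86.4 / area_km2


def water_balance_flag(mean_q_cms: float, area_km2: float, mean_p_mm_day: float) -> Dict:
    """Flag a long-term runoff ratio above 1 (more water leaving than falling as rain).

    Such ratios can have legitimate causes, e.g. inter-basin transfers,
    but are worth surfacing for review.
    """
    ratio = runoff_depth_mm_per_day(mean_q_cms, area_km2) / mean_p_mm_day
    outcome = "fail" if ratio > 1.0 else "pass"
    return {"validator": "water_balance_sketch", "outcome": outcome, "runoff_ratio": ratio}
```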

Validators never raise — failures appear in quality_flags without crashing the tool. Register additional validators via:

from ai_hydro.mcp.enforcement import register_post_validator
register_post_validator("my_tool", my_validator_fn, lambda sid: {"session_id": sid})
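The "never raise" contract can be illustrated with a self-contained registry. This is a sketch of the pattern, not the implementation in ai_hydro.mcp.enforcement: the names register and run_validators, the validator signature (keyword arguments in, flag dict out), and the reading of register_post_validator's third argument as a function mapping a session ID to the validator's keyword arguments are all assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Validator = Callable[..., Dict[str, Any]]
ArgsBuilder = Callable[[str], Dict[str, Any]]

# tool name -> list of (validator, function building its kwargs from a session id)
_REGISTRY: Dict[str, List[tuple]] = defaultdict(list)


def register(tool: str, validator: Validator, build_args: ArgsBuilder) -> None:
    _REGISTRY[tool].append((validator, build_args))


def run_validators(tool: str, session_id: str) -> List[Dict[str, Any]]:
    """Run every validator registered for tool; exceptions become fail flags."""
    flags = []
    for validator, build_args in _REGISTRY.get(tool, []):
        name = getattr(validator, "__name__", repr(validator))
        try:
            flags.append(validator(**build_args(session_id)))
        except Exception as exc:  # never let a validator crash the tool
            flags.append({"validator": name, "outcome": "fail", "message": repr(exc)})
    return flags


def always_ok(session_id: str) -> Dict[str, Any]:
    return {"validator": "always_ok", "outcome": "pass"}


def broken(session_id: str) -> Dict[str, Any]:
    raise RuntimeError("missing streamflow slot")


register("my_tool", always_ok, lambda sid: {"session_id": sid})
register("my_tool", broken, lambda sid: {"session_id": sid})
print([f["outcome"] for f in run_validators("my_tool", "01031500")])
```

Catching exceptions per validator means one buggy check degrades to a fail flag while the remaining validators still run.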

aihydro-bench

A deterministic fixture benchmark suite (bench/tasks.yaml, ~26 tasks) verifies every core computation path without live network calls. Tasks span all tiers:

  • Group A–C: validators, compute functions, and their edge cases
  • Group D–G: ledger gates, knowledge registry guards, conflict resolution
  • Group H–J: enforcement layer, verified namespace, claim coupling (draft → add)

Run it:

pytest tests/test_bench.py -m bench -v       # fast fixture suite (no network)
pytest tests/test_bench.py -m bench_live -v  # live USGS calls (nightly CI only)

Extending with Plugins

aihydro-tools is a platform, not a closed product. Any researcher can package domain knowledge as a plugin and make it immediately available to every AI agent that uses AI-Hydro — flood frequency analysis, sediment transport, groundwater modelling, remote sensing workflows, or anything else the core doesn't yet cover.

Entry-point plugins load into the same process with full access to HydroSession and cached data:

# In your package's pyproject.toml
[project.entry-points."aihydro.tools"]
my_tool = "my_package.tools:my_tool_function"

Install the package, restart the server, and the tool is automatically discovered — no changes to the core required.

Standalone MCP servers let you build fully independent toolkits with their own dependencies, registered alongside the core ai-hydro server.

See the Plugin Guide for complete walkthroughs of both paths, the data contract, and session integration.


Use as a Python Library

You don't need an AI agent to benefit from aihydro-tools. Every tool is a regular Python function — import and call directly in scripts, notebooks, or pipelines:

from ai_hydro.analysis.watershed import delineate_watershed
from ai_hydro.data.streamflow import fetch_streamflow_data
from ai_hydro.analysis.signatures import extract_hydrological_signatures

# Delineate a watershed
ws = delineate_watershed("01031500")
print(f"Watershed area: {ws.data['area_km2']} km2")

# Fetch streamflow
sf = fetch_streamflow_data("01031500", start_date="2015-01-01", end_date="2024-12-31")
print(f"Records: {len(sf.data['dates'])} days")

# Extract signatures
sigs = extract_hydrological_signatures("01031500")
print(f"Baseflow index: {sigs.data['baseflow_index']}")

All functions return HydroResult objects with .data (dict) and .meta (provenance metadata).
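If you want to reason about results without depending on the exact class, it helps to know their shape. The stand-in below is hypothetical: it mirrors only what this README states (a .data dict, a .meta provenance record, and a to_dict() method used by plugins), and the meta field names (tool, source, parameters, timestamp) are assumptions based on the provenance description in Memory & Provenance. Consult ai_hydro.core.types for the real definitions.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class MetaSketch:
    tool: str
    source: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass
class ResultSketch:
    data: Dict[str, Any]
    meta: MetaSketch

    def to_dict(self) -> Dict[str, Any]:
        """JSON-friendly form, e.g. for returning from an MCP tool."""
        return {"data": self.data, "meta": asdict(self.meta)}


res = ResultSketch(
    data={"baseflow_index": 0.5},  # placeholder value, not a real result
    meta=MetaSketch(tool="extract_hydrological_signatures", source="USGS NWIS",
                    parameters={"gauge_id": "01031500"}),
)
print(res.to_dict()["meta"]["tool"])
```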


Installation Details

Extras

Install only what you need:

| Extra | What it adds | Install command |
|---|---|---|
| [data] | Streamflow, forcing, land cover, soil, and CAMELS retrieval | pip install "aihydro-tools[data]" |
| [analysis] | Watershed, signatures, TWI, geomorphic, curve number | pip install "aihydro-tools[analysis]" |
| [modelling] | PyTorch differentiable HBV-light, NeuralHydrology LSTM | pip install "aihydro-tools[modelling]" |
| [viz] | Matplotlib, Plotly, Folium visualisations | pip install "aihydro-tools[viz]" |
| [all] | Everything above | pip install "aihydro-tools[all]" |

PATH Troubleshooting

If aihydro-mcp is not found after installation, pip most likely placed it in a scripts directory that is not on your PATH. Typical locations:

| OS | Typical location |
|---|---|
| Windows (user) | %APPDATA%\Python\Python3XX\Scripts\aihydro-mcp.exe |
| Windows (system) | C:\Python3XX\Scripts\aihydro-mcp.exe |
| macOS/Linux (user) | ~/.local/bin/aihydro-mcp |
| macOS/Linux (system) | /usr/local/bin/aihydro-mcp |
| Conda | ~/miniconda3/bin/aihydro-mcp or ~/anaconda3/bin/aihydro-mcp |

Universal fallback: python -m ai_hydro.mcp works regardless of PATH, as long as python is the same interpreter you installed aihydro-tools into. The AI-Hydro extension auto-detects both the console script and the module fallback.
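To find where pip put the script on your own machine, a few standard-library calls are enough. This sketch reports whether aihydro-mcp is already on PATH and which scripts directories the current interpreter uses; run it with the same python you installed aihydro-tools into. sysconfig.get_preferred_scheme requires Python 3.10+, so the sketch falls back to the classic user-scheme names on older versions.

```python
import os
import shutil
import sysconfig
from typing import Dict, Optional


def script_locations(name: str = "aihydro-mcp") -> Dict[str, Optional[str]]:
    """Report where a console script is, or where pip would have put it."""
    try:
        user_scheme = sysconfig.get_preferred_scheme("user")  # Python 3.10+
    except AttributeError:
        user_scheme = "nt_user" if os.name == "nt" else "posix_user"
    return {
        "on_path": shutil.which(name),                                # None if not on PATH
        "env_scripts": sysconfig.get_path("scripts"),                 # venv, conda, or system install
        "user_scripts": sysconfig.get_path("scripts", user_scheme),  # pip install --user
    }


for key, value in script_locations().items():
    print(f"{key:>12}: {value}")
```

Adding whichever directory contains aihydro-mcp to PATH (or using the module fallback) resolves the "command not found" case.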


Plugin Entry-Point Reference

AI-Hydro uses Python entry points for a clean plugin system. Community packages can contribute any of four capability layers without modifying the core:

| Entry-point group | Contributes | Served by |
|---|---|---|
| aihydro.tools | MCP tool functions | list_available_tools() |
| aihydro.knowledge | Library reference cards (JSON) | get_library_reference() |
| aihydro.skills | Workflow playbooks (SKILL.md) | list_skills() / load_skill() |
| aihydro.clis | CLI descriptor (binary + help) | list_relevant_clis() |

Example: add a custom tool

# my_hydro_pkg/tools.py
from typing import Optional

from ai_hydro.core.types import HydroResult, HydroMeta, DataSource

def compute_soil_moisture(session_id: str, workspace_dir: Optional[str] = None) -> dict:
    """Estimate soil moisture from session forcing data."""
    data = {}  # ... your computation fills this dict ...
    meta = HydroMeta(
        tool="compute_soil_moisture",
        # ... remaining provenance fields (see ai_hydro.core.types) ...
    )
    result = HydroResult(data=data, meta=meta)
    return result.to_dict()  # MCP tools return a plain, JSON-serialisable dict

# pyproject.toml
[project.entry-points."aihydro.tools"]
compute_soil_moisture = "my_hydro_pkg.tools:compute_soil_moisture"

After pip install my-hydro-pkg, the tool appears automatically in list_available_tools() on the next MCP server restart — no changes to aihydro-tools required.

Example: add a knowledge card

[project.entry-points."aihydro.knowledge"]
my_lib = "my_hydro_pkg.knowledge:get_refs_dir"

where get_refs_dir() returns a Path to a directory of *.json cards (same schema as the built-in cards in ai_hydro/knowledge/library_refs/).
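On the plugin side, get_refs_dir() only has to return a directory. The sketch below shows one way a package could locate cards shipped next to its code, and how such a directory might be read back. The cards/ subdirectory (which must be included as package data), the load_cards helper, and the card fields in the example (name, summary) are placeholders; the real schema is whatever the built-in cards in ai_hydro/knowledge/library_refs/ use.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def get_refs_dir() -> Path:
    """Plugin side: return the folder of *.json cards shipped inside the package."""
    return Path(__file__).parent / "cards"  # hypothetical package layout


def load_cards(refs_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Host side (sketch): read every *.json card, keyed by file stem."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(refs_dir.glob("*.json"))}


with tempfile.TemporaryDirectory() as tmp:
    card = {"name": "my_lib", "summary": "placeholder card"}
    (Path(tmp) / "my_lib.json").write_text(json.dumps(card))
    print(load_cards(Path(tmp)))
```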

Example: add a workflow skill

[project.entry-points."aihydro.skills"]
my_skills = "my_hydro_pkg.skills:get_skills_dir"

where get_skills_dir() returns a Path to a directory of *.md skill files (YAML frontmatter + markdown body). Skills appear in list_skills() and are loadable via load_skill(name).
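A skill file is ordinary markdown with a YAML header between two --- lines. The sketch below splits such a file into header and body; to stay dependency-free it only understands flat key: value pairs, whereas a real loader would use a full YAML parser. The header keys shown (name, domain, description) and the example skill are hypothetical, not the documented skill schema.

```python
from typing import Dict, Tuple

EXAMPLE_SKILL = """---
name: flood-frequency-basics
domain: floods
description: Fit an annual-maximum series and report return levels.
---
# Steps
1. Fetch daily streamflow for the gauge.
2. Extract the annual maxima.
"""


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Return (header fields, markdown body) for a '---'-delimited skill file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is body
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    header = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return header, body


meta, body = split_frontmatter(EXAMPLE_SKILL)
print(meta["name"])
```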

Example: advertise a CLI

[project.entry-points."aihydro.clis"]
my_tool = "my_hydro_pkg.aihydro.cli_descriptor:descriptor"

where descriptor() returns a dict with the keys name, binary, description, and help_subcommand. The CLI then appears in list_relevant_clis().
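A descriptor is just a function returning those four fields. In this sketch the values (a hypothetical floodfreq binary and its help subcommand) are placeholders, and the check at the end only confirms that the four keys listed above are present; it is not the validation list_relevant_clis() performs.

```python
from typing import Dict

REQUIRED_KEYS = {"name", "binary", "description", "help_subcommand"}


def descriptor() -> Dict[str, str]:
    """Describe an external CLI so agents can discover it (placeholder values)."""
    return {
        "name": "floodfreq",        # hypothetical tool name
        "binary": "floodfreq",      # executable expected on PATH
        "description": "Flood frequency analysis from annual maxima.",
        "help_subcommand": "help",  # how to ask the binary for usage text (assumed)
    }


missing = REQUIRED_KEYS - set(descriptor())
print("missing keys:", sorted(missing))
```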

See the Plugin Guide for full walkthroughs.


Contributing

The most impactful contributions to AI-Hydro are new domain tools — knowledge that currently lives in papers and custom scripts, packaged so any AI agent can use it. High-priority areas include flood frequency analysis, sediment transport, groundwater modelling, remote sensing workflows (MODIS, Landsat, SAR), snow hydrology, and water quality.

You don't need to fork the core. Write a Python package, register an entry point, publish to PyPI. That's it.


Citation

If you use aihydro-tools in your research, please cite:

@software{aihydro_tools_2026,
  title   = {aihydro-tools: Python MCP Server for AI-Automated
             Hydrological Research},
  author  = {Galib, Mohammad and Merwade, Venkatesh},
  year    = {2026},
  version = {1.6.0},
  doi     = {10.5281/zenodo.19597589},
  url     = {https://doi.org/10.5281/zenodo.19597589}
}

For the VS Code extension, cite:

@software{aihydro_extension_2026,
  title   = {AI-Hydro: An Open Platform for End-to-End AI-Automated
             Hydrological Research (VS Code Extension)},
  author  = {Galib, Mohammad and Merwade, Venkatesh},
  year    = {2026},
  version = {0.1.3},
  doi     = {10.5281/zenodo.19597664},
  url     = {https://doi.org/10.5281/zenodo.19597664}
}



License

Apache 2.0 © 2026 Mohammad Galib



Download files


Source Distribution

aihydro_tools-1.7.0.tar.gz (280.2 kB)

Built Distribution


aihydro_tools-1.7.0-py3-none-any.whl (286.7 kB)

File details

Details for the file aihydro_tools-1.7.0.tar.gz.

File metadata

  • Download URL: aihydro_tools-1.7.0.tar.gz
  • Size: 280.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for aihydro_tools-1.7.0.tar.gz
Algorithm Hash digest
SHA256 897b69644a7fce5db64006e46a26622999f11b83070b267f3d48db1373890fb2
MD5 7c57792badbc57fbd1c0324fd290ef3e
BLAKE2b-256 25d99cbb44288757b39551d51b6147221e792935756d3cdc3789bc035798d1d5


File details

Details for the file aihydro_tools-1.7.0-py3-none-any.whl.

File metadata

  • Download URL: aihydro_tools-1.7.0-py3-none-any.whl
  • Size: 286.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for aihydro_tools-1.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5412df71345fd254bdd5860ca5bff76acb4d77d382070d006280e6f2025062f0
MD5 d5b6871346c58253e3c7abbaf09924fe
BLAKE2b-256 265aa2d7e72937e492b8d299be7f1c6837023c8f410e8ad4e064d282e714a87c

