TerraFlow: a reproducible workflow for geospatial agricultural modeling.

TerraFlow: Reproducible Geospatial Agricultural Modeling

TerraFlow v0.2.0 is a reproducible, open-source geospatial workflow framework for agricultural modeling. It provides:

  • Geospatial preprocessing (rasters, vectors, ROI clipping)
  • Spatially-aware climate data (per-cell spatial interpolation with fallback strategies) - NEW in v0.2.0
  • Config-driven model execution with Pydantic v2 validation
  • Python package with CLI interface (terraflow --config <file>)
  • Docker workflow support
  • JOSS-compatible research workflow and manuscript
  • Comprehensive test suite (127 tests) with 100% pass rate
  • Interactive Jupyter notebook for testing and visualization
  • Architecture Decision Records (ADRs) for design documentation

Use TerraFlow to build, test, and publish reproducible agricultural analytics pipelines.

Features

Core Capabilities:

  • Modern Python package (pyproject.toml, PEP 621 compliant)
  • Fully uv-installable (uv pip install terraflow-agro)
  • Reproducible CLI interface (terraflow --config <file>)
  • Pydantic v2 configuration models with geographic coordinate validation - enhanced in v0.2.0
  • Spatial interpolation using scipy.interpolate.griddata - new in v0.2.0
  • Extensible workflow architecture with clean separation of concerns

Development & Testing:

  • Comprehensive test suite with pytest (127 tests across 10 test files)
  • Linting with ruff and black
  • Makefile automation for dev/test/build/release workflows
  • Interactive Jupyter notebook for comprehensive testing
  • Example data and demo configurations

CI/CD & Documentation:

  • GitHub Actions for CI testing and linting
  • Automated PyPI publishing on version tags
  • MkDocs-based documentation with GitHub Pages deployment
  • JOSS manuscript build automation
  • Docker support for containerized workflows

Architecture & Design:

  • Architecture Decision Records (ADRs) documenting key design choices
  • Clean module separation (cli, config, climate, geo, ingest, model, pipeline, stats, viz)
  • Comprehensive error handling and resource management
  • Production-ready code quality

Installation

Option 1: Install from PyPI (Recommended)

uv pip install terraflow-agro

Verify installation:

import terraflow
print(terraflow.__version__)

Option 2: Install from source

Clone the repo:

git clone https://github.com/gmarupilla/AgroTerraFlow.git
cd AgroTerraFlow

Create .venv and install dependencies:

make dev

This runs:

  • uv venv .venv
  • uv pip install --python .venv/bin/python -e ".[dev]" (dependencies come from pyproject.toml only; no requirements.txt is used)

Quickstart

Run the demo pipeline

make run-demo

which is equivalent to:

terraflow --config examples/demo_config.yml

CLI Usage

After pip install terraflow-agro, TerraFlow exposes a terraflow command:

terraflow --config config.yml

Relative paths inside the config file resolve relative to the config file's own directory, so configs are portable regardless of your working directory.
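
The path rule above can be illustrated with a short sketch. It is not TerraFlow's actual implementation; resolve_config_path is a hypothetical helper name, and the only behavior taken from the text is that relative paths are anchored at the config file's directory rather than the current working directory.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path found in a config relative to the config file's directory.

    Hypothetical helper for illustration; not part of the TerraFlow API.
    """
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are used as-is.
        return candidate
    # Anchor relative paths at the directory that contains the config file.
    return Path(config_file).resolve().parent / candidate


if __name__ == "__main__":
    print(resolve_config_path("examples/demo_config.yml", "sample_data/soil.tif"))
```

Under this scheme the same config resolves to the same files whether the command is launched from the repository root or from any other directory.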

Example:

terraflow --config examples/demo_config.yml

Your results will appear in:

outputs/

Run Fingerprint

Each pipeline execution is identified by a deterministic run_fingerprint derived from:

  • Canonicalized YAML configuration
  • ROI geometry hash
  • Input file fingerprints (sha256, size, mtime)

Identical inputs always produce the same fingerprint across machines. This enables immutable run directories like:

runs/<fingerprint>/...
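
One way such a fingerprint could be computed is sketched below. This is an assumption-laden illustration, not the project's code: the function name run_fingerprint, the payload layout, and the use of canonical JSON as the serialization are all hypothetical. Only the three ingredients (configuration, ROI geometry hash, input file fingerprints) come from the description above.

```python
import hashlib
import json


def run_fingerprint(config: dict, roi_hash: str, input_fingerprints: list[dict]) -> str:
    """Combine config, ROI hash, and per-file fingerprints into one SHA-256 hex digest.

    Hypothetical sketch; the real TerraFlow fingerprint may be built differently.
    """
    payload = {
        "config": config,
        "roi": roi_hash,
        # Sort the file entries so the listing order of inputs does not matter.
        "inputs": sorted(input_fingerprints, key=lambda f: f["sha256"]),
    }
    # sort_keys plus fixed separators yields one canonical byte representation.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    files = [{"sha256": "abc123", "size": 1024, "mtime": 0}]
    print(run_fingerprint({"output_dir": "outputs"}, "roi-hash", files))
```

Because the serialization is canonical, two configs that differ only in key order map to the same digest, while any change to a value produces a different run directory.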

Climate Data Integration (v0.2.0)

TerraFlow now supports per-cell climate data with two strategies: spatial interpolation and index-based matching.

Spatial Interpolation (Recommended)

For climate data with geographic coordinates (weather stations, satellite grids):

climate:
  strategy: spatial          # Interpolate using scipy.interpolate.griddata
  fallback_to_mean: true     # Use global mean for extrapolated cells

Benefits:

  • Works with arbitrary observation locations
  • Smooth spatial gradients across your ROI
  • Graceful handling of sparse data
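
The spatial strategy can be sketched with scipy.interpolate.griddata as follows. This is a minimal illustration under stated assumptions, not TerraFlow's climate module: the function name interpolate_to_cells, the (lon, lat) column order, and the choice of linear interpolation are assumptions. The fallback mirrors the fallback_to_mean option by filling cells outside the stations' convex hull with the global mean.

```python
import numpy as np
from scipy.interpolate import griddata


def interpolate_to_cells(station_xy, station_values, cell_xy, fallback_to_mean=True):
    """Interpolate station observations onto raster cell centers.

    Hypothetical sketch: coordinates are assumed to be (lon, lat) pairs.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)

    # Linear interpolation returns NaN for cells outside the convex hull
    # of the observation points.
    result = griddata(station_xy, station_values, cell_xy, method="linear")
    if fallback_to_mean:
        # Replace those NaN cells with the mean of all observations.
        result = np.where(np.isnan(result), station_values.mean(), result)
    return result


if __name__ == "__main__":
    stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    temps = [0.0, 2.0, 2.0, 4.0]
    cells = [(0.25, 0.25), (5.0, 5.0)]  # second cell lies outside the hull
    print(interpolate_to_cells(stations, temps, cells))
```

With the fallback disabled, cells outside the hull stay NaN, which makes gaps in station coverage visible instead of silently filled.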

Index-Based Matching

For pre-aligned climate data (one row per cell):

climate:
  strategy: index            # Direct row-to-cell matching
  fallback_to_mean: true     # Use mean for mismatched counts
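
A rough sketch of index-based matching is shown below. It assumes, based only on the config comments above, that row i maps to cell i when the counts agree and that the mean is used for every cell when they do not. The name match_by_index and the error raised when the fallback is disabled are hypothetical.

```python
from statistics import fmean


def match_by_index(climate_rows: list[float], n_cells: int,
                   fallback_to_mean: bool = True) -> list[float]:
    """Assign one climate value per cell by row position.

    Hypothetical sketch; not TerraFlow's actual implementation.
    """
    if len(climate_rows) == n_cells:
        # Counts agree: row i belongs to cell i.
        return list(climate_rows)
    if fallback_to_mean and climate_rows:
        # Counts differ: give every cell the mean of the available rows.
        return [fmean(climate_rows)] * n_cells
    raise ValueError(
        f"expected {n_cells} climate rows, got {len(climate_rows)}"
    )


if __name__ == "__main__":
    print(match_by_index([21.0, 22.0, 23.0], 3))
    print(match_by_index([21.0, 23.0], 4))
```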

Climate CSV Format: The climate CSV must contain lat and lon columns plus one column per climate variable:

lat,lon,mean_temp,total_rain
34.05,-118.24,22.5,250.0
34.10,-118.19,23.1,260.0
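
A file in this format can be checked before it enters the pipeline. The sketch below uses only the standard library. The function name load_climate_rows and the exact error messages are hypothetical, and the range checks simply apply the usual bounds for geographic coordinates; TerraFlow's own validation may differ.

```python
import csv
import io

REQUIRED_COLUMNS = ("lat", "lon")


def load_climate_rows(text: str) -> list[dict]:
    """Parse climate CSV text and validate the coordinate columns.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for raw in reader:
        # Every column is expected to be numeric in this sketch.
        row = {key: float(value) for key, value in raw.items()}
        if not -90.0 <= row["lat"] <= 90.0:
            raise ValueError(f"latitude out of range: {row['lat']}")
        if not -180.0 <= row["lon"] <= 180.0:
            raise ValueError(f"longitude out of range: {row['lon']}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = (
        "lat,lon,mean_temp,total_rain\n"
        "34.05,-118.24,22.5,250.0\n"
        "34.10,-118.19,23.1,260.0\n"
    )
    for row in load_climate_rows(sample):
        print(row)
```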

See Climate Configuration and ADR-003 for details.

Documentation

Local preview

Install the docs dependencies and serve the site:

uv pip install -r docs/requirements.txt
mkdocs serve

Publishing

Documentation is built and published automatically via GitHub Pages on every push to main.

Development

Create virtual environment + install dev deps

make dev

Run tests

make test

Run the demo workflow

make run-demo

Linting

make lint

This runs ruff and black for code formatting and style checks.

Testing

TerraFlow includes a comprehensive test suite with 127 tests covering all core functionality.

Run all tests

make test

Test Coverage

The test suite covers:

  • CLI argument parsing and error handling
  • Climate data loading and interpolation (spatial and index-based)
  • Configuration validation with Pydantic v2
  • Geospatial operations (ROI clipping, masking, band selection)
  • Data ingestion and preprocessing
  • Model execution
  • Pipeline integration
  • Statistical analysis
  • Visualization generation

Interactive Testing

Use the comprehensive Jupyter notebook for interactive testing and exploration:

jupyter notebook notebooks/terraflow_v0.2.0_comprehensive_test.ipynb

Docker Usage

Build image

make docker-build

Run container

make docker-run

Equivalent to:

docker run --rm \
    -v $(pwd):/app \
    terraflow:latest \
    --config examples/demo_config.yml

Continuous Integration (GitHub Actions)

CI Pipeline (ci.yml)

The main CI pipeline runs on every push and pull request to main/master:

  • Sets up Python 3.10 and uv package manager
  • Creates virtual environment and installs dependencies
  • Runs full test suite with pytest
  • Runs linting checks with ruff and black

Documentation Deployment (docs.yml)

Automatically builds and deploys documentation to GitHub Pages on every push to main:

  • Builds MkDocs site with strict mode
  • Deploys to GitHub Pages

PyPI Publishing (publish-pypi.yml)

Triggered when a version tag (a tag beginning with v) is pushed:

  • Builds Python wheel and source distribution
  • Publishes to PyPI automatically
  • No manual intervention required

JOSS Manuscript (manuscript.yml)

Builds the JOSS paper PDF on version tags or manual trigger:

  • Generates publication-ready manuscript
  • Uploads as GitHub artifact

Publishing a Release to PyPI

Publishing is fully automated via GitHub Actions and publish-pypi.yml.

1. Update version

make release version=0.1.X

This:

  • updates pyproject.toml
  • updates terraflow/__init__.py
  • commits version bump
  • tags release
  • pushes tag → triggers PyPI publish

2. GitHub Action builds & uploads:

  • wheel (.whl)
  • source distribution (.tar.gz)

No manual PyPI login required.

Configuration (Pydantic v2)

TerraFlow uses Pydantic v2 for typed config:

from pydantic import BaseModel

class WorkflowConfig(BaseModel):
    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"

    model_config = {
        "extra": "forbid",
        "validate_default": True
    }

A typical YAML config:

input_raster: "examples/sample_data/soil.tif"
roi_path: "examples/sample_data/roi.geojson"
climate_source: "era5"
output_dir: "outputs"
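
To show what the model settings buy in practice, the sketch below validates a dictionary shaped like the YAML above. The WorkflowConfig class is copied from this section. The is_rejected helper is hypothetical, and loading the YAML file itself is left out so the example depends only on Pydantic v2.

```python
from pydantic import BaseModel, ValidationError


class WorkflowConfig(BaseModel):
    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"

    # extra="forbid" rejects unknown keys; validate_default also checks defaults.
    model_config = {"extra": "forbid", "validate_default": True}


def is_rejected(data: dict) -> bool:
    """Return True if the data fails validation (hypothetical helper)."""
    try:
        WorkflowConfig.model_validate(data)
    except ValidationError:
        return True
    return False


raw = {
    "input_raster": "examples/sample_data/soil.tif",
    "roi_path": "examples/sample_data/roi.geojson",
    "climate_source": "era5",
}
config = WorkflowConfig.model_validate(raw)

if __name__ == "__main__":
    print(config.output_dir)                      # default is filled in
    print(is_rejected({**raw, "outptu_dir": "x"}))  # misspelled key is caught
```

Forbidding extra keys means a misspelled option fails at load time instead of being silently ignored.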

Architecture

TerraFlow follows clean architecture principles with clear separation of concerns:

Core Modules

  • cli.py: Command-line interface with argument parsing and error handling
  • config.py: Pydantic v2 models for configuration validation
  • climate.py: Climate data interpolation with spatial and index-based strategies
  • geo.py: Geospatial operations (raster I/O, ROI clipping, coordinate validation)
  • ingest.py: Data ingestion and preprocessing
  • model.py: Core modeling logic
  • pipeline.py: Workflow orchestration and execution
  • stats.py: Statistical analysis and aggregation
  • viz.py: Visualization generation with Plotly
  • utils.py: Utility functions and helpers

Architecture Decision Records

Key design decisions are documented in ADRs:

  • ADR-001: Band selection strategy for multi-band rasters
  • ADR-002: Bounding box vs polygon ROI support
  • ADR-003: Climate interpolation strategies (spatial vs index-based)

See docs/architecture/ for detailed ADRs.

Roadmap

See docs/ROADMAP.md for detailed feature planning.

Planned enhancements:

  • Multiple crop models support
  • Calibration and uncertainty quantification modules
  • Enhanced geospatial visualization
  • Improved CLI templates and pipeline configurability
  • Performance optimization for large-scale rasters
  • Additional interpolation methods

Contributing

Contributions are welcome! See docs/contributing.md for guidelines.

Citation

If you use TerraFlow in your research, please cite our JOSS paper (manuscript in preparation).

License

MIT License — free for academic, commercial, and open-source use.
