TerraFlow: a reproducible workflow for geospatial agricultural modeling.
TerraFlow: Reproducible Geospatial Agricultural Modeling
TerraFlow v0.2.0 is a reproducible, open-source geospatial workflow framework for agricultural modeling. It provides:
- Geospatial preprocessing (rasters, vectors, ROI clipping)
- Spatially-aware climate data (per-cell spatial interpolation with fallback strategies) - NEW in v0.2.0
- Config-driven model execution with Pydantic v2 validation
- Python package with CLI interface (terraflow --config <file>)
- Docker workflow support
- JOSS-compatible research workflow and manuscript
- Comprehensive test suite (127 tests) with 100% pass rate
- Interactive Jupyter notebook for testing and visualization
- Architecture Decision Records (ADRs) for design documentation
Use TerraFlow to build, test, and publish reproducible agricultural analytics pipelines.
Features
Core Capabilities:
- Modern Python package (pyproject.toml, PEP 621 compliant)
- Fully uv-installable (uv pip install terraflow-agro)
- Reproducible CLI interface (terraflow --config <file>)
- Pydantic v2 configuration models with geographic coordinate validation - enhanced in v0.2.0
- Spatial interpolation using scipy.interpolate.griddata - new in v0.2.0
- Extensible workflow architecture with clean separation of concerns
Development & Testing:
- Comprehensive test suite with pytest (127 tests across 10 test files)
- Linting with ruff and black
- Makefile automation for dev/test/build/release workflows
- Interactive Jupyter notebook for comprehensive testing
- Example data and demo configurations
CI/CD & Documentation:
- GitHub Actions for CI testing and linting
- Automated PyPI publishing on version tags
- MkDocs-based documentation with GitHub Pages deployment
- JOSS manuscript build automation
- Docker support for containerized workflows
Architecture & Design:
- Architecture Decision Records (ADRs) documenting key design choices
- Clean module separation (cli, config, climate, geo, ingest, model, pipeline, stats, viz)
- Comprehensive error handling and resource management
- Production-ready code quality
Installation
Option 1: Install from PyPI (Recommended)
uv pip install terraflow-agro
Verify installation:
import terraflow
print(terraflow.__version__)
Option 2: Install from source
Clone the repo:
git clone https://github.com/gmarupilla/AgroTerraFlow.git
cd AgroTerraFlow
Create .venv and install dependencies
make dev
This runs:
uv venv .venv
uv pip install --python .venv/bin/python -e ".[dev]"
(Using only pyproject.toml; no requirements.txt)
Quickstart
Run the demo pipeline
make run-demo
which is equivalent to:
terraflow --config examples/demo_config.yml
CLI Usage
After pip install terraflow-agro, TerraFlow exposes a terraflow command:
terraflow --config config.yml
Relative paths inside the config file resolve relative to the config file's own directory, so configs are portable regardless of your working directory.
Example:
terraflow --config examples/demo_config.yml
Your results will appear in:
outputs/
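The path rule described above (relative paths are anchored at the config file's directory, not the working directory) can be sketched as follows. This is a minimal illustration, not TerraFlow's actual code: the helper name resolve_config_path and the sample path are hypothetical.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path found in a config file relative to that file's directory.

    Hypothetical helper for illustration; not part of TerraFlow's API.
    """
    path = Path(value)
    if path.is_absolute():
        return path  # absolute paths are left untouched
    # Anchor relative paths at the config file's parent directory,
    # not at the current working directory.
    return Path(config_file).parent / path


resolved = resolve_config_path("examples/demo_config.yml", "sample_data/soil.tif")
print(resolved.as_posix())  # examples/sample_data/soil.tif
```

Because the result depends only on where the config file lives, the same config behaves identically whether the command is run from the repository root or from another directory.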
Run Fingerprint
Each pipeline execution is identified by a deterministic run_fingerprint derived from:
- Canonicalized YAML configuration
- ROI geometry hash
- Input file fingerprints (sha256, size, mtime)
Identical inputs produce the same fingerprint across machines, provided the input files' recorded metadata (including mtime) is preserved when they are copied. This enables immutable run directories like:
runs/<fingerprint>/...
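As a rough sketch of how such a fingerprint could be derived, the example below hashes a canonical JSON encoding of the three ingredients listed above. The function name, the dictionary field names, and the choice of JSON canonicalization are assumptions made for illustration; TerraFlow's actual scheme may differ.

```python
import hashlib
import json


def run_fingerprint(config: dict, roi_geometry: dict, input_files: list[dict]) -> str:
    """Combine config, ROI geometry, and input file metadata into one SHA-256 hex digest.

    Illustrative only: field names and canonicalization are assumptions.
    """
    payload = {
        "config": config,
        "roi": roi_geometry,
        # Sort file entries so the listing order does not affect the hash.
        "inputs": sorted(input_files, key=lambda f: f["sha256"]),
    }
    # sort_keys plus fixed separators give a canonical byte representation.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the fingerprints should match.
config_a = {"output_dir": "outputs", "climate": {"strategy": "spatial"}}
config_b = {"climate": {"strategy": "spatial"}, "output_dir": "outputs"}
roi = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
files = [{"sha256": "ab12", "size": 1024, "mtime": 1700000000}]

fp_a = run_fingerprint(config_a, roi, files)
fp_b = run_fingerprint(config_b, roi, files)
print(fp_a == fp_b)  # True
```

Canonicalizing before hashing is what makes the result independent of key order or whitespace in the YAML file.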
Climate Data Integration (v0.2.0)
TerraFlow now supports per-cell climate data with two interpolation strategies:
Spatial Interpolation (Recommended)
For climate data with geographic coordinates (weather stations, satellite grids):
climate:
  strategy: spatial        # Interpolate using scipy.interpolate.griddata
  fallback_to_mean: true   # Use global mean for extrapolated cells
Benefits:
- Works with arbitrary observation locations
- Smooth spatial gradients across your ROI
- Graceful handling of sparse data
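The spatial strategy with a mean fallback can be illustrated with a small sketch built directly on scipy.interpolate.griddata. The station coordinates, values, and cell centres below are made-up sample data, and the fallback logic is an assumption about what fallback_to_mean does, not TerraFlow's actual implementation.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical station observations: (lon, lat) points with one temperature each.
stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mean_temp = np.array([20.0, 22.0, 24.0, 26.0])

# Hypothetical raster cell centres; the last lies outside the stations' convex hull.
cells = np.array([[0.5, 0.5], [0.25, 0.75], [2.0, 2.0]])

# Linear interpolation returns NaN for points outside the convex hull.
interpolated = griddata(stations, mean_temp, cells, method="linear")

# Assumed fallback: replace NaN cells with the mean of all observations.
filled = np.where(np.isnan(interpolated), mean_temp.mean(), interpolated)
print(filled)
```

Cells inside the station hull receive interpolated values, while the cell outside it falls back to the global mean instead of propagating NaN into the model.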
Index-Based Matching
For pre-aligned climate data (one row per cell):
climate:
  strategy: index          # Direct row-to-cell matching
  fallback_to_mean: true   # Use mean for mismatched counts
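A minimal sketch of index-based matching is shown below. It assumes that when the number of rows differs from the number of cells, every cell receives the mean of the available values; that reading of fallback_to_mean and the function name match_by_index are assumptions for illustration only.

```python
from statistics import mean


def match_by_index(values: list[float], n_cells: int, fallback_to_mean: bool = True) -> list[float]:
    """Assign one climate value per cell by row order.

    Assumed behaviour: if the row count differs from the cell count,
    every cell receives the mean of the available values.
    """
    if len(values) == n_cells:
        return list(values)  # row i maps directly to cell i
    if fallback_to_mean and values:
        return [mean(values)] * n_cells
    raise ValueError(f"expected {n_cells} rows, got {len(values)}")


print(match_by_index([22.5, 23.1], 2))  # [22.5, 23.1]
print(match_by_index([22.0, 24.0], 3))  # [23.0, 23.0, 23.0]
```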
Climate CSV Format:
Your climate CSV must have lat, lon, and climate variables:
lat,lon,mean_temp,total_rain
34.05,-118.24,22.5,250.0
34.10,-118.19,23.1,260.0
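Before running a pipeline it can be useful to check a climate CSV against this format. The sketch below uses only the standard library; the function name read_climate_rows and the specific checks (required lat/lon columns, valid coordinate ranges) are illustrative assumptions rather than TerraFlow's own validation.

```python
import csv
import io

REQUIRED_COLUMNS = {"lat", "lon"}


def read_climate_rows(text: str) -> list[dict]:
    """Parse climate CSV text and check required columns and coordinate ranges."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        row = {key: float(value) for key, value in raw.items()}
        if not -90.0 <= row["lat"] <= 90.0:
            raise ValueError(f"line {line_no}: latitude out of range: {row['lat']}")
        if not -180.0 <= row["lon"] <= 180.0:
            raise ValueError(f"line {line_no}: longitude out of range: {row['lon']}")
        rows.append(row)
    return rows


sample = """lat,lon,mean_temp,total_rain
34.05,-118.24,22.5,250.0
34.10,-118.19,23.1,260.0
"""
rows = read_climate_rows(sample)
print(len(rows), rows[0]["mean_temp"])  # 2 22.5
```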
See Climate Configuration and ADR-003 for details.
Documentation
Local preview
Install the docs dependencies and serve the site:
uv pip install -r docs/requirements.txt
mkdocs serve
Publishing
Documentation is built and published automatically via GitHub Pages on every push to main.
Development
Create virtual environment + install dev deps
make dev
Run tests
make test
Run the demo workflow
make run-demo
Linting
make lint
This runs ruff and black for code formatting and style checks.
Testing
TerraFlow includes a comprehensive test suite with 127 tests covering all core functionality.
Run all tests
make test
Test Coverage
The test suite covers:
- CLI argument parsing and error handling
- Climate data loading and interpolation (spatial and index-based)
- Configuration validation with Pydantic v2
- Geospatial operations (ROI clipping, masking, band selection)
- Data ingestion and preprocessing
- Model execution
- Pipeline integration
- Statistical analysis
- Visualization generation
Interactive Testing
Use the comprehensive Jupyter notebook for interactive testing and exploration:
jupyter notebook notebooks/terraflow_v0.2.0_comprehensive_test.ipynb
Docker Usage
Build image
make docker-build
Run container
make docker-run
Equivalent to:
docker run --rm \
  -v $(pwd):/app \
  terraflow:latest \
  --config examples/demo_config.yml
Continuous Integration (GitHub Actions)
CI Pipeline (ci.yml)
The main CI pipeline runs on every push and pull request to main/master:
- Sets up Python 3.10 and uv package manager
- Creates virtual environment and installs dependencies
- Runs full test suite with pytest
- Runs linting checks with ruff and black
Documentation Deployment (docs.yml)
Automatically builds and deploys documentation to GitHub Pages on every push to main:
- Builds MkDocs site with strict mode
- Deploys to GitHub Pages
PyPI Publishing (publish-pypi.yml)
Triggered on version tags (v*.*.*):
- Builds Python wheel and source distribution
- Publishes to PyPI automatically
- No manual intervention required
JOSS Manuscript (manuscript.yml)
Builds the JOSS paper PDF on version tags or manual trigger:
- Generates publication-ready manuscript
- Uploads as GitHub artifact
Publishing a Release to PyPI
Publishing is fully automated via GitHub Actions and publish-pypi.yml.
1. Update version
make release version=0.1.X
This:
- updates pyproject.toml
- updates terraflow/__init__.py
- commits version bump
- tags release
- pushes tag → triggers PyPI publish
2. GitHub Action builds & uploads:
- wheel (.whl)
- source distribution (.tar.gz)
No manual PyPI login required.
Configuration (Pydantic v2)
TerraFlow uses Pydantic v2 for typed config:
from pydantic import BaseModel

class WorkflowConfig(BaseModel):
    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"

    # Reject unknown keys and validate default values too
    model_config = {
        "extra": "forbid",
        "validate_default": True,
    }
A typical YAML config:
input_raster: "examples/sample_data/soil.tif"
roi_path: "examples/sample_data/roi.geojson"
climate_source: "era5"
output_dir: "outputs"
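To show what this validation buys in practice, the sketch below validates a dictionary standing in for the parsed YAML and then triggers the extra="forbid" check with a misspelled key. The model mirrors the fields listed above but is re-declared here so the example is self-contained; the misspelled key output_dirr is made up for the demonstration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class WorkflowConfig(BaseModel):
    # Reject unknown keys and validate default values as well.
    model_config = ConfigDict(extra="forbid", validate_default=True)

    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"


# Stand-in for the dict a YAML loader would produce from the config above.
raw = {
    "input_raster": "examples/sample_data/soil.tif",
    "roi_path": "examples/sample_data/roi.geojson",
    "climate_source": "era5",
}
config = WorkflowConfig.model_validate(raw)
print(config.output_dir)  # outputs

# A misspelled key is rejected instead of being silently ignored.
try:
    WorkflowConfig.model_validate({**raw, "output_dirr": "typo"})
except ValidationError as exc:
    print(exc.error_count(), "validation error")  # 1 validation error
```

Forbidding extra keys turns typos in a config file into immediate errors rather than runs that quietly fall back to defaults.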
Architecture
TerraFlow follows clean architecture principles with clear separation of concerns:
Core Modules
- cli.py: Command-line interface with argument parsing and error handling
- config.py: Pydantic v2 models for configuration validation
- climate.py: Climate data interpolation with spatial and index-based strategies
- geo.py: Geospatial operations (raster I/O, ROI clipping, coordinate validation)
- ingest.py: Data ingestion and preprocessing
- model.py: Core modeling logic
- pipeline.py: Workflow orchestration and execution
- stats.py: Statistical analysis and aggregation
- viz.py: Visualization generation with Plotly
- utils.py: Utility functions and helpers
Architecture Decision Records
Key design decisions are documented in ADRs:
- ADR-001: Band selection strategy for multi-band rasters
- ADR-002: Bounding box vs polygon ROI support
- ADR-003: Climate interpolation strategies (spatial vs index-based)
See docs/architecture/ for detailed ADRs.
Roadmap
See docs/ROADMAP.md for detailed feature planning.
Planned enhancements:
- Multiple crop models support
- Calibration and uncertainty quantification modules
- Enhanced geospatial visualization
- Improved CLI templates and pipeline configurability
- Performance optimization for large-scale rasters
- Additional interpolation methods
Contributing
Contributions are welcome! See docs/contributing.md for guidelines.
Citation
If you use TerraFlow in your research, please cite our JOSS paper (manuscript in preparation).
License
MIT License — free for academic, commercial, and open-source use.