Models to visualize and forecast crop conditions and yields

geocif

Generate Climatic Impact-Drivers (CIDs) from Earth Observation (EO) data, build ML yield forecasting models, and produce agmet condition monitoring plots.

Climatic Impact-Drivers for Crop Yield Assessment at NASA Harvest

Setup

Requirements

  • Python 3.11+
  • uv

Install

cd geocif                   # project root (where pyproject.toml lives)
uv sync                     # creates .venv and installs all dependencies

On Windows, uv automatically pulls pre-built geospatial wheels (GDAL, rasterio, fiona, shapely, pyproj, rtree) from the URLs in [tool.uv.sources]. On Linux/macOS, platform markers cause those entries to be skipped, and the packages are installed from PyPI instead.

To activate the environment:

# Windows
.venv\Scripts\activate

# Linux/macOS
source .venv/bin/activate

Fresh reinstall

rm -rf .venv && uv sync

Config files

File            Purpose                                                             Used by
geobase.txt     Paths, shapefile column mappings                                    both
countries.txt   Per-country config (boundary files, admin levels, seasons, crops)   both
crops.txt       Crop masks, calendar categories (EWCM, AMIS)                        both
geoextract.txt  Extraction-only settings (method, threshold, parallelism)           geoprepare
geocif.txt      Indices/ML/agmet settings, country overrides, runtime selections    geocif

Usage

Order matters: Config files are loaded left-to-right. When the same key appears in multiple files, the last file wins. The tool-specific file (geoextract.txt or geocif.txt) must be last so its [DEFAULT] values (countries, method, etc.) override the shared defaults in countries.txt.

config_dir = "/path/to/config"  # full path to your config directory

cfg_geoprepare = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geoextract.txt"]
cfg_geocif = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geocif.txt"]
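
The last-file-wins rule above matches how Python's standard configparser behaves when several sources are read into one parser. Whether geoprepare and geocif use configparser internally is an assumption here. The sketch below only illustrates the rule, using two trimmed inline stand-ins for countries.txt and geocif.txt; load_layers is a hypothetical helper, not part of either package.

```python
from configparser import ConfigParser

# Inline stand-ins for two config files (hypothetical, trimmed contents).
COUNTRIES_TXT = """
[DEFAULT]
admin_level = admin_1
crops = ['maize']

[argentina]
crops = ['soybean', 'winter_wheat', 'maize']
"""

GEOCIF_TXT = """
[DEFAULT]
countries = ["kenya"]
admin_level = admin_2
"""


def load_layers(*sources: str) -> ConfigParser:
    """Read sources left to right into one parser; later values replace earlier ones."""
    parser = ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser


cfg = load_layers(COUNTRIES_TXT, GEOCIF_TXT)
print(cfg["DEFAULT"]["admin_level"])  # set in both sources: the last one wins
print(cfg["DEFAULT"]["crops"])        # set only in the first source: kept as-is
print(cfg["argentina"]["crops"])      # a section's own value beats any [DEFAULT]
```

Under these semantics a country section's own keys still take precedence over a later [DEFAULT], so only keys the section does not set fall through to the tool-specific defaults.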

geoprepare (download, extract, merge)

from geoprepare import geodownload
geodownload.run([f"{config_dir}/geobase.txt"])

from geoprepare import geoextract
geoextract.run(cfg_geoprepare)

from geoprepare import geomerge
geomerge.run(cfg_geoprepare)

geocif (indices, ML, agmet, analysis, experiments)

from geocif import indices_runner
indices_runner.run(cfg_geocif)

from geocif import geocif_runner
geocif_runner.run(cfg_geocif)

from geocif.agmet import geoagmet
geoagmet.run(cfg_geocif)

from geocif import analysis
analysis.run(cfg_geocif)

from geocif import experiments
experiments.run(cfg_geocif, n_trials=30)

from geocif import yield_outlook
yield_outlook.run(cfg_geocif)  # uses config defaults (10 years, mean)
# yield_outlook.run(cfg_geocif, current_year=2026, n_years=10, aggregation="median")

ML models

geocif supports the following model types (configured via models in [DEFAULT]):

Model       Key        Type
CatBoost    catboost   Gradient boosting
XGBoost     xgboost    Gradient boosting
TabPFN      tabpfn     Prior-fitted network
TabICL      tabicl     In-context learning
NGBoost     ngboost    Natural gradient boosting
YDF         ydf        Yggdrasil decision forests
Oblique RF  oblique    Oblique random forest
Cubist      cubist     Rule-based regression
MERF        merf       Mixed effects random forest
Linear      linear     LassoCV / LogisticRegressionCV
GAM         gam        Generalized additive model
GeoSpaNN    geospaNN   Geospatial neural network
Median      median     Median baseline
Analog      analog     Analogous year baseline

Feature selection methods

Configured via feature_selection in [ML]:

none, SelectKBest, BorutaPy, Leshy, gOMP, RFECV, RFE, lasso, mrmr, SHAP, stabl, PowerShap, BorutaShap, Genetic, feature_engine, multi

Spatial neighbor features

Optional GraphSAGE-style preprocessing that computes yield-correlation-weighted averages of neighboring regions' features. Enabled via [ML]:

use_spatial_neighbors = True
spatial_neighbor_method = knn   ; knn or full
spatial_neighbor_k = 5          ; number of nearest neighbors

For each admin region, the neighbor graph is built from training data using haversine distances and Pearson yield correlations as edge weights. Neighbor-aggregated features are added as nbr_* columns and flow through standard feature selection.
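
As a rough illustration of the idea, the sketch below picks the k nearest regions by haversine distance and averages one feature over them, weighted by Pearson yield correlation. It is not geocif's implementation: the function names, the dictionary-based inputs, and the choice to clip negative correlations to zero are all assumptions made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def neighbor_feature(region, coords, yields, feature, k=5):
    """Correlation-weighted mean of `feature` over the k nearest regions.

    coords:  {region: (lat, lon)}
    yields:  {region: [training-period yields]}
    feature: {region: value of one feature}
    Returns None when no neighbor has a positive weight.
    """
    others = [r for r in coords if r != region]
    nearest = sorted(others, key=lambda r: haversine_km(*coords[region], *coords[r]))[:k]
    # Assumption: negative correlations are clipped to zero so weights stay non-negative.
    weights = {r: max(pearson(yields[region], yields[r]), 0.0) for r in nearest}
    total = sum(weights.values())
    if total == 0:
        return None
    return sum(weights[r] * feature[r] for r in nearest) / total


coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0), "D": (10.0, 10.0)}
yields = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1], "D": [1, 1, 2]}
ndvi = {"A": 0.5, "B": 10.0, "C": 20.0, "D": 99.0}
nbr_ndvi = neighbor_feature("A", coords, yields, ndvi, k=2)
print(nbr_ndvi)
```

Here region A's two nearest neighbors are B and C. C's yields move opposite to A's, so its weight is clipped to zero and the result is driven by B alone.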

Experiments output

The experiments runner writes to a dedicated DB and analysis folder under dir_output:

{dir_output}/
└── ml/
    ├── db/
    │   └── experiments_{MMMM_DD_YYYY_HH}H.db
    │
    └── analysis/
        └── {MMMM_DD_YYYY}/
            ├── experiments/                            # Experiment 0 (model comparison)
            │   ├── experiment_metrics.csv
            │   ├── heatmap_models.png
            │   ├── boxplot_models.png
            │   ├── regional_mape_models_{country}.png
            │   ├── error_distribution_models.png
            │   └── metric_comparison.png
            │
            └── optimization/                           # Optuna hyperparameter search
                ├── optuna_trials.csv
                ├── best_params.csv
                ├── convergence.png
                ├── optimization_history.png
                ├── param_importances.png
                └── parallel_coordinate.png
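
To locate these outputs from a script, the layout above can be rebuilt with pathlib. This sketch assumes that MMMM is the full English month name and that DD, YYYY and HH are zero-padded numbers; the tree does not state the exact date format, so check it against a real run. experiment_paths is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def experiment_paths(dir_output: str, when: datetime) -> dict[str, Path]:
    """Build the expected experiments DB and analysis folders for a given run time."""
    # Assumption: MMMM_DD_YYYY maps to strftime("%B_%d_%Y"), e.g. January_05_2026.
    day = when.strftime("%B_%d_%Y")
    hour = when.strftime("%H")
    ml = Path(dir_output) / "ml"
    analysis = ml / "analysis" / day
    return {
        "db": ml / "db" / f"experiments_{day}_{hour}H.db",
        "experiments": analysis / "experiments",
        "optimization": analysis / "optimization",
    }


paths = experiment_paths("/path/to/outputs", datetime(2026, 1, 5, 14))
for name, path in paths.items():
    print(name, path)
```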

Outlook output

The yield outlook runner produces a diverging choropleth map showing current forecast yield as a percentage of the historical mean/median prediction per region, plus a combined CSV.

{dir_output}/
└── ml/
    └── analysis/
        └── {MMMM_DD_YYYY}/
            └── outlook/
                ├── yield_outlook_{country}_{crop}_{model}_{stage}_{year}.png
                └── yield_outlook_{year}.csv
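
The mapped quantity is simple arithmetic: the current forecast divided by a historical baseline, times 100. The sketch below shows that calculation for one region using only the standard library. percent_of_baseline is a hypothetical name, and how geocif assembles the historical predictions is not shown here.

```python
from statistics import mean, median


def percent_of_baseline(current, history, aggregation="mean"):
    """Current forecast yield as a percentage of the historical mean or median."""
    if aggregation not in ("mean", "median"):
        raise ValueError("aggregation must be 'mean' or 'median'")
    if not history:
        raise ValueError("history must contain at least one value")
    baseline = mean(history) if aggregation == "mean" else median(history)
    if baseline == 0:
        raise ValueError("baseline is zero; percentage is undefined")
    return 100.0 * current / baseline


history = [2.0, 4.0, 12.0]                        # historical predictions, t/ha
print(percent_of_baseline(6.0, history))            # 100.0 (mean baseline is 6.0)
print(percent_of_baseline(6.0, history, "median"))  # 150.0 (median baseline is 4.0)
```

Values above 100 indicate a forecast above the historical baseline and values below 100 a forecast under it, which is what the diverging color scale encodes. The mean and median can disagree noticeably when a few years are outliers, as in this example.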

Config file documentation

geobase.txt

Shared paths and dataset settings. All directory paths are derived from dir_base.

[PATHS]
dir_base = /gpfs/data1/cmongp1/GEO

dir_inputs = ${dir_base}/inputs
dir_logs = ${dir_base}/logs
dir_download = ${dir_inputs}/download
dir_intermed = ${dir_inputs}/intermed
dir_metadata = ${dir_inputs}/metadata
dir_condition = ${dir_inputs}/crop_condition
dir_crop_inputs = ${dir_condition}/crop_t20

dir_boundary_files = ${dir_metadata}/boundary_files
dir_crop_calendars = ${dir_metadata}/crop_calendars
dir_crop_masks = ${dir_metadata}/crop_masks
dir_images = ${dir_metadata}/images
dir_production_statistics = ${dir_metadata}/production_statistics

dir_output = ${dir_base}/outputs

[DATASETS]
datasets = ['CHIRPS', 'CPC', 'NDVI', 'ESI', 'NSIDC', 'AEF']
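
The ${name} references use the same syntax as configparser's ExtendedInterpolation, so a file like this can be inspected with the standard library. The sketch below is an assumption about how such a file could be read, not a description of geocif's loader; /data/GEO is a placeholder base directory and the section is trimmed.

```python
import ast
from configparser import ConfigParser, ExtendedInterpolation

# Trimmed stand-in for geobase.txt with a placeholder dir_base.
GEOBASE_TXT = """
[PATHS]
dir_base = /data/GEO
dir_inputs = ${dir_base}/inputs
dir_condition = ${dir_inputs}/crop_condition
dir_crop_inputs = ${dir_condition}/crop_t20
dir_output = ${dir_base}/outputs

[DATASETS]
datasets = ['CHIRPS', 'CPC', 'NDVI']
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(GEOBASE_TXT)

# Values are interpolated on access, so chained references resolve fully.
paths = dict(parser["PATHS"])
# List values are stored as text; literal_eval turns them into Python lists.
datasets = ast.literal_eval(parser["DATASETS"]["datasets"])

print(paths["dir_crop_inputs"])  # /data/GEO/inputs/crop_condition/crop_t20
print(datasets)                  # ['CHIRPS', 'CPC', 'NDVI']
```

Changing dir_base alone relocates every derived directory, which is why only that one line normally needs editing per machine.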

countries.txt

Single source of truth for per-country config. Shared by both geoprepare and geocif.

[DEFAULT]
boundary_file = gaul1_asap_v04.shp
admin_level = admin_1
seasons = [1]
crops = ['maize']
category = AMIS
use_cropland_mask = False
calendar_file = crop_calendar.csv

; AMIS countries (inherit from DEFAULT, override crops if needed)
[argentina]
crops = ['soybean', 'winter_wheat', 'maize']

; EWCM countries (full per-country config)
[kenya]
category = EWCM
admin_level = admin_1
seasons = [1, 2]
use_cropland_mask = True
boundary_file = adm_shapefile.gpkg
calendar_file = EWCM_2025-04-21.xlsx
crops = ['maize']

[malawi]
category = EWCM
admin_level = admin_2
use_cropland_mask = True
boundary_file = adm_shapefile.gpkg
calendar_file = EWCM_2025-04-21.xlsx
crops = ['maize']
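
The inheritance pattern in this file can be checked with a few lines of Python. The sketch assumes configparser-style [DEFAULT] fallback, which is an assumption about the loader. It uses a trimmed copy of the listing above, and country_settings is a hypothetical helper.

```python
import ast
from configparser import ConfigParser

# Trimmed stand-in for countries.txt.
COUNTRIES_TXT = """
[DEFAULT]
boundary_file = gaul1_asap_v04.shp
admin_level = admin_1
seasons = [1]
crops = ['maize']
category = AMIS

; AMIS country: inherits everything except crops
[argentina]
crops = ['soybean', 'winter_wheat', 'maize']

; EWCM country: overrides several defaults
[kenya]
category = EWCM
seasons = [1, 2]
boundary_file = adm_shapefile.gpkg
"""


def country_settings(parser: ConfigParser, country: str) -> dict:
    """Resolve one country's settings, falling back to [DEFAULT] for missing keys."""
    section = parser[country]
    return {
        "category": section["category"],
        "admin_level": section["admin_level"],
        "boundary_file": section["boundary_file"],
        "seasons": ast.literal_eval(section["seasons"]),
        "crops": ast.literal_eval(section["crops"]),
    }


parser = ConfigParser()
parser.read_string(COUNTRIES_TXT)
argentina = country_settings(parser, "argentina")
kenya = country_settings(parser, "kenya")
print(argentina)
print(kenya)
```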

crops.txt

Crop mask filenames and calendar category definitions.

; Crop masks
[maize]
mask = Percent_Maize.tif

[winter_wheat]
mask = Percent_Winter_Wheat.tif

[sorghum]
mask = cropland_v9.tif

; Calendar categories
[EWCM]
use_cropland_mask = True
calendar_file = EWCM_2026-01-05.xlsx
crops = ['maize', 'sorghum', 'millet', 'rice', 'winter_wheat', 'teff']
eo_model = ['aef', 'nsidc_surface', 'nsidc_rootzone', 'ndvi', 'cpc_tmax', 'cpc_tmin', 'chirps', 'chirps_gefs', 'esi_4wk']

[AMIS]
calendar_file = AMISCM_2026-01-05.xlsx

geoextract.txt

Extraction-only settings for geoprepare. Loaded last so its [DEFAULT] overrides shared defaults.

[DEFAULT]
method = JRC
redo = False
threshold = True
floor = 20
ceil = 90
countries = ["malawi"]
forecast_seasons = [2022]

[PROJECT]
parallel_extract = True
parallel_merge = False

geocif.txt

Indices, ML, and agmet settings for geocif. Country overrides go here when geocif needs different values than countries.txt (e.g., a subset of crops).

[AGMET]
eo_plot = ['ndvi', 'chirts_era5_tmax', 'chirts_era5_tmin', 'chirps', 'esi_4wk', 'nsidc_surface', 'nsidc_rootzone']
logo_harvest = harvest.png
logo_geoglam = geoglam.png

; Country overrides (only where geocif differs from countries.txt)
[ethiopia]
crops = ['winter_wheat']

[bangladesh]
crops = ['rice']
admin_level = admin_2
boundary_file = bangladesh.shp

; ML model definitions
[catboost]
ML_model = True

[analog]
ML_model = False

[ML]
model_type = REGRESSION
target = Yield (tn per ha)
feature_selection = gOMP
cluster_strategy = single
check_yield_trend = False
use_spatial_neighbors = True
spatial_neighbor_method = knn
spatial_neighbor_k = 5
lag_yield_as_feature = True
lag_years = 3
median_yield_as_feature = False
median_years = 5
include_lat_lon_as_feature = False
panel_model = True
cat_features = ["Harvest Year", "Region_ID", "Region"]
outlook_n_years = 10        ; Number of historical years for yield outlook comparison
outlook_aggregation = mean  ; mean or median

[LOGGING]
log_level = INFO

[DEFAULT]
data_source = harvest
method = monthly_r
project_name = geocif
countries = ["kenya"]
crops = ['maize']
admin_level = admin_1
models = ['catboost']
seasons = [1]
threshold = True
floor = 20
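
Two details of this listing matter when parsing it: values arrive as text, and some lines carry a trailing ; comment. With Python's configparser, inline comments are only stripped when inline_comment_prefixes is set. The sketch below shows typed access to a few [ML] keys under that assumption; it does not claim to mirror geocif's own loader.

```python
from configparser import ConfigParser

# Trimmed stand-in for the [ML] section of geocif.txt.
GEOCIF_TXT = """
[ML]
feature_selection = gOMP
use_spatial_neighbors = True
spatial_neighbor_k = 5
outlook_n_years = 10        ; Number of historical years for yield outlook comparison
outlook_aggregation = mean  ; mean or median
"""

# Without inline_comment_prefixes the trailing comments would stay in the values.
parser = ConfigParser(inline_comment_prefixes=(";",))
parser.read_string(GEOCIF_TXT)

ml = parser["ML"]
settings = {
    "feature_selection": ml["feature_selection"],
    "use_spatial_neighbors": ml.getboolean("use_spatial_neighbors"),
    "spatial_neighbor_k": ml.getint("spatial_neighbor_k"),
    "outlook_n_years": ml.getint("outlook_n_years"),
    "outlook_aggregation": ml["outlook_aggregation"],
}
print(settings)
```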

FLDAS forecast overlay

When FLDAS columns are present in the merged data (e.g. fldas_tair_tavg_lead0 through _lead5), agmet plots automatically overlay forecast dots on matching panels:

FLDAS variable          Target panel
fldas_tair_tavg         Temperature
fldas_totalprecip_tavg  Daily precipitation
fldas_soilmoist_tavg    Soil moisture (surface)

Each lead time (0–5) appears as a diamond marker with decreasing opacity (lead 0 = most opaque). Markers beyond the harvest date are suppressed. No config changes are needed; detection is automatic.
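
A minimal sketch of how such detection could work is given below: match column names against the fldas_*_leadN pattern, group the leads per variable, and derive an opacity per lead. The regular expression, the function names, and the linear fade down to 0.3 are assumptions for illustration; the actual opacity values used by the agmet plots are not documented here.

```python
import re

# Variable-to-panel mapping as documented for the overlay.
PANELS = {
    "fldas_tair_tavg": "Temperature",
    "fldas_totalprecip_tavg": "Daily precipitation",
    "fldas_soilmoist_tavg": "Soil moisture (surface)",
}

# Assumed naming scheme: <variable>_lead<0-5>
LEAD_PATTERN = re.compile(r"^(fldas_[a-z]+_tavg)_lead([0-5])$")


def detect_fldas_leads(columns):
    """Return {variable: sorted lead times} for FLDAS columns with a known panel."""
    found = {}
    for col in columns:
        match = LEAD_PATTERN.match(col)
        if match and match.group(1) in PANELS:
            found.setdefault(match.group(1), []).append(int(match.group(2)))
    return {var: sorted(leads) for var, leads in found.items()}


def lead_alpha(lead, max_lead=5, min_alpha=0.3):
    """Opacity for a lead time: 1.0 at lead 0, fading linearly to min_alpha (hypothetical)."""
    return 1.0 - (1.0 - min_alpha) * lead / max_lead


columns = ["ndvi", "chirps", "fldas_tair_tavg_lead0", "fldas_tair_tavg_lead3"]
for variable, leads in detect_fldas_leads(columns).items():
    print(PANELS[variable], [(lead, round(lead_alpha(lead), 2)) for lead in leads])
```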

Credits

This project was supported by NASA Applied Sciences Grant No. 80NSSC17K0625 through the NASA Harvest Consortium, and the NASA Acres Consortium under NASA Grant #80NSSC23M0034.
