A Python package to prepare (download, extract, and process) input data for GEOCIF and related models

geoprepare

Installation

Install from PyPI

pip install --upgrade geoprepare

Install from GitHub (development)

pip install --upgrade --no-deps --force-reinstall git+https://github.com/ritviksahajpal/geoprepare.git

Local editable install

pip install -e ".[dev]"

CDS API (for AgERA5)

If you intend to download AgERA5 data, install the CDS API by following the instructions here.

MODIS data (octvi)

Install the octvi package to download MODIS data:

pip install git+https://github.com/ritviksahajpal/octvi.git

Downloading from the NASA DAACs requires a personal app key. After installation, run octviconfig in your command prompt. Information on obtaining app keys can be found here.

Pipeline

geoprepare follows a three-stage pipeline:

  1. Download (geodownload) - Download and preprocess global EO datasets to dir_download and dir_intermed
  2. Extract (geoextract) - Extract EO variable statistics per admin region to dir_output
  3. Merge (geomerge) - Merge extracted EO files into per-country/crop CSV files for ML models and AgMet graphics

All datasets store files in year-specific subfolders (e.g., dir_intermed/cpc_tmax/2024/, dir_download/nsidc/2025/).

Additional utilities:

  • Move (geomove) - One-time migration of existing flat directories to year-specific subfolders
  • Check (geocheck) - Validate that expected TIF files exist in dir_intermed after download
  • Diagnostics (diagnostics) - Count and summarize files in the data directories

Usage

config_dir = "/path/to/config"  # full path to your config directory

cfg_geoprepare = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geoextract.txt"]

1. Download data (geodownload)

Downloads and preprocesses global EO datasets. Only requires geobase.txt. The [DATASETS] section controls which datasets are downloaded. Each dataset is processed to global 0.05° TIF files in dir_intermed.

from geoprepare import geodownload
geodownload.run([f"{config_dir}/geobase.txt"])

2. Migrate to year subfolders (geomove)

Moves existing files from flat directories into year-specific subfolders. Run this once after upgrading to a version with year-subfolder support. All datasets are handled: CPC, ESI, NDVI, NSIDC, CHIRPS-GEFS, LST, Soil Moisture, AgERA5, VHI, FPAR, and AEF.

from geoprepare import geomove

# Preview what would be moved (no files are changed)
geomove.run([f"{config_dir}/geobase.txt"], dry_run=True)

# Execute the migration
geomove.run([f"{config_dir}/geobase.txt"])
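
The preview-then-execute pattern above can be sketched in plain Python. This is a minimal illustration, not geomove's implementation: the helper names (plan_moves, migrate), the example file names, and the rule of reading the year from the first 19xx/20xx digit run in the file name are all assumptions.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed rule: the year is the first 19xx/20xx run that starts a digit sequence.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}")

def plan_moves(flat_dir):
    """Return (source, destination) pairs for files sitting directly in flat_dir."""
    moves = []
    for path in sorted(Path(flat_dir).iterdir()):
        if not path.is_file():
            continue  # existing subfolders are left alone
        match = YEAR_RE.search(path.name)
        if match:
            moves.append((path, path.parent / match.group(0) / path.name))
    return moves

def migrate(flat_dir, dry_run=False):
    """Move files into year subfolders, or only print the plan when dry_run is True."""
    moves = plan_moves(flat_dir)
    for src, dst in moves:
        if dry_run:
            print(f"would move {src.name} -> {dst.parent.name}/{dst.name}")
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves

with tempfile.TemporaryDirectory() as tmp:
    for name in ("tmax_2024001.tif", "tmax_2025001.tif"):  # hypothetical names
        (Path(tmp) / name).touch()
    migrate(tmp, dry_run=True)  # prints the plan, changes nothing
    migrate(tmp)                # performs the moves
    print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.tif")))
```

Computing the plan once and sharing it between the dry run and the real run keeps the preview faithful to what is actually moved.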

3. Validate downloads (geocheck)

Checks that all expected TIF files exist in dir_intermed and are non-empty. Writes a timestamped report to dir_logs/check/.

from geoprepare import geocheck
geocheck.run([f"{config_dir}/geobase.txt"])
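
The kind of check described above (file exists and is non-empty) might look like the following sketch. The function name check_files and the example file names are hypothetical. How geocheck derives its list of expected files is not shown here, so the caller supplies the list.

```python
import tempfile
from pathlib import Path

def check_files(expected_paths):
    """Sort expected paths into ok / missing / empty buckets."""
    report = {"ok": [], "missing": [], "empty": []}
    for path in map(Path, expected_paths):
        if not path.is_file():
            report["missing"].append(path)
        elif path.stat().st_size == 0:
            report["empty"].append(path)
        else:
            report["ok"].append(path)
    return report

with tempfile.TemporaryDirectory() as tmp:
    year_dir = Path(tmp) / "cpc_tmax" / "2024"
    year_dir.mkdir(parents=True)
    (year_dir / "day_001.tif").write_bytes(b"\x00\x01")  # non-empty placeholder
    (year_dir / "day_002.tif").touch()                    # zero-byte file
    expected = [year_dir / f"day_{d:03d}.tif" for d in (1, 2, 3)]
    report = check_files(expected)
    for status, paths in report.items():
        print(status, [p.name for p in paths])
```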

4. Extract crop masks and EO data (geoextract)

Extracts EO variable statistics (mean, median, etc.) for each admin region, crop, and growing season.

from geoprepare import geoextract
geoextract.run(cfg_geoprepare)
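
To make "statistics per admin region" concrete, here is a pure-Python sketch that summarizes the pixels of one region. It is not geoextract's code: the function name region_stats, the treatment of the crop mask as a per-pixel percentage, and the rule "keep pixels whose crop percentage exceeds the threshold" are assumptions made for illustration.

```python
from statistics import mean, median

def region_stats(values, crop_pct, threshold=0.0):
    """Summarize EO values over pixels whose crop percentage exceeds threshold.

    values   : per-pixel EO values for one region (None marks no-data)
    crop_pct : per-pixel crop-mask percentages, same length as values
    """
    kept = [v for v, pct in zip(values, crop_pct)
            if v is not None and pct > threshold]
    if not kept:
        return None  # no valid cropped pixels in this region
    return {"mean": mean(kept), "median": median(kept), "count": len(kept)}

# Hypothetical NDVI values and crop percentages for four pixels.
ndvi = [0.2, 0.4, 0.6, None]
mask = [0, 50, 80, 90]
print(region_stats(ndvi, mask, threshold=10))
```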

5. Merge extracted data (geomerge)

Merges per-region/year EO CSV files into a single CSV per country-crop-season combination.

from geoprepare import geomerge
geomerge.run(cfg_geoprepare)
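
Conceptually, the merge step concatenates many small CSV files that share a header. The sketch below shows that idea with the standard library only. The function name merge_csvs and the column names (region, year, ndvi) are hypothetical, and geomerge's actual file naming and grouping logic are not reproduced.

```python
import csv
import tempfile
from pathlib import Path

def merge_csvs(csv_paths, out_path):
    """Concatenate CSV files that share a header; return the number of data rows."""
    rows, fieldnames = [], None
    for path in csv_paths:
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"header mismatch in {path}")
            rows.extend(reader)
    if fieldnames is None:
        raise ValueError("no input files given")
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for year, value in ((2024, 0.41), (2025, 0.47)):
        part = Path(tmp) / f"region_a_{year}.csv"
        part.write_text(f"region,year,ndvi\nregion_a,{year},{value}\n")
        parts.append(part)
    merged = Path(tmp) / "merged.csv"
    print(merge_csvs(parts, merged), "rows written")
    print(merged.read_text())
```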

Config files

File | Purpose | Used by
--- | --- | ---
geobase.txt | Paths, dataset settings, boundary file column mappings, logging | both
countries.txt | Per-country config (boundary files, admin levels, seasons, crops) | both
crops.txt | Crop masks, calendar category settings (EWCM, AMIS) | both
geoextract.txt | Extraction-only settings (method, threshold, parallelism) | geoprepare
geocif.txt | Indices/ML/agmet settings, country overrides, runtime selections | geocif

Order matters: Config files are loaded left-to-right. When the same key appears in multiple files, the last file wins. The tool-specific file (geoextract.txt or geocif.txt) must be last so its [DEFAULT] values (countries, method, etc.) override the shared defaults in countries.txt.

config_dir = "/path/to/config"

cfg_geoprepare = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geoextract.txt"]
cfg_geocif = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geocif.txt"]
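
The "last file wins" rule can be demonstrated with Python's standard configparser. This sketch assumes the config files are INI-style, as the [DEFAULT] sections suggest, and that they are read in list order. Whether geoprepare uses configparser in exactly this way is an assumption, and all values below are made up.

```python
from configparser import ConfigParser

# Stand-in for the shared files (geobase.txt, countries.txt); values are hypothetical.
shared_txt = """
[DEFAULT]
countries = ['kenya', 'malawi']
start_year = 2001
"""

# Stand-in for the tool-specific file (geoextract.txt); values are hypothetical.
geoextract_txt = """
[DEFAULT]
countries = ['kenya']
method = example_method
"""

def load_in_order(sources):
    """Read each config text in order; later sources overwrite earlier keys."""
    parser = ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser

parser = load_in_order([shared_txt, geoextract_txt])
print(parser.defaults()["countries"])   # value from the tool-specific text
print(parser.defaults()["start_year"])  # untouched shared default

# Reversing the order lets the shared default override the tool-specific value.
reversed_parser = load_in_order([geoextract_txt, shared_txt])
print(reversed_parser.defaults()["countries"])
```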

geobase.txt

Shared paths, dataset settings, boundary file column mappings, and logging. Key sections:

  • [DATASETS] — Which datasets to download (e.g. ['CHIRPS', 'CPC', 'NDVI', 'ESI', 'NSIDC'])
  • [PATHS] — All directory paths, derived from dir_base
  • Per-dataset sections ([CHIRPS], [CPC], [FLDAS], etc.) — Dataset-specific settings like data URLs, variables, fill values
  • Boundary file sections ([adm_shapefile], [gaul1_asap_v04], etc.) — Column mappings from shapefile fields to standard names (ADM0_NAME, ADM1_NAME, ADM_ID)
  • [DEFAULT] — Shared defaults: start_year, end_year, parallel_process, fraction_cpus
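
As a rough illustration of the sections listed above, the following sketch parses a small INI-style fragment. The key names inside each section (datasets, dir_download, dir_intermed), the paths, and the use of configparser's %(dir_base)s interpolation to derive paths are assumptions. Consult the geobase.txt shipped with the package for the real keys.

```python
import ast
from configparser import ConfigParser

# Hypothetical fragment; only the section names and dir_base come from the README.
sample = """
[DATASETS]
datasets = ['CHIRPS', 'CPC', 'NDVI']

[PATHS]
dir_base = /data/geoprepare
dir_download = %(dir_base)s/download
dir_intermed = %(dir_base)s/intermed
"""

parser = ConfigParser()
parser.read_string(sample)

# List-valued settings are stored as text; literal_eval turns them into a Python list.
datasets = ast.literal_eval(parser.get("DATASETS", "datasets"))
print(datasets)

# Paths that reference dir_base are expanded when read.
print(parser.get("PATHS", "dir_intermed"))
```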

countries.txt

Per-country configuration. Each country section specifies boundary file, admin level, seasons, crops, and EO variables. Countries are grouped by calendar category:

  • AMIS countries — Inherit defaults, override crops as needed
  • EWCM countries — Set category = EWCM, use_cropland_mask = True, custom calendar_file and boundary_file
  • [DEFAULT] — Shared defaults including eo_model (list of EO variables to extract)

crops.txt

Crop mask filenames (e.g. [maize] mask = Percent_Maize.tif) and calendar category settings ([EWCM], [AMIS]).

geoextract.txt

Extraction settings for geoprepare. [DEFAULT] section sets method, redo, threshold, floor/ceil, parallel_extract, countries, and forecast_seasons.

geocif.txt

ML and agmet settings for geocif. Contains [AGMET] plotting config, per-country crop overrides, ML model definitions, and [ML] hyperparameters.

Supported datasets

Dataset | Description | Source
--- | --- | ---
AEF | AlphaEarth Foundations satellite embeddings (64-band, 10m) | source.coop
AGERA5 | Agrometeorological indicators (precipitation, temperature) | CDS
AVHRR | Long-term NDVI | NOAA NCEI
CHIRPS | Rainfall estimates (v2 and v3) | CHC
CHIRPS-GEFS | 15-day precipitation forecasts | CHC
CPC | Temperature (Tmax, Tmin) and precipitation | NOAA CPC
ESI | Evaporative Stress Index (4-week, 12-week) | SERVIR
FLDAS | Land surface model outputs (soil moisture, precip, temp) | NASA
FPAR | Fraction of Absorbed Photosynthetically Active Radiation | JRC
LST | Land Surface Temperature (MODIS MOD11C1) | NASA
NDVI | Vegetation index from MODIS (MOD09CMG) | NASA
NSIDC | SMAP L4 soil moisture (surface, rootzone) | NASA NSIDC
SOIL-MOISTURE | NASA-USDA soil moisture (surface as1, subsurface as2) | NASA
VHI | Vegetation Health Index | NOAA STAR
VIIRS | Vegetation index from VIIRS (VNP09CMG) | NASA

Directory layout

All datasets organize files into year-specific subfolders. After running geomove (or on fresh downloads), the directory structure looks like:

dir_download/
  nsidc/2025/*.h5, nsidc/2026/*.h5
  chirps_gefs/2026/*.tif
  fpar/2024/*.tif, fpar/2025/*.tif
  modis_lst/*.hdf                     (flat - pymodis manages this)
  ...

dir_intermed/
  cpc_tmax/2024/*.tif, cpc_tmax/2025/*.tif
  cpc_tmin/2024/*.tif, ...
  cpc_precip/2024/*.tif, ...
  chirps/v3/global/2024/*.tif, ...    (CHIRPS already used year subfolders)
  chirps_gefs/2026/*.tif
  esi_4wk/2024/*.tif, ...
  esi_12wk/2024/*.tif, ...
  ndvi/2024/*.tif, ...
  lst/2024/*.tif, ...
  nsidc/subdaily/2025/*.tif
  nsidc/daily/surface/2025/*.tif
  nsidc/daily/rootzone/2025/*.tif
  soil_moisture_as1/2024/*.tif, ...
  soil_moisture_as2/2024/*.tif, ...
  agera5/tif/{variable}/2024/*.tif, ...
  vhi/global/2024/*.tif, ...
  aef/{country}/2018/*.tif, ..., aef/{country}/aef_avg_global.tif
  fldas/.../2024/*.tif, ...           (FLDAS already used year subfolders)
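
Given a layout like the one above, a quick per-year file count can be produced with pathlib. This is a sketch in the spirit of the diagnostics utility, not its implementation. The function name count_tifs_by_year is hypothetical, and the year is taken from the deepest four-digit folder name on each file's path.

```python
import tempfile
from collections import Counter
from pathlib import Path

def count_tifs_by_year(dataset_dir):
    """Count *.tif files under each four-digit year subfolder of dataset_dir."""
    counts = Counter()
    for tif in Path(dataset_dir).rglob("*.tif"):
        folders = tif.relative_to(dataset_dir).parts[:-1]
        years = [f for f in folders if len(f) == 4 and f.isdigit()]
        if years:
            counts[years[-1]] += 1  # deepest year-like folder wins
    return dict(counts)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "nsidc"
    for rel in ("daily/surface/2025/a.tif",
                "daily/surface/2025/b.tif",
                "daily/rootzone/2024/c.tif"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    print(count_tifs_by_year(root))
```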

Upload package to PyPI

# 1. Bump version
uvx bump2version patch --current-version X.X.X --new-version X.X.Y pyproject.toml geoprepare/__init__.py

# 2. Clean, build, upload
rm -rf dist/ build/ *.egg-info/
uv build
uvx twine upload dist/geoprepare-X.X.Y*

Credits

This project was supported by NASA Applied Sciences Grant No. 80NSSC17K0625 through the NASA Harvest Consortium, and the NASA Acres Consortium under NASA Grant #80NSSC23M0034.
