A Python package to prepare (download, extract, and process) input data for GEOCIF and related models

geoprepare

Installation

Install from PyPI

pip install --upgrade geoprepare

Install from GitHub (development)

pip install --upgrade --no-deps --force-reinstall git+https://github.com/ritviksahajpal/geoprepare.git

Local editable install

pip install -e ".[dev]"

CDS API (for AgERA5)

If you intend to download AgERA5 data, install the CDS API client by following the setup instructions on the Copernicus Climate Data Store (CDS) site.

MODIS data (octvi)

Install the octvi package to download MODIS data:

pip install git+https://github.com/ritviksahajpal/octvi.git

Downloading from the NASA DAACs requires a personal app key. After installation, run octviconfig in your command prompt. Information on obtaining app keys can be found here.

Pipeline

geoprepare follows a three-stage pipeline:

  1. Download (geodownload) - Download and preprocess global EO datasets to dir_download and dir_intermed
  2. Extract (geoextract) - Extract EO variable statistics per admin region to dir_output
  3. Merge (geomerge) - Merge extracted EO files into per-country/crop CSV files for ML models and AgMet graphics

All datasets store files in year-specific subfolders (e.g., dir_intermed/cpc_tmax/2024/, dir_download/nsidc/2025/).

Additional utilities:

  • Move (geomove) - One-time migration of existing flat directories to year-specific subfolders
  • Check (geocheck) - Validate that expected TIF files exist in dir_intermed after download
  • Diagnostics (diagnostics) - Count and summarize files in the data directories

Usage

config_dir = "/path/to/config"  # full path to your config directory

cfg_geoprepare = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geoextract.txt"]

1. Download data (geodownload)

Downloads and preprocesses global EO datasets. This step requires only geobase.txt. The [DATASETS] section controls which datasets are downloaded. Each dataset is processed into global 0.05° TIF files in dir_intermed.

from geoprepare import geodownload
geodownload.run([f"{config_dir}/geobase.txt"])

2. Migrate to year subfolders (geomove)

Moves existing files from flat directories into year-specific subfolders. Run this once after upgrading to a version with year-subfolder support. All datasets are handled: CPC, ESI, NDVI, NSIDC, CHIRPS-GEFS, LST, Soil Moisture, AgERA5, VHI, FPAR, and AEF.

from geoprepare import geomove

# Preview what would be moved (no files are changed)
geomove.run([f"{config_dir}/geobase.txt"], dry_run=True)

# Execute the migration
geomove.run([f"{config_dir}/geobase.txt"])
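
The idea behind the migration can be sketched in a few lines. This is not geomove's actual implementation: the function name move_to_year_folders, the regular expression, and the assumption that each file name contains a standalone four-digit year (e.g. tmax_2024_001.tif) are all hypothetical and chosen for illustration only.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming rule: a standalone 4-digit year (1900-2099) in the file name.
YEAR_RE = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def move_to_year_folders(flat_dir, dry_run=False):
    """Move files in flat_dir into flat_dir/<year>/ and return the planned moves."""
    flat_dir = Path(flat_dir)
    moves = []
    # sorted() materializes the listing before any subfolder is created
    for src in sorted(flat_dir.iterdir()):
        if not src.is_file():
            continue  # existing subfolders are left untouched
        match = YEAR_RE.search(src.name)
        if match is None:
            continue  # no recognizable year, leave the file in place
        dst = flat_dir / match.group(0) / src.name
        moves.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Calling the function with dry_run=True returns the list of (source, destination) pairs without touching any file, which mirrors the preview-then-execute workflow shown above.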

3. Validate downloads (geocheck)

Checks that all expected TIF files exist in dir_intermed and are non-empty. Writes a timestamped report to dir_logs/check/.

from geoprepare import geocheck
geocheck.run([f"{config_dir}/geobase.txt"])
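
For readers who want to run a similar check on their own file lists, the sketch below shows one way to test for missing or zero-byte files. It is an illustration under assumptions, not geocheck's code: the function name find_missing_or_empty is hypothetical, and how geocheck derives the list of expected files is not covered here.

```python
from pathlib import Path


def find_missing_or_empty(expected_paths):
    """Split expected files into those that are missing and those that are empty."""
    missing, empty = [], []
    for path in map(Path, expected_paths):
        if not path.is_file():
            missing.append(path)
        elif path.stat().st_size == 0:
            empty.append(path)  # file exists but has zero bytes
    return missing, empty
```

A caller would pass the paths it expects under dir_intermed and report both returned lists.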

4. Extract crop masks and EO data (geoextract)

Extracts EO variable statistics (mean, median, etc.) for each admin region, crop, and growing season.

from geoprepare import geoextract
geoextract.run(cfg_geoprepare)
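
Conceptually, this step is a zonal-statistics computation: pixels are grouped by admin region, restricted to the crop mask, and summarized. The pure-Python sketch below illustrates that idea on flat lists. It is not geoextract's implementation, which operates on rasters; the function name zonal_stats, its arguments, and the nodata handling are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, median


def zonal_stats(values, region_ids, crop_mask, nodata=None):
    """Summarize pixel values per region, using only pixels where the crop mask is set."""
    by_region = defaultdict(list)
    for value, region, is_crop in zip(values, region_ids, crop_mask):
        if is_crop and value != nodata:
            by_region[region].append(value)
    return {
        region: {"mean": mean(pixels), "median": median(pixels), "count": len(pixels)}
        for region, pixels in by_region.items()
    }


# Five pixels in two regions; the third pixel is outside the crop mask,
# and the last one holds the nodata value.
stats = zonal_stats(
    values=[10, 20, 30, 40, -9999],
    region_ids=[1, 1, 1, 2, 2],
    crop_mask=[1, 1, 0, 1, 1],
    nodata=-9999,
)
```

Regions with no valid crop pixels are simply absent from the result, which avoids computing statistics on empty lists.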

5. Merge extracted data (geomerge)

Merges per-region/year EO CSV files into a single CSV per country-crop-season combination.

from geoprepare import geomerge
geomerge.run(cfg_geoprepare)
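
The merge can be pictured as concatenating many small CSV files that share a header. The sketch below uses only the standard library and is an illustration, not geomerge's code: the function name merge_csvs and the requirement that all inputs have identical headers are assumptions.

```python
import csv
from pathlib import Path


def merge_csvs(input_paths, output_path):
    """Concatenate CSV files that share a header into one CSV; return the row count."""
    rows, fieldnames = [], None
    for path in sorted(map(Path, input_paths)):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has a different header")
            rows.extend(reader)
    if fieldnames is None:
        raise ValueError("no input files given")
    with Path(output_path).open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Rejecting files with a mismatched header keeps a single malformed input from silently shifting columns in the merged output.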

Config files

| File | Purpose | Used by |
|---|---|---|
| geobase.txt | Paths, dataset settings, boundary file column mappings, logging | both |
| countries.txt | Per-country config (boundary files, admin levels, seasons, crops) | both |
| crops.txt | Crop masks, calendar category settings (EWCM, AMIS) | both |
| geoextract.txt | Extraction-only settings (method, threshold, parallelism) | geoprepare |
| geocif.txt | Indices/ML/agmet settings, country overrides, runtime selections | geocif |

Order matters: Config files are loaded left-to-right. When the same key appears in multiple files, the last file wins. The tool-specific file (geoextract.txt or geocif.txt) must be last so its [DEFAULT] values (countries, method, etc.) override the shared defaults in countries.txt.

config_dir = "/path/to/config"

cfg_geoprepare = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geoextract.txt"]
cfg_geocif = [f"{config_dir}/geobase.txt", f"{config_dir}/countries.txt", f"{config_dir}/crops.txt", f"{config_dir}/geocif.txt"]
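
The "last file wins" rule matches how Python's standard configparser behaves when given a list of files. The sketch below demonstrates that behavior with two throwaway files. Whether geoprepare loads its configuration this way internally is an assumption, and the keys' values (including method_a) are made up for illustration.

```python
import configparser
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "countries.txt"
    tool = Path(tmp) / "geoextract.txt"
    # Illustrative values only, not taken from the real config files.
    shared.write_text("[DEFAULT]\ncountries = ['kenya', 'malawi']\nmethod = method_a\n")
    tool.write_text("[DEFAULT]\ncountries = ['kenya']\n")

    parser = configparser.ConfigParser()
    parser.read([shared, tool])  # files are read in list order
    merged = dict(parser["DEFAULT"])

# 'countries' now holds the value from the last file;
# 'method' is kept from the earlier file because nothing overrode it.
print(merged)
```

Reversing the list order would let the shared defaults overwrite the tool-specific values, which is why the tool-specific file belongs at the end.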

geobase.txt

Shared paths, dataset settings, boundary file column mappings, and logging. Key sections:

  • [DATASETS] — Which datasets to download (e.g. ['CHIRPS', 'CPC', 'NDVI', 'ESI', 'NSIDC'])
  • [PATHS] — All directory paths, derived from dir_base
  • Per-dataset sections ([CHIRPS], [CPC], [FLDAS], etc.) — Dataset-specific settings like data URLs, variables, fill values
  • Boundary file sections ([adm_shapefile], [gaul1_asap_v04], etc.) — Column mappings from shapefile fields to standard names (ADM0_NAME, ADM1_NAME, ADM_ID)
  • [DEFAULT] — Shared defaults: start_year, end_year, parallel_process, fraction_cpus

countries.txt

Per-country configuration. Each country section specifies boundary file, admin level, seasons, crops, and EO variables. Countries are grouped by calendar category:

  • AMIS countries — Inherit defaults, override crops as needed
  • EWCM countries — Set category = EWCM, use_cropland_mask = True, custom calendar_file and boundary_file
  • [DEFAULT] — Shared defaults including eo_model (list of EO variables to extract)

crops.txt

Crop mask filenames (e.g. [maize] mask = Percent_Maize.tif) and calendar category settings ([EWCM], [AMIS]).

geoextract.txt

Extraction settings for geoprepare. [DEFAULT] section sets method, redo, threshold, floor/ceil, parallel_extract, countries, and forecast_seasons.

geocif.txt

ML and agmet settings for geocif. Contains [AGMET] plotting config, per-country crop overrides, ML model definitions, and [ML] hyperparameters.

Supported datasets

| Dataset | Description | Source |
|---|---|---|
| AEF | AlphaEarth Foundations satellite embeddings (64-band, 10 m) | source.coop |
| AGERA5 | Agrometeorological indicators (precipitation, temperature) | CDS |
| AVHRR | Long-term NDVI | NOAA NCEI |
| CHIRPS | Rainfall estimates (v2 and v3) | CHC |
| CHIRPS-GEFS | 15-day precipitation forecasts | CHC |
| CPC | Temperature (Tmax, Tmin) and precipitation | NOAA CPC |
| ESI | Evaporative Stress Index (4-week, 12-week) | SERVIR |
| FLDAS | Land surface model outputs (soil moisture, precip, temp) | NASA |
| FPAR | Fraction of Absorbed Photosynthetically Active Radiation | JRC |
| LST | Land Surface Temperature (MODIS MOD11C1) | NASA |
| NDVI | Vegetation index from MODIS (MOD09CMG) | NASA |
| NSIDC | SMAP L4 soil moisture (surface, rootzone) | NASA NSIDC |
| SOIL-MOISTURE | NASA-USDA soil moisture (surface as1, subsurface as2) | NASA |
| VHI | Vegetation Health Index | NOAA STAR |
| VIIRS | Vegetation index from VIIRS (VNP09CMG) | NASA |

Directory layout

All datasets organize files into year-specific subfolders. After running geomove (or on fresh downloads), the directory structure looks like:

dir_download/
  nsidc/2025/*.h5, nsidc/2026/*.h5
  chirps_gefs/2026/*.tif
  fpar/2024/*.tif, fpar/2025/*.tif
  modis_lst/*.hdf                     (flat - pymodis manages this)
  ...

dir_intermed/
  cpc_tmax/2024/*.tif, cpc_tmax/2025/*.tif
  cpc_tmin/2024/*.tif, ...
  cpc_precip/2024/*.tif, ...
  chirps/v3/global/2024/*.tif, ...    (CHIRPS already used year subfolders)
  chirps_gefs/2026/*.tif
  esi_4wk/2024/*.tif, ...
  esi_12wk/2024/*.tif, ...
  ndvi/2024/*.tif, ...
  lst/2024/*.tif, ...
  nsidc/subdaily/2025/*.tif
  nsidc/daily/surface/2025/*.tif
  nsidc/daily/rootzone/2025/*.tif
  soil_moisture_as1/2024/*.tif, ...
  soil_moisture_as2/2024/*.tif, ...
  agera5/tif/{variable}/2024/*.tif, ...
  vhi/global/2024/*.tif, ...
  aef/{country}/2018/*.tif, ..., aef/{country}/aef_avg_global.tif
  fldas/.../2024/*.tif, ...           (FLDAS already used year subfolders)
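
Given this layout, a quick per-year file count is easy to script. The sketch below is a hypothetical helper (count_tifs_by_year is not part of geoprepare's documented API) that counts *.tif files in each four-digit year subfolder of one dataset directory, such as dir_intermed/cpc_tmax.

```python
from pathlib import Path


def count_tifs_by_year(dataset_dir):
    """Count *.tif files in each four-digit year subfolder of dataset_dir."""
    counts = {}
    for year_dir in Path(dataset_dir).iterdir():
        name = year_dir.name
        if year_dir.is_dir() and name.isdigit() and len(name) == 4:
            counts[name] = sum(1 for _ in year_dir.glob("*.tif"))
    return dict(sorted(counts.items()))
```

Datasets with deeper nesting (for example nsidc/daily/surface/2025) would need the helper pointed at the folder that directly contains the year subfolders.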

Upload package to PyPI

# 1. Bump version
uvx bump2version patch --current-version X.X.X --new-version X.X.Y pyproject.toml geoprepare/__init__.py

# 2. Clean, build, upload
rm -rf dist/ build/ *.egg-info/
uv build
uvx twine upload dist/geoprepare-X.X.Y*

Credits

This project was supported by NASA Applied Sciences Grant No. 80NSSC17K0625 through the NASA Harvest Consortium, and the NASA Acres Consortium under NASA Grant #80NSSC23M0034.
