envoi
Automated feature extraction from environmental data sources for ecological and spatial analysis.
Table of contents
- Install
- Earth Engine setup
- Quick start
- Outputs
- Advanced usage
- Reference
- How to cite
- Contributors
- Project links
envoi - ENvironmental Variables for Observational Instances
Ecological and spatial models need environmental variables attached to field sample points — climate, terrain, land cover, vegetation indices. The usual workflow involves stitching together one-off scripts for each data source (Earth Engine for satellite data, rasterio for local files, ad-hoc projections to get distances right), and the outputs rarely line up.
envoi exposes a single extract(df, config) call that runs against both Google Earth Engine and local GeoTIFFs and returns the same shape of output. No pre-downloading, sensible defaults for users who'd rather not think about CRS or UTM zones, and the same reducers and QC columns across data sources so results are directly comparable.
envoi is developed at the Biodiversity Data Lab at Uppsala University.
Install
```bash
pip install envoi-geospatial
```
Requires Python 3.10 or newer.
Earth Engine setup
Datasets that come from Google Earth Engine (most of the built-in catalog — dem_copernicus_glo30, ndvi_landsat_annual, etc.) need a service account key from Google. If you only plan to use your own local rasters you can skip this section.
Step 1 — get the key file. In the Google Cloud Console, create a service account that has Earth Engine access and download its JSON key. You'll end up with a file like my-project-1234-abcdef.json. See the official guide for the full walkthrough.
Step 2 — put the file somewhere envoi can find it. Pick whichever of these is easiest:
- In your project folder: save it as credentials/ee_credentials.json next to your script or notebook. This is the simplest option if you only use Earth Engine for one project.
- In your user folder: save it as ~/.config/envoi/ee_credentials.json (macOS/Linux) or %APPDATA%\envoi\ee_credentials.json (Windows). You may need to create the folder. This is useful if you want the same key available to every project on your computer.
- At a custom path via environment variable: set ENVOI_EE_CREDENTIALS to the file's path. envoi checks this before the two folders above, so it's the right choice when the key lives outside the defaults, when you swap between several credential files, or in CI / Docker.
- Anywhere else: pass the path explicitly in your code before calling extract():

  ```python
  from envoi import init_gee

  init_gee(credentials_path="/path/to/my-project-1234-abcdef.json")
  ```
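The lookup order described in the list above can be sketched as a small resolver. This is a hypothetical helper (find_ee_credentials and default_user_config_dir are not part of envoi's API), and it assumes the project folder is checked before the user folder; the list only states that the environment variable takes precedence over both.

```python
import os
from pathlib import Path
from typing import Optional

KEY_NAME = "ee_credentials.json"


def default_user_config_dir() -> Path:
    """%APPDATA%\\envoi on Windows, ~/.config/envoi on macOS/Linux."""
    if os.name == "nt" and os.environ.get("APPDATA"):
        return Path(os.environ["APPDATA"]) / "envoi"
    return Path.home() / ".config" / "envoi"


def find_ee_credentials(
    project_dir: Optional[Path] = None,
    user_config_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Hypothetical sketch of the credential lookup; NOT envoi's implementation."""
    # 1. The environment variable wins over both folders.
    env_value = os.environ.get("ENVOI_EE_CREDENTIALS")
    if env_value:
        return Path(env_value)

    # 2. Project folder, then 3. user folder (this relative order is an assumption).
    project_dir = project_dir if project_dir is not None else Path.cwd()
    user_config_dir = user_config_dir if user_config_dir is not None else default_user_config_dir()
    for candidate in (project_dir / "credentials" / KEY_NAME, user_config_dir / KEY_NAME):
        if candidate.is_file():
            return candidate

    # Nothing found: fall back to passing the path to init_gee() explicitly.
    return None
```

A path found this way could then be handed to init_gee(credentials_path=...), which is the documented fallback for keys stored anywhere else.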
Quick start
Pass any DataFrame with an identifier column and a latitude/longitude pair. By default envoi expects the GBIF / Darwin Core names gbifID, decimalLatitude, decimalLongitude and treats coordinates as WGS84 (EPSG:4326). If yours differ, override on the call with id_column=, latitude_column=, longitude_column=, and input_crs= (e.g. "EPSG:32634") — envoi reprojects to WGS84 internally. An optional eventDate column (or any column passed via date_column=) enables date-aware extraction.
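If your columns use other names, an alternative to passing the overrides on every call is to rename once to the Darwin Core defaults. The sketch below does that with plain pandas and adds a hypothetical sanity check (looks_like_wgs84 is not an envoi function): coordinates outside the latitude/longitude ranges usually mean the data is projected, in which case input_crs= is needed.

```python
import pandas as pd

# Toy input with non-default column names (illustrative values).
my_points = pd.DataFrame({
    "site": ["s1", "s2"],
    "lat": [59.85, 59.86],
    "lon": [17.63, 17.64],
})

# Equivalent documented alternative, keeping your own names:
#   extract(my_points, config, id_column="site",
#           latitude_column="lat", longitude_column="lon")
dwc_points = my_points.rename(columns={
    "site": "gbifID",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
})


def looks_like_wgs84(df: pd.DataFrame,
                     lat_col: str = "decimalLatitude",
                     lon_col: str = "decimalLongitude") -> bool:
    """Hypothetical check: True if every coordinate lies within valid lat/lon ranges."""
    return bool(df[lat_col].between(-90, 90).all() and df[lon_col].between(-180, 180).all())
```

Passing the range check does not prove the CRS is WGS84; it only catches the common case of projected metre coordinates being fed in without input_crs=.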
```python
import pandas as pd
from envoi import extract

sample_points = pd.DataFrame({
    "gbifID": ["a", "b", "c"],
    "decimalLatitude": [59.85, 59.86, 59.87],
    "decimalLongitude": [17.63, 17.64, 17.65],
})

# Single output: mean and std of elevation in a 200 m window around each point.
outputs = extract(sample_points, {
    "batch_id": "terrain",
    "datasets": ["dem_copernicus_glo30"],
    "settings": {
        "output_type": "tabular",
        "statistics": ["mean", "std"],
        "window_size_m": 200,
    },
})

# Files land in outputs/ by default:
#   outputs/terrain.csv            ← reducer columns
#   outputs/terrain_qc.csv         ← per-point coverage / nodata flags
#   outputs/terrain_metadata.json  ← per-run dataset metadata
```
Override the output location with extract(df, config, output_dir="my_dir").
The same config can also live in a YAML file — see examples/run.yml for a runnable template.
Walkthrough
For a guided end-to-end tutorial — tabular and raster extraction, local rasters, multi-dataset runs, date-aware extraction, and catalog discovery — see the walkthrough notebook.
Outputs
Tabular
output_type: "tabular" produces a table with one row per input point and one column per reducer × dataset × window. A separate QC file flags coverage and nodata.
Reducer columns look like: dem_copernicus_glo30_mean_200m, dem_copernicus_glo30_std_200m.
QC columns look like: dem_copernicus_glo30_in_extent_200m, dem_copernicus_glo30_n_pixels_200m, dem_copernicus_glo30_had_nodata_200m, dem_copernicus_glo30_coverage_pct_200m.
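The naming pattern in the examples above is <dataset>_<reducer or QC field>_<window>m. The sketch below builds the expected column lists for a run, which can be handy for checking that an output file has everything you asked for. It is inferred from the example names only: column_name and expected_columns are hypothetical helpers, and the point reducer and per-class categorical columns may be named differently.

```python
from itertools import product

# QC fields taken from the example QC column names above.
QC_FIELDS = ("in_extent", "n_pixels", "had_nodata", "coverage_pct")


def column_name(dataset: str, name: str, window_m: int) -> str:
    """<dataset>_<reducer or QC field>_<window>m, as in the examples above."""
    return f"{dataset}_{name}_{window_m}m"


def expected_columns(datasets, reducers, windows):
    """Return (reducer columns, QC columns) for every dataset x reducer x window."""
    value_cols = [column_name(d, r, w) for d, r, w in product(datasets, reducers, windows)]
    qc_cols = [column_name(d, f, w) for d, f, w in product(datasets, QC_FIELDS, windows)]
    return value_cols, qc_cols


value_cols, qc_cols = expected_columns(["dem_copernicus_glo30"], ["mean", "std"], [200])
```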
Available reducers:
- Core stats: mean, median, min, max, sum, std, var, count, mode
- Quantiles: q05, q10, q25, q50, q75, q90, q95
- Categorical: class_count, class_fraction (expanded per-class downstream)
- Special: point, which samples the exact pixel at each coordinate (no window)
For the current authoritative list, run:
```python
from envoi import list_reducers

list_reducers()
```
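For categorical rasters such as lulc_worldcover_2021, the two categorical reducers can be read as "how many valid pixels of each class" and "what share of valid pixels each class covers". The sketch below illustrates that reading on a flat list of pixel values. It is an assumption about the semantics, not envoi's implementation, and it leaves out how envoi expands the per-class results into columns.

```python
from collections import Counter


def class_count(pixels, nodata=None):
    """Hypothetical reading of class_count: number of valid pixels per class."""
    return dict(sorted(Counter(p for p in pixels if p != nodata).items()))


def class_fraction(pixels, nodata=None):
    """Hypothetical reading of class_fraction: share of valid pixels per class."""
    counts = class_count(pixels, nodata)
    total = sum(counts.values())
    if total == 0:
        return {}  # window contains only nodata
    return {cls: n / total for cls, n in counts.items()}


# One window of class codes, with 0 used as nodata.
window = [10, 10, 20, 30, 0]
```

Each key of the returned dictionary would become its own column downstream, which is what "expanded per-class" suggests.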
Output file format. Set output_file_format in the settings block:
| Value | Result |
|---|---|
| "csv" | outputs/<batch_id>.csv (default) |
| "parquet" | outputs/<batch_id>.parquet |
| "dataframe" | Returns the DataFrame in memory and skips writing to disk. |
```python
extract(sample_points, {
    "batch_id": "terrain",
    "datasets": ["dem_copernicus_glo30"],
    "settings": {
        "output_type": "tabular",
        "statistics": ["mean"],
        "window_size_m": 200,
        "output_file_format": "csv",
    },
})
```
Raster
output_type: "raster" exports a GeoTIFF tile per point, cropped to the requested window:
```python
extract(sample_points, {
    "batch_id": "terrain_tiles",
    "datasets": ["dem_copernicus_glo30"],
    "settings": {
        "output_type": "raster",
        "window_size_m": 200,
        "resample_m": 10,  # optional: resample all tiles to a common resolution
    },
})
```
Tiles land at outputs/<batch_id>/<dataset>/<id>-<dataset>.tif.
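To match tiles back to input rows, the path pattern can be built and inverted with pathlib. Both helpers below are hypothetical, and they assume that <id> is the value from the identifier column and that output_dir defaults to outputs, as in the tabular case.

```python
from pathlib import Path


def tile_path(batch_id, dataset, point_id, output_dir="outputs"):
    """Expected tile path: <output_dir>/<batch_id>/<dataset>/<id>-<dataset>.tif."""
    return Path(output_dir) / batch_id / dataset / f"{point_id}-{dataset}.tif"


def point_id_from_tile(path, dataset):
    """Recover the point id from a tile file name (inverse of tile_path)."""
    suffix = f"-{dataset}.tif"
    name = Path(path).name
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not a {dataset} tile")
    return name[: -len(suffix)]
```

For example, iterating over Path("outputs/terrain_tiles/dem_copernicus_glo30").glob("*.tif") and applying point_id_from_tile gives the ids to join against the input DataFrame.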
Without resample_m, tiles are written in the source raster's native CRS at native resolution. The tile boundary snaps to the source pixel grid, so the actual extent is window_size_m rounded to whole pixels — any pixel touched by the requested window is included, and tile dimensions can vary slightly across points (especially for global datasets where pixel size depends on latitude).
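Why native-resolution tile sizes vary can be shown in one dimension. snap_to_grid is a hypothetical illustration, not envoi code. It assumes a grid origin of 0 and square pixels, and it keeps every pixel whose interior overlaps the requested window.

```python
import math


def snap_to_grid(center, window_size, pixel_size, origin=0.0):
    """Return (first, stop) pixel indices touched by the window; stop is exclusive."""
    half = window_size / 2.0
    first = math.floor((center - half - origin) / pixel_size)
    stop = math.ceil((center + half - origin) / pixel_size)
    return first, stop


# The same 200 m window on a 30 m grid, at two positions relative to the grid.
for center in (15.0, 0.0):
    first, stop = snap_to_grid(center, 200.0, 30.0)
    print(center, stop - first, (stop - first) * 30.0)
# 15.0 7 210.0
# 0.0 8 240.0
```

Centred on a pixel, the window touches 7 pixels (210 m). Centred on a pixel edge, it touches 8 (240 m).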
With resample_m, every tile is reprojected to the point's UTM zone at exactly resample_m meters per pixel, on a grid snapped to that resolution. All tiles end up the same size (round(window_size_m / resample_m) pixels per side) and are spatially aligned across data sources — useful when feeding tiles to a CNN that expects a fixed input size or when comparing GEE and local rasters pixel-for-pixel.
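With resample_m set, the tile geometry follows from the point's UTM zone and two numbers. The sketch uses the standard 6-degree WGS84 UTM zone formula (EPSG 326xx north, 327xx south) as an assumption about how "the point's UTM zone" is chosen. It ignores the Norway/Svalbard exceptions, and envoi's own zone logic may differ.

```python
import math


def utm_epsg(lat: float, lon: float) -> int:
    """EPSG code of the standard 6-degree WGS84 UTM zone containing (lat, lon)."""
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)  # lon == 180 maps to zone 60
    return (32600 if lat >= 0 else 32700) + zone


def resampled_tile_side(window_size_m: float, resample_m: float) -> int:
    """Pixels per tile side when resample_m is set: round(window_size_m / resample_m)."""
    return round(window_size_m / resample_m)
```

For the Uppsala points in the quick start, utm_epsg(59.85, 17.63) gives 32633 (UTM zone 33N). A 200 m window at resample_m = 10 yields 20 × 20 pixel tiles.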
Advanced usage
Multiple outputs in one call, date-aware extraction, mixing categorical and continuous datasets, per-call band selection, multiple window sizes, and custom dataset registration are covered in docs/advanced_usage.md. A starter custom catalog (local raster and Earth Engine entries) lives at examples/catalog.yml.
Reference
Built-in datasets
envoi ships with a curated set of Earth Engine datasets spanning terrain, climate, land cover, satellite imagery, vegetation indices, and human-impact themes. Inspect what's available — including any datasets you've registered with update_catalog() — using list_datasets():
```python
from envoi import list_datasets

list_datasets()        # just the names, one per line
list_datasets("info")  # name + description, citation, source URLs
list_datasets("full")  # the complete catalog entry for each dataset
```
list_datasets() both prints the listing and returns the same data as a list (of strings for the default call, of dicts for "info" / "full"), so you can keep using it programmatically.
A representative subset of the built-in catalog:
- Terrain: dem_copernicus_glo30
- Climate: climate_worldclim_v1_bioclim, climate_era5_monthly, climate_terraclimate_monthly
- Land cover: lulc_worldcover_2021, lulc_copernicus_lc100, lulc_naturallands_2020
- Satellite imagery: sr_landsat_8day, sr_landsat_32day, sr_landsat_annual
- Vegetation / productivity: ndvi_landsat_annual, evi_landsat_annual, npp_modis_terra, agb_esa_cci
- Human impact: human_impact_index plus eight hii_driver_* subcomponents
- Embeddings: aef_satellite_embeddings
The full catalog, including descriptions, citations, and source URLs for every entry, lives in src/envoi/configs/ee_catalog.yml.
Notes
- Input CRS. Coordinates in the input DataFrame are assumed to be in WGS84 (EPSG:4326). If yours are in a different CRS, pass input_crs="EPSG:XXXX" to extract() and envoi reprojects them to WGS84 before extraction.
- Window units. window_size_m is in meters. Each window is projected into the point's local UTM zone, so window distances are accurate anywhere on the globe.
- Data source CRS and resolution. Both are detected automatically from each dataset; no manual configuration is needed.
- QC, not failure. Low pixel coverage is flagged in the QC columns instead of raising an error. Filter on <dataset>_coverage_pct_<window>m to drop unreliable rows downstream.
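A minimal sketch of that downstream filtering with pandas. It assumes the QC file carries the same identifier column as the reducer file, that coverage_pct is on a 0 to 100 scale, and that 90 % is an acceptable cut-off. All three are assumptions, and drop_low_coverage is a hypothetical helper. The toy frames stand in for outputs/terrain.csv and outputs/terrain_qc.csv, and their values are made up.

```python
import pandas as pd

# Toy stand-ins for pd.read_csv("outputs/terrain.csv") / ("outputs/terrain_qc.csv").
values = pd.DataFrame({
    "gbifID": ["a", "b", "c"],
    "dem_copernicus_glo30_mean_200m": [12.1, 15.4, float("nan")],
})
qc = pd.DataFrame({
    "gbifID": ["a", "b", "c"],
    "dem_copernicus_glo30_coverage_pct_200m": [100.0, 62.5, 0.0],
})


def drop_low_coverage(values, qc, dataset, window_m, min_pct=90.0, id_column="gbifID"):
    """Keep rows whose QC coverage for (dataset, window) is at least min_pct."""
    coverage_col = f"{dataset}_coverage_pct_{window_m}m"
    merged = values.merge(qc[[id_column, coverage_col]], on=id_column, how="left")
    # Rows with missing QC (NaN coverage) compare False and are dropped as well.
    return merged[merged[coverage_col] >= min_pct].drop(columns=coverage_col)


reliable = drop_low_coverage(values, qc, "dem_copernicus_glo30", 200)
```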
How to cite
A paper describing envoi is currently in preparation. In the meantime, please cite the software directly:
Baggström, A., Nyström, J., & Andermann, T. (in prep.). envoi: automated environmental feature extraction for ecological analysis. Retrieved from https://github.com/BiodiversityDataLab/envoi
This entry will be updated with a DOI and full citation when the paper is published.
Contributors
Primary authors and maintainers — Adrian Baggström, Jakob Nyström.
Past contributors — Miguel Redondo at NBIS; Shaheryar, Thant Zin Bo, and Per Vincent Ankarbåge (Uppsala University Data Science MSc students).
Acknowledgements — Tobias Andermann (Conceptualization and PhD supervision for A.B. and J.N.). A.B., J.N., and T.A. received financial support from the SciLifeLab & Wallenberg Data Driven Life Science Program (grant: KAW 2020.0239) and from the Swedish Research Council (2023-05366). We are grateful to the maintainers of Google Earth Engine, rasterio, geopandas, and pyproj.
Project links
- License — MIT
- Contributing — CONTRIBUTING.md
- Issues / bug reports — github.com/BiodiversityDataLab/envoi/issues
- Repository — github.com/BiodiversityDataLab/envoi
Take these points: cross sky and stone;
return them clothed, no longer alone.