Deep Learning for Earth Observation — automated training-dataset builder for EO segmentation tasks
dl4eo
dl4eo is a Python package for building multi-source Earth Observation training datasets and training segmentation models end-to-end. It automates the full pipeline from raw satellite data to model checkpoint:
- Sentinel-2 (L2A, cloud-filtered, spectral indices)
- Sentinel-1 RTC (VV + VH, batched by date)
- Copernicus DEM (elevation + slope, per-scene mosaic)
- Segmentation masks from any vector label file
- Train-ready PyTorch dataset with global normalization
- Model training with UNet, DeepLabV3+, SegFormer, ViT, and more
Installation
# Pipeline only (no PyTorch required)
pip install dl4eo
# Pipeline + training stack
pip install "dl4eo[train]"   # quotes keep shells such as zsh from expanding the brackets
Requires Python ≥ 3.8.
Quick Start
1 — Build a dataset
import dl4eo
dl4eo.generate_dataset(
    base_dir="/data/glacial_lakes",
    aoi_shapefile_dir="/data/aoi/",                 # folder with AOI.shp (study area polygon)
    feature_shapefile="/data/lake_boundaries.shp",  # label polygons
    date_range="2021-06-01/2021-08-31",
    cloud_cover=20,
    patch_size=256,                                 # pixels
    overlap=0.0,
    spectral_index="NDWI",                          # NDWI | NDSI | NDVI | NDRE | EVI | None
    skip_sentinel1=False,
    skip_dem=False,
    normalize=False,                                # recommended: normalize at load time via PatchDataset
    n_jobs=8,
)
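How patch_size and overlap interact is easiest to see on a toy grid. The sketch below is not dl4eo's implementation. It assumes that overlap is a fraction of the patch size (so overlap=0.0 gives edge-to-edge tiles) and that partial tiles at the right and bottom edges are dropped. The helper name patch_origins is hypothetical.

```python
def patch_origins(height, width, patch_size=256, overlap=0.0):
    """Return (row, col) upper-left corners of full patches on a raster grid.

    Assumption: `overlap` is a fraction in [0, 1) of `patch_size`.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Stride between neighbouring patches, never smaller than one pixel.
    stride = max(1, int(round(patch_size * (1.0 - overlap))))
    rows = range(0, height - patch_size + 1, stride)
    cols = range(0, width - patch_size + 1, stride)
    return [(r, c) for r in rows for c in cols]


origins = patch_origins(1024, 768, patch_size=256, overlap=0.0)
print(len(origins))   # 12: 4 rows x 3 columns of non-overlapping patches
print(origins[:3])    # [(0, 0), (0, 256), (0, 512)]
```

Under the same assumptions, overlap=0.5 halves the stride to 128 pixels, so neighbouring patches share half their area and the patch count grows accordingly.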
2 — Quality control, splits, statistics
# Filter bad patches (nodata, no foreground, constant bands)
valid = dl4eo.qc.validate("/data/glacial_lakes", min_positive_fraction=0.001)
# Create train / val / test splits
splits = dl4eo.splits.make_splits(
    "/data/glacial_lakes",
    ratios=(0.7, 0.15, 0.15),
    strategy="temporal",  # "random" | "temporal" | "spatial"
    valid_file="/data/glacial_lakes/valid_patches.txt",
)
# Global per-band statistics (training split only — no leakage)
stats = dl4eo.stats.compute("/data/glacial_lakes", split="train")
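To illustrate what a "temporal" strategy can mean, the sketch below sorts patches by acquisition date and cuts the ordered list by the given ratios, so that validation and test data are always later than training data. This is an assumption about the idea, not a description of dl4eo.splits; the helper temporal_split and the patch-ID-to-date mapping are hypothetical.

```python
from datetime import date


def temporal_split(patch_dates, ratios=(0.7, 0.15, 0.15)):
    """Split patch IDs chronologically into train / val / test.

    `patch_dates` maps a patch ID to its acquisition date (hypothetical layout).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ordered = sorted(patch_dates, key=patch_dates.get)  # oldest first
    n = len(ordered)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],  # remainder, so no patch is lost
    }


patch_dates = {f"patch_{i:02d}": date(2021, 6, 1 + i) for i in range(20)}
demo_splits = temporal_split(patch_dates)
print({name: len(ids) for name, ids in demo_splits.items()})
# {'train': 14, 'val': 3, 'test': 3}
```

A chronological cut like this is stricter than a random split, because patches from the same acquisition cannot end up on both sides of the train/validation boundary.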
3 — PyTorch dataset
from dl4eo.io import PatchDataset
from torch.utils.data import DataLoader
ds = PatchDataset(
    "/data/glacial_lakes",
    split="train",
    split_file="/data/glacial_lakes/splits.json",
    stats_file="/data/glacial_lakes/stats.json",
    norm="zscore",  # "zscore" | "minmax" | "percentile" | None
    bands=None,     # None = all bands; or e.g. [0, 1, 2, 6, 7]
)
sample = ds[0]
# sample["image"] → FloatTensor [C, H, W]
# sample["mask"] → LongTensor [H, W]
loader = DataLoader(ds, batch_size=16, shuffle=True, num_workers=4)
PatchDataset inherits from torchgeo.datasets.NonGeoDataset when torchgeo is installed, and falls back to torch.utils.data.Dataset otherwise.
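The optional base class described above follows a common import-fallback pattern, sketched here in a generic form. TinyPatchDataset is a hypothetical stand-in, not dl4eo's PatchDataset, and the final fallback to object is an addition so that the sketch also runs where PyTorch is not installed.

```python
# Pick the richest available base class. The last fallback to `object`
# is only there so this sketch runs without PyTorch installed.
try:
    from torchgeo.datasets import NonGeoDataset as BaseDataset
except ImportError:
    try:
        from torch.utils.data import Dataset as BaseDataset
    except ImportError:
        BaseDataset = object


class TinyPatchDataset(BaseDataset):
    """Hypothetical stand-in for PatchDataset holding in-memory samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


tiny = TinyPatchDataset([{"image": [[0.1, 0.2]], "mask": [[0, 1]]}])
print(BaseDataset.__name__, len(tiny))
```

Because the class only relies on __len__ and __getitem__, it behaves the same whichever base class was picked up at import time.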
4 — Train a model (one-liner)
module = dl4eo.train(
    data_dir="/data/glacial_lakes",
    model="unet",        # see SUPPORTED_MODELS below
    backbone="resnet34",
    num_classes=2,
    split_strategy="temporal",
    norm="zscore",
    loss="dice_ce",      # "dice_ce" | "dice" | "ce" | "focal"
    batch_size=16,
    max_epochs=50,
    accelerator="gpu",
    devices=1,
)
# → auto-generates splits.json + stats.json if missing
# → saves best checkpoint (monitored on val/iou)
# → returns loaded SegmentationModule
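The best checkpoint is selected on val/iou. As a reminder of what that metric measures, intersection over union for one class is TP / (TP + FP + FN). The sketch below computes it for flattened binary masks in plain Python. How dl4eo aggregates IoU over classes and batches is not stated here, and binary_iou is a hypothetical helper.

```python
def binary_iou(pred, target, positive=1):
    """Intersection over union of the `positive` class for two label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(p == positive and t == positive for p, t in zip(pred, target))
    fp = sum(p == positive and t != positive for p, t in zip(pred, target))
    fn = sum(p != positive and t == positive for p, t in zip(pred, target))
    union = tp + fp + fn
    # Convention chosen here: an empty union (class absent in both) scores 1.0.
    return tp / union if union else 1.0


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(binary_iou(pred, target))  # 2 / (2 + 1 + 1) = 0.5
```

Unlike pixel accuracy, IoU ignores true negatives, which makes it a more informative score when the foreground class (for example, lakes) covers only a small share of each patch.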
5 — Build a model manually
from dl4eo.train import build_model, SegmentationModule, SegDataModule, SUPPORTED_MODELS
import lightning as L
print(SUPPORTED_MODELS)
# ['unet', 'unet++', 'deeplabv3+', 'fpn', 'pspnet', 'linknet', 'pan', 'manet',
# 'segformer', 'vit-tiny', 'vit-small', 'vit-base']
net = build_model("segformer", in_channels=10, num_classes=2)
module = SegmentationModule(net, num_classes=2, lr=5e-4, loss="dice_ce")
dm = SegDataModule(
    data_dir="/data/glacial_lakes",
    split_file="/data/glacial_lakes/splits.json",
    stats_file="/data/glacial_lakes/stats.json",
    batch_size=8,
)
trainer = L.Trainer(max_epochs=100, accelerator="gpu", devices=1)
trainer.fit(module, dm)
Pipeline stages
| Stage | Description |
|---|---|
| 1 | Download Sentinel-2 L2A (STAC / Planetary Computer, cloud-filtered) |
| 2 | Preprocess S2: single-pass resample to 10 m + spectral index + stack |
| 3 | Generate patch AOIs: windowed reads, intersects user AOI polygon |
| 4 | Prepare DEM: one mosaic per scene, windowed reproject per patch |
| 5 | Prepare Sentinel-1 RTC: batched STAC search by date, VV+VH stack |
| 6 | Generate segmentation masks from label shapefile |
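The spectral index added in stage 2 is a normalized band difference. As a small illustration, the sketch below uses the standard McFeeters NDWI, (Green - NIR) / (Green + NIR), and NDVI, (NIR - Red) / (NIR + Red), on single reflectance values. Which band combination and which NDWI variant dl4eo uses internally is not stated in this document, so treat the function names and the zero-denominator handling as assumptions.

```python
def normalized_difference(a, b, eps=1e-6):
    """Return (a - b) / (a + b), guarding against a zero denominator."""
    denom = a + b
    return (a - b) / denom if abs(denom) > eps else 0.0


def ndwi(green, nir):
    """McFeeters NDWI: high over open water."""
    return normalized_difference(green, nir)


def ndvi(nir, red):
    """NDVI: high over dense green vegetation."""
    return normalized_difference(nir, red)


# Illustrative surface reflectances (made-up values, not from a real scene).
water = {"green": 0.06, "red": 0.04, "nir": 0.02}
print(round(ndwi(water["green"], water["nir"]), 3))  # 0.5
print(round(ndvi(water["nir"], water["red"]), 3))    # -0.333
```

For a water-like pixel the two indices point in opposite directions, which is why NDWI is the natural choice for a lake-mapping dataset.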
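Normalization is intentionally kept out of the pipeline stages. Instead, compute global statistics with dl4eo.stats.compute() on the training split and apply them at load time with PatchDataset(norm="zscore"). This avoids per-patch scale inconsistency (every patch is scaled with the same statistics) and data leakage (validation and test pixels never influence those statistics).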
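The idea of global, training-only statistics can be sketched in plain Python. The data layout (a patch as a list of bands, each band a flat list of pixels) and the helper names band_stats and zscore are hypothetical and chosen only for illustration; the layout of dl4eo's stats.json is not described in this document.

```python
from statistics import fmean, pstdev


def band_stats(patches):
    """Per-band mean and std over all pixels of all given patches.

    Each patch is a list of bands; each band is a flat list of pixel values.
    """
    n_bands = len(patches[0])
    result = []
    for b in range(n_bands):
        pixels = [v for patch in patches for v in patch[b]]
        result.append({"mean": fmean(pixels), "std": pstdev(pixels)})
    return result


def zscore(patch, band_statistics, eps=1e-6):
    """Apply the same global statistics to any patch, whatever its split."""
    return [
        [(v - s["mean"]) / (s["std"] + eps) for v in band]
        for band, s in zip(patch, band_statistics)
    ]


train_patches = [[[0.0, 2.0], [10.0, 30.0]], [[4.0, 6.0], [20.0, 20.0]]]
val_patch = [[3.0, 3.0], [20.0, 40.0]]

train_stats = band_stats(train_patches)  # computed from the training split only
print(train_stats[0]["mean"], train_stats[1]["mean"])  # 3.0 20.0
print(zscore(val_patch, train_stats))    # validation data reuses the training stats
```

Per-patch normalization would instead map every patch to zero mean regardless of its content, which erases absolute differences between, say, a snow-covered and a snow-free scene.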
Supported models
By default, all models are trained from scratch on arbitrary input channels (no dataset-specific pretrained weights).
| Family | Models | Default backbone |
|---|---|---|
| SMP | unet, unet++, deeplabv3+, fpn, pspnet, linknet, pan, manet | resnet34 |
| SegFormer | segformer | swin_tiny_patch4_window7_224 |
| ViT | vit-tiny, vit-small, vit-base | timm ViT + patch-shuffle decoder |
SMP models also support ImageNet-pretrained encoders for 3-channel input: weights="imagenet".
Output structure
base_dir/
├── stack/ # Scene-level S2 stacks (bands + spectral index)
├── images/ # Clipped S2 patches
├── DEM/ # Per-scene DEM mosaics + per-patch stacks
├── GRD/ # Downloaded SAR granules (VV, VH)
├── Clipped_SAR/ # SAR reprojected to patch grid
├── stacked/ # S2 + DEM patches (10 bands)
├── stacked_with_sar/ # S2 + DEM + SAR patches (primary output)
├── mask/ # Binary (or multi-class) segmentation masks
├── AOI_boxes/ # Per-scene patch grid shapefiles
├── splits.json # Train / val / test split (after dl4eo.splits)
├── stats.json # Per-band statistics (after dl4eo.stats)
└── valid_patches.txt # QC-passing patch list (after dl4eo.qc)
Input requirements
| Parameter | Description |
|---|---|
| aoi_shapefile_dir | Folder containing one or more AOI .shp files (study area polygon) |
| feature_shapefile | Label vector file (e.g. lake outlines), used for mask generation and patch filtering |
| date_range | "YYYY-MM-DD/YYYY-MM-DD" |
The AOI polygon controls which patches are generated. Only patches that intersect both the AOI and at least one label feature are kept.
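The keep rule above can be illustrated with a simplified check. The sketch uses axis-aligned bounding boxes instead of real polygons so that it runs with the standard library alone; dl4eo works on vector geometries (geopandas and shapely are among its dependencies), and the helper names here are hypothetical.

```python
def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def keep_patch(patch, aoi_boxes, label_boxes):
    """Keep a patch only if it touches the AOI and at least one label."""
    in_aoi = any(boxes_intersect(patch, aoi) for aoi in aoi_boxes)
    has_label = any(boxes_intersect(patch, lab) for lab in label_boxes)
    return in_aoi and has_label


aoi = [(0, 0, 100, 100)]
labels = [(10, 10, 20, 20), (150, 150, 160, 160)]
patches = {
    "inside_with_label": (0, 0, 32, 32),
    "inside_no_label": (64, 64, 96, 96),
    "outside_with_label": (140, 140, 172, 172),
}
kept = [name for name, box in patches.items() if keep_patch(box, aoi, labels)]
print(kept)  # ['inside_with_label']
```

One consequence of this rule is that patches without any label feature never reach the dataset, so the stored patches all contain at least some foreground.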
Dependencies
Core (installed automatically):
numpy, rasterio, geopandas, shapely, fiona, matplotlib, joblib, pystac-client, planetary-computer, requests, scipy
Training (pip install dl4eo[train]):
torch>=2.0, lightning>=2.0, segmentation-models-pytorch>=0.3, timm>=0.9, torchmetrics>=1.0
Optional:
torchgeo>=0.5 — enables NonGeoDataset base class for PatchDataset
Example use cases
- Glacial lake mapping and segmentation
- Flood extent extraction
- Multimodal image fusion (S2 + S1 + DEM)
- Patch-based dataset generation for semantic segmentation
Author
Developed by Saurabh Kaushik
Postdoctoral Researcher · University of Arizona
Earth Observation · Deep Learning · Geo-Foundational Models · Cryosphere
License
MIT License
Citation
If you use dl4eo in your research, please cite:
@misc{kaushik2026dl4eo,
author = {Saurabh Kaushik},
title = {{dl4eo: A Python package for multi-source Earth Observation dataset building and segmentation model training}},
year = {2026},
howpublished = {\url{https://pypi.org/project/dl4eo/}},
}