World-class Geospatial AI Platform — pygeofetch + geoai + PyGeoVision

PyGeoVision

World-Class Geospatial AI Platform

The definitive Python framework for satellite data acquisition and geospatial AI —
unifying PyGeoFetch (22+ providers) and GeoAI (full AI stack) in one coherent API.




What is PyGeoVision?

PyGeoVision is a production-ready geospatial AI platform that bridges two world-class open-source packages:

| Layer | Package | Responsibility |
|---|---|---|
| 🛰️ Data | PyGeoFetch | Search & download satellite data from 22+ providers (Sentinel, Landsat, Planet, Maxar, USGS, Copernicus and more) with auth, caching, parallel downloads, post-processing, and YAML pipeline orchestration |
| 🤖 AI | GeoAI | Full AI stack: segmentation, detection, classification, change detection, SAM, foundation models (Prithvi, DINOv3), embeddings, cloud masking, super-resolution, ONNX export |
| 🔗 Bridge | PyGeoVision | Unified API, 10 end-to-end pipelines, CLI, experiment tracking, distributed training, automated labeling |

Design principle: PyGeoVision never reimplements PyGeoFetch or GeoAI. All data operations delegate to PyGeoFetch. All AI operations delegate to GeoAI. PyGeoVision is the integration layer that makes them work seamlessly together.


Architecture Overview

graph TB
    subgraph PGV["🔗 PyGeoVision Platform"]
        direction TB
        
        subgraph CLI["⌨️  CLI  (pygeovision)"]
            C1[data auth/search/download]
            C2[ai segment/detect/train]
            C3[pipeline building_footprints ...]
            C4[models list/info]
        end

        subgraph API["🐍  Python API"]
            A1[PyGeoVision]
            A2[PyGeoVisionClient]
        end

        subgraph PF["🛰️  Data Layer — PyGeoFetch"]
            direction LR
            PF1[SatelliteFetcher\nCLI subprocess + pystac_client fallback]
            PF2[DataPipeline\nYAML pipeline builder]
            PF3[Providers Registry\n22 providers]
        end

        subgraph GA["🤖  AI Layer — GeoAI"]
            direction LR
            GA1[GeoAIEngine\n24 subsystem proxies]
            GA2[segment · detect · classify\nchange · train · infer · embed]
            GA3[sam · prithvi · cloud · sr\nonnx · canopy · dinov3 · tessera]
        end

        subgraph PIPE["⚙️  End-to-End Pipelines  (10)"]
            P1[building_footprints\nchange_detection\nland_cover]
            P2[water_bodies\nsolar_detection\ncrop_monitoring]
            P3[disaster_assessment\ndeforestation\nurban_growth\ncarbon_estimation]
        end

        subgraph OWN["🧩  PyGeoVision Own AI Stack"]
            direction LR
            O1[14 Model Architectures\nUNet · SegFormer · FCOS · ViT]
            O2[GeoTrainer\n6 losses · distributed]
            O3[TiledInference\nPostProcessor · Ensemble]
            O4[7 Labelers\nOSM · SAM · ESA · MS · Google]
            O5[ExperimentTracker\nDriftDetector]
        end
    end

    subgraph EXTERNAL["☁️  External Services"]
        E1["🛰️ Planetary Computer\nAWS Earth · Element 84\nCopernicus · USGS · NASA"]
        E2["🔐 Planet Labs\nMaxar · Airbus · Sentinel Hub\nASF · OpenTopography · GEE"]
        E3["🤗 HuggingFace Hub\nNASA Prithvi · DINOv3\ntimm · RF-DETR"]
    end

    C1 & C2 & C3 & C4 --> A1
    A1 --> PF1 & GA1 & PIPE
    A2 --> A1
    PF1 --> PF2 & PF3
    GA1 --> GA2 & GA3
    PIPE --> PF1
    PIPE --> GA1
    A1 --> OWN
    PF1 -->|"pygeofetch search run\npygeofetch download run\npygeofetch pipeline run"| E1 & E2
    GA1 -->|geoai API calls| E3

    style PGV fill:#0d1117,stroke:#2563eb,stroke-width:2px,color:#e2e8f0
    style PF fill:#1e293b,stroke:#f59e0b,stroke-width:1px,color:#e2e8f0
    style GA fill:#1e293b,stroke:#a855f7,stroke-width:1px,color:#e2e8f0
    style PIPE fill:#1e293b,stroke:#22c55e,stroke-width:1px,color:#e2e8f0
    style OWN fill:#1e293b,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style CLI fill:#0f172a,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style API fill:#0f172a,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    style EXTERNAL fill:#0f172a,stroke:#475569,stroke-width:1px,color:#e2e8f0

Data Flow: Search → Download → AI → Output

sequenceDiagram
    participant User
    participant PGV as PyGeoVision
    participant PGF as PyGeoFetch CLI
    participant STAC as STAC Provider<br/>(Planetary Computer etc.)
    participant GA as GeoAI
    participant HF as HuggingFace Hub

    User->>PGV: client.search(bbox, date_range, providers, cloud_cover_max)
    PGV->>PGF: pygeofetch search run --bbox ... --providers ...
    PGF->>STAC: STAC search API (22 providers)
    STAC-->>PGF: GeoJSON FeatureCollection (100 scenes)
    PGF-->>PGV: results.geojson
    PGV-->>User: List[SearchResult]

    User->>PGV: client.download(results[:5], post_process=["unzip","cog"])
    PGV->>PGF: pygeofetch download run --from-search ...
    PGF->>STAC: Parallel HTTP downloads (4 workers)
    STAC-->>PGF: .SAFE / .tif / .zip files
    PGF->>PGF: Post-process: unzip → reproject → compress → COG
    PGF-->>PGV: DownloadResult (path, bytes, duration)
    PGV-->>User: List[DownloadResult]

    User->>PGV: client.geoai.segment.buildings(path, output_vector="out.geojson")
    PGV->>GA: geoai.BuildingFootprintExtractor().predict(...)
    GA->>HF: Download pretrained model weights
    HF-->>GA: checkpoint.pth
    GA->>GA: Tiled inference (512×512, Gaussian blend)
    GA->>GA: Vectorize → smooth → regularize polygons
    GA-->>PGV: GeoJSON building footprints
    PGV-->>User: output_path, stats

Installation

# Core — PyGeoVision + PyGeoFetch integration
pip install pygeovision

# + Full GeoAI stack (PyTorch, transformers, SMP, leafmap, timm, torchgeo)
pip install "pygeovision[geoai]"

# + Raster/vector processing (rasterio, geopandas, rioxarray)
pip install "pygeovision[geo]"

pip install "pygeovision[extra]"

# + Everything
pip install "pygeovision[all]"

Requirements: Python 3.10+ · PyGeoFetch (pip install pygeofetch) · GeoAI (pip install geoai-py)


Quick Start

Five-Minute Walkthrough

import pygeovision as pgv

# ─── Initialise ─────────────────────────────────────────────────────────────
client = pgv.PyGeoVision()
print(client)  # PyGeoVision(v1.0.0, pygeofetch=✓, geoai=✓)

# ─── 1. Authenticate providers (stored securely in system keyring) ───────────
# Open access — no credentials needed:
#   planetary_computer, aws_earth, element84, noaa_big_data,
#   esa_scihub, jaxa_earth, isro_bhuvan, inpe_cbers, digitalglobe

# Credentialled providers:
client.add_credentials("usgs", username="user", password="pass")
client.add_credentials("planet", api_key="PL-xxxx")
client.add_credentials("copernicus", client_id="id", client_secret="secret")

# ─── 2. Search satellite data via PyGeoFetch ────────────────────────────────
results = client.search(
    bbox=(-0.15, 51.47, -0.10, 51.52),          # London, WGS84
    date_range=("2024-06-01", "2024-06-30"),
    providers=["planetary_computer", "copernicus", "usgs"],
    collections=["sentinel-2-l2a"],              # or use satellite="sentinel-2"
    cloud_cover_max=10,
    max_results=50,
    sort_by="cloud_cover",
)
print(f"Found {len(results)} scenes")
for r in results[:3]:
    print(r)
# [planetary_computer] Sentinel-2A | 2024-06-03 | cloud=0% score=0.99 | S2A_MSIL2A_...

# ─── 3. Download with post-processing via PyGeoFetch ────────────────────────
downloads = client.download(
    results[:3],
    output_dir="./sentinel2/",
    parallel=4,
    verify_checksum=True,
    resume=True,
    post_process=["unzip", "reproject:EPSG:4326", "compress:lzw", "cog"],
)
for d in downloads:
    print(d)  # ✓ scene-id (245.3 MB, 12.1s) → ./sentinel2/scene.tif

# ─── 4. AI: Segment buildings with GeoAI ────────────────────────────────────
client.geoai.segment.buildings(
    downloads[0].path,
    output_path="buildings.tif",
    output_vector="buildings.geojson",
    confidence_threshold=0.5,
)

# ─── 5. AI: Detect changes between two dates ────────────────────────────────
client.geoai.change.detect(
    "scene_2020.tif",
    "scene_2024.tif",
    output_path="changes.tif",
)

# ─── 6. AI: Train a custom segmentation model ────────────────────────────────
client.geoai.train.segmentation(
    "./building_chips/",
    "building_model.pth",
    num_classes=2,
    epochs=100,
    backbone="efficientnet-b4",
    batch_size=16,
)

# ─── 7. End-to-end pipeline (PyGeoFetch data + GeoAI inference) ─────────────
result = client.pipeline(
    "building_footprints",
    bbox=(-0.15, 51.47, -0.10, 51.52),
    date="2024-06",
    output_dir="./results/",
)
print(result.output_path)   # ./results/building_footprints/prediction.tif
print(result.stats)         # {"buildings_detected": 1847, "coverage_pct": 0.312}

PyGeoFetch Integration — Satellite Data (22 Providers)

PyGeoVision uses PyGeoFetch as its exclusive data backend. It invokes the pygeofetch CLI via subprocess to reach all 22 providers, and falls back to pystac_client with planetary_computer in Python for STAC providers.
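
As a rough sketch of the subprocess-first pattern described above, the snippet below assembles a `pygeofetch search run` argument list and checks whether the CLI is on `PATH` before choosing a backend. The flags mirror the ones shown in this README's CLI section. The helper names `build_search_cmd` and `pick_backend` are hypothetical, and joining several providers with commas is an assumption rather than documented behaviour.

```python
import shutil


def build_search_cmd(bbox, start_date, providers,
                     cloud_cover="0-15", output="results.geojson"):
    """Assemble an argument list for `pygeofetch search run` (hypothetical helper)."""
    bbox_arg = ",".join(str(value) for value in bbox)
    return [
        "pygeofetch", "search", "run",
        "--bbox", bbox_arg,
        "--start-date", start_date,
        "--cloud-cover", cloud_cover,
        # Assumption: multiple providers are passed as one comma-separated value.
        "--providers", ",".join(providers),
        "--output", output,
    ]


def pick_backend(executable="pygeofetch"):
    """Return 'cli' when the executable is on PATH, otherwise 'stac' (Python fallback)."""
    return "cli" if shutil.which(executable) else "stac"


cmd = build_search_cmd((-74.1, 40.6, -73.7, 40.9), "2024-01-01", ["planetary_computer"])
print(pick_backend(), " ".join(cmd))
```

Passing the arguments as a list (rather than a shell string) lets `subprocess.run(cmd, capture_output=True, text=True)` execute them without shell quoting issues.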

Provider Registry

| Provider ID | Name | Auth | Key Satellites | SAR | Sub-m | STAC |
|---|---|---|---|---|---|---|
| planetary_computer | Microsoft Planetary Computer | 🌐 Open | Sentinel-1/2, Landsat, MODIS, NAIP | | | |
| aws_earth | AWS Earth Open Data | 🌐 Open | Sentinel-2 COGs, Landsat, NAIP | | | |
| element84 | Element 84 Earth Search | 🌐 Open | Sentinel-2 COGs, Landsat Col 2 | | | |
| noaa_big_data | NOAA Big Data | 🌐 Open | GOES-16/17/18, NEXRAD | | | |
| esa_scihub | ESA SciHub Mirror | 🌐 Open | Copernicus public mirrors | | | |
| jaxa_earth | JAXA ALOS World | 🌐 Open | ALOS 30m DSM, PALSAR | | | |
| isro_bhuvan | ISRO Bhuvan | 🌐 Open | ResourceSat, Cartosat, Oceansat | | | |
| inpe_cbers | INPE CBERS | 🌐 Open | CBERS-4/4A | | | |
| digitalglobe | DigitalGlobe Open Data | 🌐 Open | Disaster response VHR | | | |
| geoserver_generic | GeoServer Generic OGC | 🌐 Open | Any WMS/WCS/WFS service | | | |
| usgs | USGS Earth Explorer | 🔐 User/Pass | Landsat 1-9, ASTER, MODIS | | | |
| copernicus | Copernicus CDSE | 🔐 OAuth2 | Sentinel-1/2/3/5P | | | |
| nasa_earthdata | NASA Earthdata CMR | 🔐 OAuth2 | MODIS, VIIRS, ICESat-2, GEDI | | | |
| nasa_earthdata_cloud | NASA Earthdata Cloud | 🔐 OAuth2+S3 | Cloud-hosted NASA data | | | |
| opentopography | OpenTopography | 🔐 API Key | SRTM, Copernicus DEM 30/90m, LiDAR | | | |
| planet | Planet Labs | 🔐 API Key | PlanetScope 3–5m, SkySat 0.5m | | | |
| sentinel_hub | Sentinel Hub | 🔐 OAuth2 | All Sentinels, Landsat, MODIS | | | |
| maxar_gbdx | Maxar GBDX | 🔐 Token | WorldView 1–4, GeoEye-1 (30cm) | | | |
| airbus_oneatlas | Airbus OneAtlas | 🔐 API Key | Pléiades 50cm, SPOT 6/7 1.5m | | | |
| alaska_satellite_facility | Alaska Satellite Facility | 🔐 Earthdata | Sentinel-1, ALOS PALSAR | | | |
| google_earth_engine | Google Earth Engine | 🔐 Service Acct | Multi-petabyte global catalog | | | |
| terrabotics | TerraBotics | 🔐 API Key | Archive + tasking | | | |

Authentication

# User/password (USGS, NASA Earthdata)
client.add_credentials("usgs", username="user", password="pass")
client.add_credentials("nasa_earthdata", username="user", password="pass")

# API key (Planet, OpenTopography, Airbus)
client.add_credentials("planet", api_key="PL-xxxx")
client.add_credentials("opentopography", api_key="OT-xxxx")
client.add_credentials("airbus_oneatlas", api_key="AB-xxxx")

# OAuth2 (Copernicus, Sentinel Hub)
client.add_credentials("copernicus", client_id="id", client_secret="secret")
client.add_credentials("sentinel_hub", client_id="id", client_secret="secret")

# Chaining
client \
    .add_credentials("usgs", username="u", password="p") \
    .add_credentials("planet", api_key="PL-xxxx") \
    .add_credentials("copernicus", client_id="id", client_secret="secret")

# Test connectivity
client.test_provider("planetary_computer")  # True

# List stored credentials
client.data.list_credentials()  # ['usgs', 'planet', 'copernicus']
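
The chained form above works when each call returns the client itself. A minimal sketch of that pattern follows, using an in-memory dictionary as a stand-in; `CredentialStoreSketch` is a hypothetical class for illustration only, and unlike the real client it does not touch the system keyring.

```python
class CredentialStoreSketch:
    """In-memory stand-in for a credential store (hypothetical, no keyring access)."""

    def __init__(self):
        self._creds = {}

    def add_credentials(self, provider, **fields):
        if not fields:
            raise ValueError("at least one credential field is required")
        self._creds[provider] = dict(fields)
        return self  # returning self is what makes chaining possible

    def list_credentials(self):
        # Dicts preserve insertion order, so providers come back in the order added.
        return list(self._creds)


store = (
    CredentialStoreSketch()
    .add_credentials("usgs", username="u", password="p")
    .add_credentials("planet", api_key="PL-xxxx")
)
print(store.list_credentials())
```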

Search API

# By provider
results = client.search(
    bbox=(-74.1, 40.6, -73.7, 40.9),
    date_range=("2024-01-01", "2024-06-01"),
    providers=["planetary_computer", "copernicus", "usgs"],
    cloud_cover_max=15,
    max_results=100,
    sort_by="cloud_cover",          # datetime | cloud_cover | score | satellite
    sort_order="asc",
)

# By satellite shortcut (auto-selects providers)
results = client.search(
    bbox=..., date_range=..., satellite="sentinel-2",    # or "landsat", "planet", "dem"
)

# By STAC collection
results = client.search(
    bbox=..., date_range=...,
    collections=["sentinel-2-l2a", "landsat-c2-l2"],
)

# Advanced: CQL2 filter expression
results = client.search(
    bbox=..., date_range=...,
    cql2_filter="eo:cloud_cover < 5 AND platform = 'sentinel-2a'",
)

# SearchResult properties
r = results[0]
r.id              # 'S2A_MSIL2A_20240603T153811_R001'
r.provider        # 'planetary_computer'
r.satellite       # 'Sentinel-2A'
r.date            # '2024-06-03'
r.cloud_cover     # 0.0
r.bbox            # (-0.15, 51.47, -0.10, 51.52)
r.score           # 0.99
r.collection      # 'sentinel-2-l2a'
r.resolution_m    # 10.0
r.is_sar          # False
r.to_dict()       # JSON-serializable dict
r.to_stac_item()  # pystac Item object
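
The data-flow diagram shows the search step handing back a GeoJSON FeatureCollection. As an illustration of what filtering and sorting by cloud cover could look like on such a file, here is a standard-library sketch. It assumes the features keep the usual STAC property names `datetime` and `eo:cloud_cover`; the sample data and the `load_scenes` helper are made up for this example.

```python
import json

# Made-up sample in the shape of a STAC-style FeatureCollection.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "scene-a", "geometry": null,
     "properties": {"datetime": "2024-06-01T10:56:19Z", "eo:cloud_cover": 12.5}},
    {"type": "Feature", "id": "scene-b", "geometry": null,
     "properties": {"datetime": "2024-06-03T10:56:21Z", "eo:cloud_cover": 0.0}},
    {"type": "Feature", "id": "scene-c", "geometry": null,
     "properties": {"datetime": "2024-06-08T10:56:19Z", "eo:cloud_cover": 7.0}}
  ]
}
"""


def load_scenes(geojson_text, cloud_cover_max=None):
    """Parse a FeatureCollection and return scenes sorted by ascending cloud cover."""
    collection = json.loads(geojson_text)
    scenes = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        cloud = props.get("eo:cloud_cover")
        # Drop scenes above the threshold, and scenes with unknown cloud cover.
        if cloud_cover_max is not None and (cloud is None or cloud > cloud_cover_max):
            continue
        scenes.append({
            "id": feature.get("id"),
            "date": (props.get("datetime") or "")[:10],
            "cloud_cover": cloud,
        })
    return sorted(
        scenes,
        key=lambda s: float("inf") if s["cloud_cover"] is None else s["cloud_cover"],
    )


scenes = load_scenes(SAMPLE_GEOJSON, cloud_cover_max=10)
for scene in scenes:
    print(scene)
```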

Download API

downloads = client.download(
    results[:5],
    output_dir="./data/",
    parallel=4,                # concurrent downloads
    verify_checksum=True,      # SHA256 verification
    resume=True,               # resume interrupted downloads
    retry_attempts=5,          # exponential backoff
    bandwidth_limit_mb=20.0,   # throttle in MB/s
    on_failure="skip",         # skip | abort | retry
    post_process=[
        "unzip",                       # extract ZIP/TAR archives
        "reproject:EPSG:4326",         # reproject to target CRS
        "compress:lzw",                # apply compression (lzw | deflate | zstd)
        "ndvi",                        # compute NDVI band
        "ndwi",                        # compute NDWI band
        "resample:10",                 # resample to N metres
        "cog",                         # Cloud Optimized GeoTIFF
    ],
)

# DownloadResult properties
d = downloads[0]
d.success           # True
d.path              # Path('./data/S2A_MSIL2A_20240603_visual.tif')
d.size_mb           # 245.3
d.duration_seconds  # 12.1
d.checksum_verified # True

Pipeline Builder (YAML Orchestration)

# Build a recurring pipeline programmatically
pipeline = (
    client.create_pipeline("weekly-sentinel2-london", description="Weekly S2 NDVI")
    .search(
        providers=["planetary_computer", "copernicus"],
        bbox=(-0.15, 51.47, -0.10, 51.52),
        date_range="last_7_days",        # last_7_days | last_30_days | this_month
        cloud_cover="0-10",
        max_results=20,
    )
    .filter("data.cloud_cover < 5")
    .download(
        parallel=4,
        output="./raw/",
        verify_checksum=True,
        post_process=["unzip", "reproject:EPSG:4326", "cog"],
    )
    .export(
        format="cloud_optimized_geotiff",
        destination="s3://my-bucket/london/",
    )
    .set_schedule("0 6 * * 1")           # Every Monday 06:00 UTC
)

# Save and run
pipeline.save("weekly-sentinel2.yaml")
pipeline.run()                           # delegates to: pygeofetch pipeline run

# Or run a YAML file directly
client.run_pipeline_yaml("weekly-sentinel2.yaml")
client.run_pipeline_yaml("weekly-sentinel2.yaml", step="download")

# Schedule, list, inspect
client.data.schedule_pipeline("weekly-sentinel2.yaml", cron="0 6 * * 1")
client.data.list_scheduled_pipelines()
client.data.pipeline_history(limit=20)

Generated YAML:

name: weekly-sentinel2-london
description: Weekly S2 NDVI
schedule: 0 6 * * 1
steps:
  - search:
      providers: [planetary_computer, copernicus]
      bbox: "-0.15,51.47,-0.10,51.52"
      date_range: last_7_days
      cloud_cover: 0-10
      max_results: 20
  - filter:
      expression: data.cloud_cover < 5
  - download:
      parallel: 4
      output: ./raw/
      verify_checksum: true
      post_process: unzip,reproject:EPSG:4326,cog
  - export:
      format: cloud_optimized_geotiff
      destination: s3://my-bucket/london/
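
To make the mapping between the fluent builder and the generated document concrete, here is a small sketch of a builder that accumulates steps into a dictionary with the same shape as the YAML above. `PipelineSketch` is a hypothetical class written for this illustration; it serialises to JSON only because the standard library has no YAML writer.

```python
import json


class PipelineSketch:
    """Hypothetical fluent builder that mirrors the name/schedule/steps layout."""

    def __init__(self, name, description=""):
        self._doc = {"name": name, "description": description, "steps": []}

    def _add(self, kind, **params):
        # Each step is a single-key mapping, e.g. {"search": {...}}.
        self._doc["steps"].append({kind: params})
        return self

    def search(self, **params):
        return self._add("search", **params)

    def filter(self, expression):
        return self._add("filter", expression=expression)

    def download(self, **params):
        return self._add("download", **params)

    def set_schedule(self, cron):
        self._doc["schedule"] = cron
        return self

    def to_dict(self):
        return self._doc


pipeline = (
    PipelineSketch("weekly-sentinel2-london", description="Weekly S2 NDVI")
    .search(providers=["planetary_computer", "copernicus"],
            date_range="last_7_days", max_results=20)
    .filter("data.cloud_cover < 5")
    .download(parallel=4, output="./raw/")
    .set_schedule("0 6 * * 1")
)
doc = pipeline.to_dict()
print(json.dumps(doc, indent=2))
```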

GeoAI Integration — 24 AI Subsystems

PyGeoVision exposes GeoAI's complete API through client.geoai.*. All imports are lazy — GeoAI is only loaded when first accessed.

ga = client.geoai         # GeoAIEngine proxy
ga.version                # '0.39.2'
ga.is_available           # True
ga.raw()                  # raw geoai module for direct access
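
One common way to get the lazy-loading behaviour described above is a proxy that defers the import until the first attribute access. The sketch below shows the idea with the standard-library `json` module standing in for `geoai`; `LazyModule` is a hypothetical name, and PyGeoVision's actual proxy may be built differently.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    @property
    def is_loaded(self):
        return self._module is not None

    def __getattr__(self, attr):
        # __getattr__ runs only for names not found through normal lookup,
        # so the fields set in __init__ never trigger the import.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


lazy = LazyModule("json")          # "json" stands in for a heavy dependency
before = lazy.is_loaded            # nothing imported yet
text = lazy.dumps({"a": 1})        # first attribute access performs the import
after = lazy.is_loaded
print(before, text, after)
```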

Segmentation (client.geoai.segment)

# Building footprints — geoai.BuildingFootprintExtractor
client.geoai.segment.buildings(
    "sentinel2.tif", output_path="buildings.tif", output_vector="buildings.geojson",
    confidence_threshold=0.5, chip_size=512, overlap=64,
)

# Solar panel detection — geoai.SolarPanelDetector
client.geoai.segment.solar_panels("aerial.tif", output_vector="solar.geojson")

# Agricultural field delineation — geoai.AgricultureFieldDelineator
client.geoai.segment.agriculture_fields("sentinel2.tif", output_vector="fields.geojson")

# Water body segmentation — geoai.segment_water
client.geoai.segment.water("sentinel2.tif", output_path="water.tif", band_order="sentinel2")

# Custom model segmentation — geoai.semantic_segmentation
client.geoai.segment.custom("scene.tif", "model.pth", "pred.tif", num_classes=5)

# HuggingFace Hub model — geoai.image_segmentation
client.geoai.segment.with_hf_model("scene.tif", "openmmlab/upernet-swin-base")

# SAM auto-segmentation — geoai.mask_generation
client.geoai.segment.with_sam("aerial.tif", output_path="masks.tif")

# timm-backbone model — geoai.timm_semantic_segmentation
client.geoai.segment.timm_model("scene.tif", "timm_model.pth")

# HuggingFace Hub timm model — geoai.timm_segmentation_from_hub
client.geoai.segment.from_hub("scene.tif", "giswqs/building-footprint-usa")

Detection (client.geoai.detect)

# Vehicle detection — geoai.CarDetector
client.geoai.detect.cars("aerial.tif", output_path="cars.geojson")

# Ship detection — geoai.ShipDetector
client.geoai.detect.ships("port_scene.tif", output_path="ships.geojson")

# Parking spot detection — geoai.ParkingSplotDetector
client.geoai.detect.parking("car_park.tif", output_path="spots.geojson")

# Natural language grounded detection — geoai.GroundedSAM
client.geoai.detect.grounded("aerial.tif", "swimming pools", output_path="pools.geojson")
client.geoai.detect.grounded("aerial.tif", "solar panels on rooftops")

# RF-DETR real-time detection — geoai.rfdetr_detect
client.geoai.detect.rfdetr("scene.tif", output_path="detections.geojson")

# Multi-class object detection — geoai.multiclass_detection
client.geoai.detect.multiclass("scene.tif", "nwpu_model.pth", output_path="det.geojson")

# Instance segmentation — geoai.instance_segmentation
client.geoai.detect.instance_segmentation("scene.tif", "maskrcnn.pth")

Classification (client.geoai.classify)

# Scene classification — geoai.classify_image
result = client.geoai.classify.classify("tile.tif", "classifier.pth")

# CLIP zero-shot land cover — geoai.CLIPVectorClassifier
client.geoai.classify.land_cover(
    "sentinel2.tif",
    classes=["forest", "water", "urban", "agriculture", "bare soil"],
)

# Batch classification — geoai.classify_images
client.geoai.classify.batch("./image_chips/", "classifier.pth")

# Train a classifier — geoai.train_classifier
client.geoai.classify.train("./dataset/", "classifier.pth", num_classes=8)

Change Detection (client.geoai.change)

# ChangeSTAR bi-temporal change detection — geoai.changestar_detect
client.geoai.change.detect(
    "scene_2020.tif",
    "scene_2024.tif",
    output_path="changes.tif",
)

# List available ChangeSTAR model variants
client.geoai.change.list_models()   # ['changestar-v1', 'changestar-v2', ...]

Training (client.geoai.train)

# Semantic segmentation — geoai.train_segmentation_model
client.geoai.train.segmentation(
    "./building_chips/", "building_model.pth",
    val_data="./val_chips/", num_classes=2,
    epochs=100, batch_size=16, backbone="efficientnet-b4",
)

# Land cover with specialist losses — geoai.train_segmentation_landcover
client.geoai.train.segmentation_landcover(
    "./lc_chips/", "landcover.pth",
    num_classes=11, loss_fn="unified_focal",    # dice | focal | tversky | unified_focal
)

# Multi-class object detection — geoai.train_multiclass_detector
client.geoai.train.detection("./nwpu_chips/", "detector.pth", num_classes=10)

# Instance segmentation — geoai.train_instance_segmentation_model
client.geoai.train.instance_segmentation("./coco_chips/", "maskrcnn.pth")

# timm-backbone (1000+ backbones) — geoai.train_timm_segmentation_model
client.geoai.train.timm_segmentation(
    "./chips/", "timm_seg.pth", backbone="convnext_base",
)

# Pixel regression (NDVI, height, biomass) — geoai.train_pixel_regressor
client.geoai.train.pixel_regressor("./regression_chips/", "regressor.pth")

# RF-DETR training — geoai.rfdetr_train
client.geoai.train.rfdetr("./coco_data/", "rfdetr.pth")

# Export training chips — geoai.export_training_data
client.geoai.train.generate_chips(
    "sentinel2.tif", "labels.tif", "./chips/", chip_size=256, overlap=32,
)

Foundation Models

# NASA Prithvi (HLS multispectral) — geoai.load_prithvi_model
client.geoai.prithvi.list_models()    # ['prithvi-eo-1.0-100M', 'prithvi-eo-2.0-300M']
model = client.geoai.prithvi.load("prithvi-eo-1.0-100M")
client.geoai.prithvi.infer("hls_tile.tif", model, output_path="pred.tif")

# Segment Anything Model (SAM) — geoai.mask_generation / GroundedSAM
client.geoai.sam.generate_masks("aerial.tif", output_path="masks.tif")
client.geoai.sam.grounded("aerial.tif", "solar panels on rooftops")

# DINOv3 — geoai.analyze_image_patches / train_dinov3_segmentation
client.geoai.dinov3.analyze("scene.tif")
client.geoai.dinov3.similarity_map("scene.tif", query_point=(256, 256))
client.geoai.dinov3.finetune("./chips/", "dino_seg.pth", num_classes=5)
client.geoai.dinov3.segment("scene.tif", "dino_seg.pth", output_path="pred.tif")

Embeddings (client.geoai.embed)

# Patch embeddings — geoai.extract_patch_embeddings
embeddings = client.geoai.embed.patch("sentinel2.tif", chip_size=64)   # (N, 512) array

# Pixel embeddings — geoai.extract_pixel_embeddings
pix_emb = client.geoai.embed.pixel("sentinel2.tif")

# Cluster embeddings — geoai.cluster_embeddings
client.geoai.embed.cluster(embeddings, n_clusters=10)

# Cosine similarity — geoai.embedding_similarity
score = client.geoai.embed.similarity(emb_a, emb_b)    # 0.85

# Visualize (UMAP / t-SNE) — geoai.visualize_embeddings
client.geoai.embed.visualize(embeddings)

# Tessera satellite embedding datasets — geoai.tessera_*
client.geoai.tessera.available_years(bbox=(-74.1, 40.6, -73.7, 40.9))
client.geoai.tessera.coverage(bbox=...)
client.geoai.tessera.download(bbox=..., output_dir="./tessera/")
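
The similarity score above is described as cosine similarity. For reference, a dependency-free sketch of that computation on two plain vectors is shown below; the function is illustrative and is not GeoAI's `embedding_similarity`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
diagonal = cosine_similarity([1.0, 0.0], [1.0, 1.0])
print(parallel, orthogonal, diagonal)
```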

Cloud, SR, ONNX, Canopy, Captions

# Cloud masking — geoai.predict_cloud_mask_from_raster
client.geoai.cloud.predict("sentinel2.tif", output_path="cloud_mask.tif")
client.geoai.cloud.batch("./scenes/", "./cloud_masks/")
stats = client.geoai.cloud.statistics("cloud_mask.tif")  # {"cloud_cover": 0.15}

# Super-resolution (ESRGAN) — geoai.super_resolution
client.geoai.sr.enhance("landsat.tif", output_path="enhanced.tif", scale_factor=4)

# ONNX export and inference — geoai.export_to_onnx / onnx_semantic_segmentation
client.geoai.onnx.export(model, "model.onnx", input_shape=(1, 4, 512, 512))
client.geoai.onnx.segmentation("scene.tif", "model.onnx", output_path="pred.tif")

# Canopy height estimation — geoai.canopy_height_estimation
client.geoai.canopy.estimate("sentinel2.tif", output_path="canopy_height.tif")

# Moondream VLM captioning — geoai.moondream_*
caption = client.geoai.caption.caption("tile.tif")
answer = client.geoai.caption.query("tile.tif", "Are there buildings?")
client.geoai.caption.detect("tile.tif", "cars")

# Water body segmentation with sensor presets — geoai.segment_water
client.geoai.water.segment("s2.tif", band_order="sentinel2")  # or "naip", "landsat"

# RF-DETR detection — geoai.rfdetr_detect
client.geoai.rfdetr.detect("scene.tif", output_path="det.geojson")
client.geoai.rfdetr.list_models()
client.geoai.rfdetr.from_hub("scene.tif", "rfdetr-base")

# Interactive visualization — geoai.Map / view_raster / view_vector
m = client.geoai.map.leafmap()
client.geoai.map.view_raster("scene.tif")
client.geoai.map.view_vector("buildings.geojson")

Utilities (client.geoai.utils)

client.geoai.utils.raster_info("scene.tif")          # width, height, bands, CRS
client.geoai.utils.raster_to_vector("pred.tif", "polygons.geojson")
client.geoai.utils.vector_to_raster("polys.geojson", "ref.tif", "raster.tif")
client.geoai.utils.clip_by_bbox("scene.tif", bbox, "clipped.tif")
client.geoai.utils.mosaic(["tile1.tif", "tile2.tif"], "mosaic.tif")
client.geoai.utils.stack_bands(["B04.tif", "B08.tif", "B11.tif"], "stack.tif")
client.geoai.utils.smooth_vector("buildings.geojson", "smooth.geojson")
client.geoai.utils.regularize("buildings.geojson", "regular.geojson")
iou = client.geoai.utils.iou(pred_mask, gt_mask)
metrics = client.geoai.utils.segmentation_metrics(pred, target)  # miou, f1, acc
device = client.geoai.utils.get_device()              # 'cuda' | 'mps' | 'cpu'
client.geoai.utils.empty_cache()

End-to-End Pipelines (10)

Each pipeline orchestrates the full workflow: PyGeoFetch search → download → GeoAI inference → raster or vector output.

graph LR
    subgraph "End-to-End Pipeline Flow"
        S[🛰️ PyGeoFetch\nSearch] --> D[📥 PyGeoFetch\nDownload]
        D --> PP[⚙️ Post-Process\nunzip · reproject · cog]
        PP --> AI[🤖 GeoAI\nInference]
        AI --> V[📊 Vector Output\nGeoJSON · GeoParquet]
        V --> STATS[📈 Statistics\n& Metadata]
    end

| Pipeline | Data Source | AI Model | Output |
|---|---|---|---|
| building_footprints | Sentinel-2 / NAIP (PC) | GeoAI BuildingFootprintExtractor | GeoJSON polygons |
| change_detection | Bi-temporal Sentinel-2 | GeoAI ChangeSTAR | Change mask GeoTIFF |
| land_cover | Sentinel-2 (PC / Copernicus) | ESA WorldCover / GeoAI SegFormer | Classification GeoTIFF |
| water_bodies | Sentinel-2 | GeoAI segment_water (NDWI) | Water polygon GeoJSON |
| solar_detection | NAIP / Sentinel-2 | GeoAI SolarPanelDetector | GeoJSON polygons |
| crop_monitoring | Sentinel-2 seasonal stack | GeoAI SegFormer-B2 | Crop type map |
| disaster_assessment | Post-event imagery | GeoAI Siamese-UNet | Damage assessment |
| deforestation | Bi-temporal Landsat/S2 | GeoAI ChangeFormer | Forest loss mask |
| urban_growth | Bi-temporal Landsat | GeoAI Siamese-UNet | Urban expansion map |
| carbon_estimation | Sentinel-2 NDVI | NDVI → AGB formula | Carbon stock estimate |

# Building footprints
result = client.pipeline("building_footprints",
    bbox=(-0.15, 51.47, -0.10, 51.52), date="2024-06")

# Bi-temporal change detection
result = client.pipeline("change_detection",
    bbox=(-74.1, 40.6, -73.7, 40.9),
    date_before="2020-01", date_after="2024-01")

# Land cover with ESA WorldCover source
result = client.pipeline("land_cover",
    bbox=..., date="2024-06", source="worldcover")

# Solar panel mapping
result = client.pipeline("solar_detection",
    bbox=..., date="2024-06")

# All pipelines return PipelineResult
result.success        # True
result.output_path    # Path('./results/building_footprints/prediction.tif')
result.stats          # {"buildings_detected": 1847, "coverage_pct": 0.312}
result.metadata       # {"provider": "planetary_computer", "scene_id": "..."}
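
The water_bodies and carbon_estimation pipelines listed above rely on NDWI and NDVI. As a reminder of what those indices compute per pixel, here is a small sketch using the standard definitions, NDVI = (NIR - Red) / (NIR + Red) and the McFeeters NDWI = (Green - NIR) / (Green + NIR). Which NDWI variant the pipeline actually uses is an assumption here, as is returning 0.0 when the denominator is zero.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero (a convention)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Vegetation index; for Sentinel-2 this is typically bands B08 (NIR) and B04 (red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """McFeeters water index; for Sentinel-2 typically bands B03 (green) and B08 (NIR)."""
    return normalized_difference(green, nir)


vegetation = ndvi(nir=0.5, red=0.1)    # dense vegetation reflects strongly in NIR
water = ndwi(green=0.3, nir=0.1)       # water absorbs NIR, so NDWI is positive
empty = ndvi(nir=0.0, red=0.0)         # no-data pixel
print(round(vegetation, 3), round(water, 3), empty)
```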

PyGeoVision's Own AI Stack

In addition to the GeoAI integration, PyGeoVision ships its own production AI stack for training custom models on geospatial data.

Model Registry (14 Architectures)

| Model | Task | Architecture | Pretrained |
|---|---|---|---|
| unet_resnet50 | Segmentation | U-Net + ResNet-50 | |
| unet_efficientnet_b4 | Segmentation | U-Net + EfficientNet-B4 | |
| segformer_b2 | Segmentation | SegFormer-B2 | |
| segformer_b5 | Segmentation | SegFormer-B5 | |
| deeplabv3plus_resnet101 | Segmentation | DeepLabV3+ | |
| fcos_resnet50 | Detection | FCOS (anchor-free) | |
| retinanet_resnet50 | Detection | RetinaNet | |
| resnet50_cls | Classification | ResNet-50 | |
| efficientnet_b3_cls | Classification | EfficientNet-B3 | |
| vit_b16_cls | Classification | ViT-B/16 | |
| siamese_unet | Change Detection | Siamese U-Net | |
| changeformer | Change Detection | ChangeFormer | |
| esrgan_geo | Super Resolution | ESRGAN-Geo | |
| srcnn | Super Resolution | SRCNN | |

# Load from registry
from pygeovision.ai.models import ModelHub
hub = ModelHub()
model = hub.load("segformer_b2", num_classes=5, pretrained=True)

# List models
from pygeovision.ai.models.registry import registry
models = registry.list_models(task="segmentation", pretrained_only=True)

GeoTrainer

from pygeovision.ai.training.trainer import GeoTrainer

trainer = GeoTrainer(
    model=model,
    train_dataset=train_ds,
    val_dataset=val_ds,
    num_classes=5,
    max_epochs=100,
    learning_rate=1e-4,
    batch_size=16,
    loss_fn="dice",             # dice | focal | tversky | unified_focal | weighted_ce
    optimizer="adamw",
    scheduler="cosine",
    mixed_precision=True,
    gradient_accumulation_steps=4,
    callbacks=["early_stopping", "model_checkpoint", "rich_progress"],
    export_onnx=True,           # auto-export on training complete
)
result = trainer.fit()

Supported losses: DiceLoss, FocalLoss, DiceFocalLoss, TverskyLoss, WeightedCrossEntropyLoss, ChangeDetectionLoss

Supported metrics: SegmentationMetrics (mIoU, F1, accuracy, precision, recall), ConfusionMatrix, BinaryMetrics
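
To ground the loss and metric names listed above, the sketch below computes IoU and the Dice coefficient for a pair of binary masks in plain Python. A Dice loss is commonly taken as one minus a soft version of this coefficient computed on predicted probabilities; the functions here are illustrative and are not the PyGeoVision `DiceLoss` or `SegmentationMetrics` classes.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for flat binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN); 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    return tp / union if union else 1.0


def dice(pred, target):
    """Dice coefficient: 2TP / (2TP + FP + FN); 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]
iou_score = iou(pred_mask, true_mask)
dice_score = dice(pred_mask, true_mask)
print(iou_score, round(dice_score, 4))
```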

Automated Labeling (7 Labelers)

# OpenStreetMap polygons
client.ai.label(tiles, labeler="osm", feature_type="building")

# Microsoft Building Footprints (global coverage)
client.ai.label(tiles, labeler="microsoft_buildings")

# Google Open Buildings
client.ai.label(tiles, labeler="google_buildings")

# ESA WorldCover (land cover)
client.ai.label(tiles, labeler="esa_worldcover")

# Google Dynamic World (near-real-time)
client.ai.label(tiles, labeler="dynamic_world")

# SAM auto-labeling
client.ai.label(tiles, labeler="sam")

# Foundation model labeling
client.ai.label(tiles, labeler="foundation", model_id="giswqs/building-footprint")

TiledInference

from pygeovision.ai.inference.tiled_inference import TiledInference

engine = TiledInference(
    model=model,
    tile_size=512,
    overlap=64,
    blend_mode="gaussian",      # gaussian | average
    batch_size=4,
    device="cuda",
)
prediction = engine.run("large_scene.tif", "prediction.tif", num_classes=5)
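
As an illustration of how `tile_size` and `overlap` interact, the following sketch lays out overlapping tile windows over a raster of a given size. It covers only the window layout, not the Gaussian or average blending of predictions, and `tile_starts` and `tile_windows` are hypothetical helpers rather than part of `TiledInference`.

```python
def tile_starts(length, tile_size=512, overlap=64):
    """Start offsets along one axis; the last tile is shifted back to end at the edge."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # final tile is flush with the image edge
    return starts


def tile_windows(width, height, tile_size=512, overlap=64):
    """Return (x, y, w, h) windows in row-major order covering the whole raster."""
    return [
        (x, y, min(tile_size, width - x), min(tile_size, height - y))
        for y in tile_starts(height, tile_size, overlap)
        for x in tile_starts(width, tile_size, overlap)
    ]


windows = tile_windows(1200, 600)
print(len(windows), windows[0], windows[-1])
```

Shifting the final tile back to the edge keeps every tile at full size when the raster is larger than one tile, at the cost of a wider overlap on the last row and column.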

Command-Line Interface

# ─── System ──────────────────────────────────────────────────────────────────
pygeovision status                          # Full status: PyGeoFetch + GeoAI + torch
pygeovision status --json                   # Machine-readable JSON
pygeovision doctor                          # Diagnose installation and connectivity

# ─── Authentication (via PyGeoFetch keyring) ─────────────────────────────────
pygeovision data auth add usgs --username USER --password PASS
pygeovision data auth add planet --api-key PL-xxxx
pygeovision data auth add copernicus --client-id ID --client-secret SECRET
pygeovision data auth list
pygeovision data auth test planetary_computer
pygeovision data auth remove usgs

# ─── Providers ───────────────────────────────────────────────────────────────
pygeovision data providers                  # List all 22 providers
pygeovision data providers --open-only      # Open-access only
pygeovision data providers --sar            # SAR-capable only
pygeovision data providers --sub-meter      # Sub-metre resolution only

# ─── Search ──────────────────────────────────────────────────────────────────
pygeovision data search \
    --bbox "-74.1,40.6,-73.7,40.9" \
    --date 2024-06 \
    --providers planetary_computer \
    --cloud-max 10 \
    --output results.geojson


# Direct PyGeoFetch call, bypassing the pygeovision wrapper
pygeofetch search run --bbox "-74.1,40.6,-73.7,40.9" --start-date 2024-01-01 --cloud-cover 0-15 --providers planetary_computer --output results.geojson

# Explicit start and end dates; console output is redirected to out.txt
pygeovision data search --bbox -74.1 40.6 -73.7 40.9 --start-date 2024-06-01 --end-date 2024-06-30 --providers planetary_computer --cloud-max 30 --output result_new.geojson > out.txt 2>&1

pygeovision data search --bbox ... --satellite sentinel-2 --format table
pygeovision data search --bbox ... --collections sentinel-2-l2a,landsat-c2-l2

# ─── Download ────────────────────────────────────────────────────────────────
pygeovision data download --from-search result_new.geojson --output ./data/ --parallel 4 --verify-checksum --post-process unzip,reproject:EPSG:4326,compress:lzw,cog




# Internal helper (Python, shown commented out): runs the CLI subprocess with UTF-8 I/O
# def _run_cli(self, args, timeout=300):
#     cmd = self._build_cmd(args)
#     env = os.environ.copy()
#     env['PYTHONIOENCODING'] = 'utf-8'
#     env['PYTHONUTF8'] = '1'
#
#     return subprocess.run(
#         cmd,
#         capture_output=True,
#         text=True,
#         encoding='utf-8',
#         env=env,
#         timeout=timeout,
#         creationflags=subprocess.CREATE_NO_WINDOW if platform.system() == "Windows" else 0,
#     )



# ─── Pipeline ────────────────────────────────────────────────────────────────
pygeovision data pipeline run weekly-sentinel2.yaml
pygeovision data pipeline validate weekly-sentinel2.yaml
pygeovision data pipeline schedule weekly-sentinel2.yaml --cron "0 6 * * 1"
pygeovision data pipeline list
pygeovision data pipeline history

# ─── Cache ───────────────────────────────────────────────────────────────────
pygeovision data cache stats
pygeovision data cache clear --older-than 7d
pygeovision data cache clear --provider planetary_computer

# ─── AI: Segmentation ────────────────────────────────────────────────────────
pygeovision ai segment buildings \
    --input sentinel2.tif \
    --output buildings.tif \
    --vector buildings.geojson \
    --confidence 0.5

pygeovision ai segment solar --input aerial.tif --output solar.tif
pygeovision ai segment water --input s2.tif --output water.tif
pygeovision ai segment agriculture --input s2.tif --output fields.geojson
pygeovision ai segment custom --input scene.tif --output pred.tif --model model.pth

# ─── AI: Detection ───────────────────────────────────────────────────────────
pygeovision ai detect cars --input aerial.tif --output cars.geojson
pygeovision ai detect ships --input port.tif --output ships.geojson
pygeovision ai detect grounded --input aerial.tif --prompt "swimming pools" --output pools.geojson
pygeovision ai detect rfdetr --input scene.tif --output det.geojson

# ─── AI: Classification ──────────────────────────────────────────────────────
pygeovision ai classify scene --input tile.tif --model classifier.pth
pygeovision ai classify land-cover --input s2.tif --classes "forest,water,urban,agriculture"

# ─── AI: Training ────────────────────────────────────────────────────────────
pygeovision ai train segmentation \
    --data ./building_chips/ \
    --output building_model.pth \
    --num-classes 2 \
    --epochs 100 \
    --backbone efficientnet-b4

pygeovision ai train land-cover \
    --data ./lc_chips/ \
    --output lc_model.pth \
    --num-classes 11 \
    --loss-fn unified_focal

pygeovision ai train detection \
    --data ./nwpu_chips/ \
    --output detector.pth \
    --num-classes 10

# ─── AI: Inference & Utilities ───────────────────────────────────────────────
pygeovision ai infer \
    --input large_scene.tif \
    --model model.pth \
    --output prediction.tif \
    --num-classes 5 \
    --tile-size 512 \
    --overlap 64

pygeovision ai chips \
    --image sentinel2.tif \
    --label labels.tif \
    --output ./chips/ \
    --chip-size 256

pygeovision ai cloud-mask --input sentinel2.tif --output cloud.tif

# ─── End-to-End Pipelines ────────────────────────────────────────────────────
pygeovision pipeline building_footprints \
    --bbox -0.15 51.47 -0.10 51.52 \
    --date 2024-06 \
    --output ./results/

pygeovision pipeline change_detection \
    --bbox -74.1 40.6 -73.7 40.9 \
    --date-before 2020-01 \
    --date-after 2024-01 \
    --output ./changes/

pygeovision pipeline list                   # Show all available pipelines

# ─── Models ──────────────────────────────────────────────────────────────────
pygeovision models list
pygeovision models list --task segmentation --pretrained-only
pygeovision models info unet_resnet50
pygeovision models cache
pygeovision models cache --clear
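
The commented-out `_run_cli` helper in the CLI reference above can be turned into a small standalone function. The following is a minimal sketch, not the package's actual implementation: `run_cli` is a hypothetical name, and the current Python interpreter stands in for the real `pygeofetch` executable so that the example runs anywhere.

```python
import os
import platform
import subprocess
import sys


def run_cli(cmd, timeout=300):
    """Run a command with UTF-8 I/O forced and return the CompletedProcess."""
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = "utf-8"  # child Python uses UTF-8 for stdio
    env["PYTHONUTF8"] = "1"            # enable Python UTF-8 mode in the child
    # CREATE_NO_WINDOW exists only on Windows, so it is only evaluated there.
    flags = subprocess.CREATE_NO_WINDOW if platform.system() == "Windows" else 0
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        encoding="utf-8",
        env=env,
        timeout=timeout,
        creationflags=flags,
    )


# Stand-in command (assumption): the interpreter itself, not the real CLI.
result = run_cli([sys.executable, "-c", "print('ok')"])
print(result.returncode, result.stdout.strip())
```

Forcing UTF-8 in the child environment avoids decode errors on consoles with a legacy code page, which is presumably why the original helper sets both variables.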

Package Structure

pygeovision/                          5,845 lines across core files
│
├── __init__.py                 709   PyGeoVision — main client class
├── _version.py                       Version
├── api/__init__.py                   PyGeoVisionClient — web/notebook API
│
├── core/
│   ├── config.py                     PyGeoVisionConfig (Pydantic, YAML, env vars)
│   ├── engine.py                     Core engine wrapping SatelliteFetcher
│   └── exceptions.py                 Full exception hierarchy
│
├── data/                      2,275  PyGeoFetch integration layer
│   ├── fetch.py              1,636   SatelliteFetcher (CLI subprocess + pystac_client)
│   ├── providers.py            280   All 22 providers, STAC endpoints, shortcuts
│   └── pipeline.py             359   DataPipeline YAML builder
│
├── ai/                               AI layer
│   ├── engine.py                     AIEngine — lazy hub for own AI stack
│   ├── geoai/__init__.py      1,278  GeoAIEngine — 24 GeoAI subsystem proxies
│   │
│   ├── models/
│   │   ├── registry.py               14 registered model architectures
│   │   ├── hub.py                    ModelHub — download & cache weights
│   │   └── architectures/            UNet, SegFormer, FCOS, ViT, ...
│   │
│   ├── training/
│   │   ├── trainer.py                GeoTrainer — full training loop
│   │   ├── losses.py                 6 specialist loss functions
│   │   ├── metrics.py                IoU, F1, accuracy, confusion matrix
│   │   ├── callbacks.py              EarlyStopping, ModelCheckpoint, Progress
│   │   ├── distributed.py            Multi-GPU / DDP
│   │   ├── optimizers.py             AdamW, SGD, Lion, schedulers
│   │   └── export.py                 ONNX, TorchScript export
│   │
│   ├── inference/
│   │   ├── tiled_inference.py        TiledInference — Gaussian-blend tiling
│   │   ├── postprocessing.py         PostProcessor — smooth, regularize
│   │   ├── ensemble.py               EnsembleInference
│   │   ├── validation.py             InferenceValidator
│   │   └── vectorization.py          Raster → vector conversion
│   │
│   ├── labeling/
│   │   ├── osm_labeler.py            OpenStreetMap polygon labels
│   │   ├── microsoft_buildings.py    Microsoft global footprints
│   │   ├── google_buildings.py       Google Open Buildings
│   │   ├── esa_worldcover.py         ESA WorldCover land cover
│   │   ├── dynamic_world.py          Google Dynamic World
│   │   ├── sam_labeler.py            SAM auto-labeling
│   │   └── foundation_labeler.py     Foundation model labeling
│   │
│   ├── data/
│   │   ├── tiling.py                 GeoTIFF tiling and chip extraction
│   │   ├── dataset.py                TileMetadata, GeoDataset
│   │   ├── dataloader.py             GeoDataLoader (PyGeoFetch → training)
│   │   ├── augmentations.py          Geo-aware augmentations
│   │   ├── preprocessing.py          Normalisation, band selection
│   │   └── sampler.py                Weighted, stratified samplers
│   │
│   ├── pipelines/__init__.py   544   10 end-to-end geospatial pipelines
│   ├── monitoring/__init__.py        DriftDetector, PerformanceTracker
│   └── experiments/__init__.py       ExperimentTracker
│
├── cli/main.py                1,039  Complete CLI (data + ai + pipeline + models)
└── models/__init__.py                Pydantic schemas
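
The tree lists `tiled_inference.py` as Gaussian-blend tiling, and the CLI exposes `--tile-size` and `--overlap`. As a rough illustration of the idea, here is a pure-Python sketch of how overlapping tile windows and per-tile blending weights could be computed. The function names and the `sigma_fraction` parameter are assumptions made for this example; the real `TiledInference` class may work differently.

```python
import math


def tile_starts(length, tile=512, overlap=64):
    """Start offsets of tiles along one axis; the last tile is shifted back to fit."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile ends exactly at the raster edge
    return starts


def tile_windows(width, height, tile=512, overlap=64):
    """(x, y, w, h) windows covering a width x height raster, row by row."""
    return [
        (x, y, min(tile, width), min(tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def gaussian_weights(n, sigma_fraction=0.25):
    """1-D Gaussian weights that peak at the tile centre and fall off at the edges."""
    centre = (n - 1) / 2
    sigma = n * sigma_fraction
    return [math.exp(-((i - centre) ** 2) / (2 * sigma**2)) for i in range(n)]


windows = tile_windows(1000, 600)
print(len(windows), windows[0], windows[-1])
```

In a blending scheme of this kind, each tile's prediction is multiplied by the outer product of two such weight vectors and the overlapping regions are normalised by the summed weights, so tile seams contribute less than tile centres.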

Configuration

from pygeovision.core.config import PyGeoVisionConfig

# Load from YAML
config = PyGeoVisionConfig.load("pygeovision.yaml")

# Programmatic
config = PyGeoVisionConfig(
    gpu={"device": "cuda", "mixed_precision": True},
    training={"batch_size": 32, "learning_rate": 5e-5, "max_epochs": 200},
    model_hub={"cache_dir": "/data/models/"},
    pygeofetch={
        "default_providers": ["planetary_computer", "copernicus"],
        "cache_ttl_seconds": 7200,
        "download_parallel": 8,
        "verify_checksum": True,
    },
)

# Save
config.save("~/.pygeovision/config.yaml")

# Get PyGeoFetch config subset
config.as_pygeofetch_config()

Environment variables:

PYGEOVISION_GPU_DEVICE=cuda
PYGEOVISION_GPU_MIXED_PRECISION=true
PYGEOVISION_TRAINING_BATCH_SIZE=32
PYGEOVISION_TRAINING_LR=5e-5
PYGEOVISION_MODEL_HUB_CACHE_DIR=/data/models
PYGEOVISION_LOG_LEVEL=DEBUG
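
To make the naming convention concrete, the sketch below shows one way `PYGEOVISION_*` variables could be folded into a nested dictionary with basic type coercion. It is an illustration under assumptions, not the library's loader: the section list, the `config_from_env` and `coerce` names, and the key mapping (for example `TRAINING_LR` becoming `training.lr`) are hypothetical.

```python
SECTIONS = ("model_hub", "training", "gpu")  # assumed config section names


def coerce(value):
    """Convert an environment string to bool, int, or float where possible."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def config_from_env(environ, prefix="PYGEOVISION_"):
    """Group prefixed variables by section; unmatched keys stay at the top level."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):].lower()
        for section in SECTIONS:
            if key.startswith(section + "_"):
                config.setdefault(section, {})[key[len(section) + 1:]] = coerce(raw)
                break
        else:
            config[key] = coerce(raw)
    return config


example_env = {
    "PYGEOVISION_GPU_DEVICE": "cuda",
    "PYGEOVISION_GPU_MIXED_PRECISION": "true",
    "PYGEOVISION_TRAINING_BATCH_SIZE": "32",
    "PYGEOVISION_TRAINING_LR": "5e-5",
    "PYGEOVISION_MODEL_HUB_CACHE_DIR": "/data/models",
    "PYGEOVISION_LOG_LEVEL": "DEBUG",
    "HOME": "/home/user",  # ignored: no PYGEOVISION_ prefix
}
env_config = config_from_env(example_env)
print(env_config)
```

In real use the function would be called with `os.environ`; a plain dictionary is passed here so the example does not depend on the caller's shell.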

Config search order (highest priority last):

  1. Built-in defaults
  2. ~/.pygeovision/config.yaml
  3. .pygeovision.yaml (project-level)
  4. Environment variables (PYGEOVISION_*)
  5. Constructor arguments
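
The precedence list above amounts to merging five layers in order, with later layers overriding earlier ones key by key. A minimal sketch of that behaviour with plain dictionaries follows; `deep_merge`, `resolve_config`, and all sample values are illustrative assumptions rather than the actual `PyGeoVisionConfig` internals.

```python
def deep_merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers):
    """Merge layers from lowest to highest priority."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Sample layers (made-up values), listed in the documented search order.
defaults = {"gpu": {"device": "cpu", "mixed_precision": False},
            "training": {"batch_size": 16}}
user_file = {"training": {"batch_size": 32}}        # ~/.pygeovision/config.yaml
project_file = {"gpu": {"device": "cuda"}}          # .pygeovision.yaml
env_vars = {"training": {"batch_size": 64}}         # PYGEOVISION_* variables
constructor = {"gpu": {"mixed_precision": True}}    # constructor arguments

resolved = resolve_config(defaults, user_file, project_file, env_vars, constructor)
print(resolved)
```

Because the merge is recursive, a layer that sets only `gpu.device` leaves `gpu.mixed_precision` from a lower layer intact instead of replacing the whole `gpu` section.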

Cache & Status

# System status
status = client.status()
# {
#   "pygeovision_version": "1.0.0",
#   "pygeofetch": {"available": True, "version": "1.0.0", "providers": 22, "open_providers": 10},
#   "geoai": {"available": True, "version": "0.39.2"},
#   "torch": {"version": "2.12.0", "cuda": True, "device": "cuda", "gpu": "NVIDIA A100"},
#   "rasterio": "1.5.0",
#   "geopandas": "1.1.3",
#   "registered_ai_models": 14,
# }

# Cache management (delegates to PyGeoFetch)
client.cache_stats()             # {"entries": 42, "size_mb": 8.7, "location": "..."}
client.clear_cache()             # clear all
client.clear_cache(provider="planetary_computer")
client.clear_cache(older_than="7d")
client.data.set_cache_ttl(7200)  # 2-hour TTL
client.data.prune_cache(max_size_gb=5.0)

# Diagnostics
client.doctor()                  # pygeofetch doctor
client.test_provider("copernicus")
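
The `older_than="7d"` argument and the TTL in seconds describe the same kind of quantity in two notations. The sketch below shows one plausible way to parse such duration strings. The accepted unit letters and the `parse_duration` name are assumptions for this example; the formats PyGeoFetch actually accepts are not documented here.

```python
import re
from datetime import timedelta

# Assumed unit suffixes, mapped to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Parse strings such as '7d' or '2h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


week = parse_duration("7d")
ttl = parse_duration("2h")
print(week, int(ttl.total_seconds()))
```

With this convention, the 2-hour TTL passed to `set_cache_ttl(7200)` corresponds to `parse_duration("2h").total_seconds()`.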

Testing

208 passing  19 skipped  0 failing

tests/
├── test_core.py              14  Config, exceptions, engine
├── test_data_layer.py        68  SatelliteFetcher, SearchResult, DataPipeline, providers
├── test_geoai_integration.py 68  All 24 GeoAI subsystems (mocked)
├── test_validation.py        21  TiledInference, vectorization
├── test_training.py              GeoTrainer, losses, metrics
├── test_labeling.py              7 labelers
├── test_inference.py             PostProcessor, Ensemble
├── test_integration.py           End-to-end pipeline integration
└── conftest.py                   Shared fixtures

# Run the full test suite (no GPU, no real network calls)
cd pygeovision_v2
pip install -e ".[dev]"
pytest tests/ -q                                    # all tests
pytest tests/test_data_layer.py -v                  # data layer only
pytest tests/test_geoai_integration.py -v           # GeoAI subsystems only
pytest tests/ --cov=pygeovision --cov-report=html   # with coverage
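
Since the suite runs without network access, tests presumably replace the data layer with mocks. The following sketch shows the general pattern with `unittest.mock`. Everything in it is hypothetical: the `scene_ids` function, the fetcher's `search` method, and the stand-in records are invented for illustration and are not taken from the project's tests.

```python
from unittest.mock import MagicMock


def scene_ids(fetcher, bbox, max_cloud=10):
    """Return the IDs of scenes whose cloud cover is at most max_cloud."""
    results = fetcher.search(bbox=bbox)
    return [r["id"] for r in results if r["cloud_cover"] <= max_cloud]


def test_scene_ids_filters_by_cloud_cover():
    # The mock stands in for a real fetcher, so no request leaves the machine.
    fetcher = MagicMock()
    fetcher.search.return_value = [
        {"id": "scene-a", "cloud_cover": 4.0},   # stand-in record
        {"id": "scene-b", "cloud_cover": 35.0},  # stand-in record
    ]
    bbox = (-74.1, 40.6, -73.7, 40.9)
    assert scene_ids(fetcher, bbox) == ["scene-a"]
    fetcher.search.assert_called_once_with(bbox=bbox)


test_scene_ids_filters_by_cloud_cover()
```

Under pytest the final call is unnecessary because the `test_` prefix is enough for discovery; it is included so the snippet also runs as a plain script.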

Comparing PyGeoVision to Alternatives

Feature PyGeoVision EODAG TorchGeo TerraTorch Raw GeoAI
Data providers 22+ 10+ Limited Limited 3
PyGeoFetch integration ✅ Native
GeoAI integration ✅ Full 24 subsystems ✅ Direct
CLI ✅ Full ✅ Partial
YAML pipelines
Auth / keyring
Parallel downloads
Post-processing chain
End-to-end pipelines 10
SAM / GroundedSAM
Foundation models ✅ (Prithvi, DINOv3)
Automated labeling 7 labelers
ONNX export
Commercial providers ✅ Planet, Maxar, Airbus

Acknowledgements

PyGeoVision is built on top of two exceptional open-source projects:

  • PyGeoFetch — Universal satellite data pipeline by the PyGeoFetch team. PyGeoVision uses PyGeoFetch for all data search, download, authentication, caching, and pipeline orchestration.

  • GeoAI — Artificial Intelligence for Geospatial Data by Qiusheng Wu and contributors. PyGeoVision wraps GeoAI for all AI inference, training, and model management. Published in JOSS 2026.


License

Apache 2.0 — see LICENSE

Copyright © 2026 PyGeoVision Contributors
