GeoTessera
Python library interface to the Tessera geofoundation model embeddings.
Overview
GeoTessera provides access to geospatial embeddings from the Tessera foundation model, which processes Sentinel-1 and Sentinel-2 satellite imagery to generate 128-channel representation maps at 10 m resolution. The embeddings compress a full year of temporal-spectral features into per-pixel representations for downstream geospatial analysis tasks.
Features
- Flexible Registry Sources: Use local registry files, remote downloads, or auto-cloned repositories
- Efficient Block-Based Loading: Lazy loading of registry data using 5×5 degree geographic blocks
- Environment Variable Support: Configure registry location via TESSERA_REGISTRY_DIR
- Auto-Updating Manifests: Automatically clone and update registry manifests from GitHub
- Geographic Data Access: Download geospatial embeddings for specific coordinates
- Comprehensive CLI: List tiles, create visualizations, serve interactive maps
- Built-in Caching: Efficient local caching of downloaded files with Pooch
- Registry Management Tools: Generate and maintain registry files for data maintainers
- Robust Error Handling: Fatal errors for missing tiles ensure data integrity
Installation
pip install git+https://github.com/ucam-eo/geotessera
Configuration
GeoTessera supports multiple configuration options for both data caching and registry management.
Data Cache Configuration
Files are cached in the system's default cache directory (~/.cache/geotessera on Unix-like systems). You can customize this:
# Set custom cache directory
export TESSERA_DATA_DIR=/path/to/your/cache/directory
# Or set for a single command
TESSERA_DATA_DIR=/tmp/tessera geotessera info
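The lookup order described above can be summarised in a few lines of Python. This is a minimal sketch, not GeoTessera's actual code. The helper name resolve_cache_dir is hypothetical. The sketch also hard-codes the Unix-style ~/.cache default, whereas a real implementation would pick the platform-specific cache directory.

```python
import os
from pathlib import Path


def resolve_cache_dir(environ=None):
    """Return the cache directory, preferring TESSERA_DATA_DIR when it is set.

    Hypothetical helper: falls back to ~/.cache/geotessera, the Unix-like
    default mentioned above (other platforms use a different location).
    """
    env = os.environ if environ is None else environ
    override = env.get("TESSERA_DATA_DIR")
    if override:
        # The environment variable always wins over the built-in default.
        return Path(override).expanduser()
    return Path.home() / ".cache" / "geotessera"


if __name__ == "__main__":
    print(resolve_cache_dir())
```

Passing environ explicitly keeps the function easy to test without touching the real process environment.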
Registry Configuration
GeoTessera can load registry files from multiple sources, checked in this priority order:
- Explicit registry directory (via --registry-dir or the registry_dir constructor parameter)
- Environment variable (TESSERA_REGISTRY_DIR)
- Auto-cloned manifests repository (default behavior)
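The priority order can be expressed as a small resolution function. This is an illustrative sketch only. The names resolve_registry_dir, explicit and cache_dir are hypothetical. The auto-clone location (a tessera-manifests folder inside the cache directory) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_registry_dir(
    explicit: Optional[str],
    cache_dir: Path,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Path]:
    """Pick the registry directory using the documented priority order.

    Returns (source, path), where source names the rule that matched.
    """
    env = os.environ if environ is None else environ
    # 1. An explicit directory (--registry-dir or constructor argument) wins.
    if explicit:
        return "explicit", Path(explicit)
    # 2. Otherwise fall back to the TESSERA_REGISTRY_DIR environment variable.
    from_env = env.get("TESSERA_REGISTRY_DIR")
    if from_env:
        return "environment", Path(from_env)
    # 3. Default: a manifests clone kept in the cache (assumed location).
    return "auto-clone", cache_dir / "tessera-manifests"
```

Returning the matching rule alongside the path makes it easy to report which source was used, which helps when diagnosing configuration problems.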
Using Environment Variable
# Point to local registry directory
export TESSERA_REGISTRY_DIR=/path/to/tessera-manifests
# Use with any command
geotessera info
Using Auto-Cloned Repository
By default, GeoTessera automatically clones the tessera-manifests repository to your cache directory. This can be configured:
from geotessera import GeoTessera
# Use default auto-cloning
tessera = GeoTessera()
# Auto-update to latest manifests
tessera = GeoTessera(auto_update=True)
# Use custom manifest repository
tessera = GeoTessera(
manifests_repo_url="https://github.com/your-org/custom-manifests.git"
)
Usage
Command Line Interface
All CLI commands support registry configuration options:
# Global options available for all commands:
# --registry-dir PATH Use local registry directory
# --auto-update Update manifests to latest version
# --manifests-repo-url URL Custom manifests repository URL
Basic Commands
To run these commands without cloning this repository, replace uvx geotessera with uvx --from git+https://github.com/ucam-eo/geotessera@main geotessera.
# List available embeddings
uvx geotessera list-embeddings --limit 10
# Show dataset information
uvx geotessera info
# Generate world map showing embedding coverage
uvx geotessera map --output coverage_map.png
Using Local Registry
# Use local registry directory
uvx geotessera \
--registry-dir /path/to/tessera-manifests info
# Auto-update manifests before use
uvx geotessera \
--auto-update info
Visualization Commands
# Create false-color visualization for a region
uvx geotessera visualize \
--region example/CB.geojson --output cambridge_viz.tiff
# Serve interactive web map
uvx geotessera serve --region example/CB.geojson --open
# Serve with custom band selection (e.g., bands 30, 60, 90)
uvx geotessera serve \
--region example/CB.geojson --bands 30 60 90 --open
If you have the repository checked out, use --from . instead.
Python API
from geotessera import GeoTessera
# Initialize with default settings (auto-clone manifests)
tessera = GeoTessera()
# Use local registry directory
tessera = GeoTessera(
version="v1",
registry_dir="/path/to/tessera-manifests"
)
# Auto-update manifests to latest version
tessera = GeoTessera(
version="v1",
auto_update=True
)
# Use custom manifests repository
tessera = GeoTessera(
version="v1",
manifests_repo_url="https://github.com/your-org/custom-manifests.git"
)
# Download and get dequantized embedding for specific coordinates
embedding = tessera.fetch_embedding(lat=52.05, lon=0.15, year=2024)
print(f"Embedding shape: {embedding.shape}") # (height, width, 128)
# List available embeddings for exploration
for year, lat, lon in tessera.list_available_embeddings():
print(f"Year {year}: ({lat:.2f}, {lon:.2f})")
# Get available years
years = tessera.get_available_years()
print(f"Available years: {years}")
# Find tiles that intersect with a geometry
from shapely.geometry import box
geometry = box(-0.2, 51.9, 0.3, 52.3) # Cambridge area
tiles = tessera.find_tiles_for_geometry(geometry, year=2024)
print(f"Found {len(tiles)} tiles for the geometry")
# Extract embeddings at specific points
points = [(52.2053, 0.1218), (52.1951, 0.1313)] # Cambridge locations
embeddings_df = tessera.extract_points(points, year=2024, include_coords=True)
print(f"Extracted embeddings shape: {embeddings_df.shape}")
# Get tile metadata
bounds = tessera.get_tile_bounds(lat=52.05, lon=0.15)
crs = tessera.get_tile_crs(lat=52.05, lon=0.15)
transform = tessera.get_tile_transform(lat=52.05, lon=0.15)
print(f"Tile bounds: {bounds}")
print(f"Tile CRS: {crs}")
# Export single tile as GeoTIFF
tessera.export_single_tile_as_tiff(
lat=52.05, lon=0.15,
output_path="tile.tiff",
year=2024,
bands=[0, 1, 2] # RGB visualization
)
# Merge embeddings for a region
bounds = (-0.2, 51.9, 0.3, 52.3) # (west, south, east, north)
tessera.merge_embeddings_for_region(
bounds=bounds,
output_path="region.tiff",
target_crs="EPSG:4326",
bands=[0, 1, 2],
year=2024
)
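The listing above refers to a dequantized embedding, and the registry stores embedding.npy alongside embedding_scales.npy. The sketch below shows one plausible way to turn those two arrays into float embeddings and read the pixel under a point, roughly what fetch_embedding and extract_points expose. It is not GeoTessera's implementation. The array shapes, the multiply-by-scale scheme, and the helper names dequantize and sample_at are all assumptions. The transform uses the (a, b, c, d, e, f) coefficient ordering of the affine package used by rasterio, and the value returned by get_tile_transform may need converting to that form.

```python
import math

import numpy as np


def dequantize(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Convert quantized embeddings to float32 by multiplying with scales.

    Assumes quantized has shape (H, W, 128) and scales is either per pixel
    (H, W) or per pixel and channel (H, W, 128).
    """
    q = quantized.astype(np.float32)
    s = scales.astype(np.float32)
    if s.ndim == q.ndim - 1:
        s = s[..., np.newaxis]  # broadcast one scale across all channels
    return q * s


def sample_at(embedding: np.ndarray, transform, x: float, y: float) -> np.ndarray:
    """Return the channel vector of the pixel containing (x, y).

    transform is (a, b, c, d, e, f) with x = a*col + b*row + c and
    y = d*col + e*row + f; only north-up rasters (b == d == 0) are handled.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated transforms are not supported in this sketch")
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    height, width = embedding.shape[:2]
    if not (0 <= row < height and 0 <= col < width):
        raise IndexError(f"point ({x}, {y}) lies outside the tile")
    return embedding[row, col]
```

With real files, np.load on the tile's embedding.npy and embedding_scales.npy would supply the two inputs. The point coordinates must be expressed in the tile's CRS before calling sample_at.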
Registry Architecture
GeoTessera uses a block-based registry so that only the registry entries for the tiles being accessed need to be loaded:
Block-Based Organization
- 5×5 degree geographic blocks: Registry files are organized into blocks to enable lazy loading
- Embeddings: embeddings_YYYY_lonX_latY.txt (e.g., embeddings_2024_lon-5_lat50.txt)
- Landmasks: landmasks_lonX_latY.txt (e.g., landmasks_lon0_lat50.txt)
- Lazy Loading: Only the registry files for the geographic regions being accessed are loaded
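The block file names can be derived from a tile's coordinates. The following is a minimal sketch under one assumption: each block is named after its south-west corner, rounded down to a multiple of 5 degrees. That assumption is consistent with the example names above but is not confirmed by the source. The helpers block_origin and block_registry_name are hypothetical.

```python
import math
from typing import Optional, Tuple

BLOCK_SIZE = 5  # degrees per block side, as described above


def block_origin(lon: float, lat: float) -> Tuple[int, int]:
    """Round a coordinate down to the south-west corner of its 5x5 block."""
    return (
        math.floor(lon / BLOCK_SIZE) * BLOCK_SIZE,
        math.floor(lat / BLOCK_SIZE) * BLOCK_SIZE,
    )


def block_registry_name(lon: float, lat: float, year: Optional[int] = None) -> str:
    """Return the embeddings (with year) or landmasks (without) registry file name."""
    block_lon, block_lat = block_origin(lon, lat)
    if year is None:
        return f"landmasks_lon{block_lon}_lat{block_lat}.txt"
    return f"embeddings_{year}_lon{block_lon}_lat{block_lat}.txt"


if __name__ == "__main__":
    print(block_registry_name(0.15, 52.05, year=2024))  # embeddings_2024_lon0_lat50.txt
    print(block_registry_name(-0.15, 52.05))  # landmasks_lon-5_lat50.txt
```

Under this assumption, a query around Cambridge (0.15°E, 52.05°N) touches a single embeddings block per year. Any query therefore reads only a handful of small registry files rather than the whole global index.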
Registry Directory Structure
tessera-manifests/
├── registry/
│ ├── embeddings/
│ │ ├── embeddings_2024_lon-180_lat-90.txt
│ │ ├── embeddings_2024_lon-175_lat-90.txt
│ │ └── ...
│ └── landmasks/
│ ├── landmasks_lon-180_lat-90.txt
│ ├── landmasks_lon-175_lat-90.txt
│ └── ...
Registry File Format
Registry files use Pooch-compatible format for data integrity:
# Format: filepath checksum
v1/2024/grid_0.15_52.05/embedding.npy 2a1c8d7e9f3b5a6c8e7d9f2a1c8d7e9f3b5a6c8e7d9f2a1c8d7e9f3b5a6c8e7d
v1/2024/grid_0.15_52.05/embedding_scales.npy 5f9e2d1a8c6b9e3f7d2a8c6b9e3f7d2a8c6b9e3f7d2a8c6b9e3f7d2a8c6b9
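Each line holds a relative path and a checksum separated by whitespace, so a block file can be read into a dictionary with plain Python. This sketch only illustrates the format; in practice a Pooch instance can load such files directly with its load_registry method. The helper name parse_registry is hypothetical.

```python
from typing import Dict, Iterable


def parse_registry(lines: Iterable[str]) -> Dict[str, str]:
    """Map each relative file path to its lowercase SHA256 checksum.

    Blank lines and lines starting with '#' are skipped.
    """
    registry: Dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"line {lineno}: expected 'filepath checksum', got {raw!r}")
        path, checksum = parts[0], parts[1]
        # Pooch also accepts an explicit 'sha256:' prefix on the hash.
        registry[path] = checksum.split(":", 1)[-1].lower()
    return registry
```

Usage: with open("embeddings_2024_lon0_lat50.txt") as fh: entries = parse_registry(fh). Extra columns, such as a per-file URL, are ignored.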
Registry Management (Data Maintainers)
GeoTessera includes tools for generating and maintaining registry files. The registry system uses block-based organization for efficient access to large datasets.
Registry Generation Workflow
# 1. Generate SHA256 checksums for data files
uvx --from git+https://github.com/ucam-eo/geotessera@main geotessera-registry hash /path/to/v1
# 2. Scan checksums and create block-based pooch registries
uvx --from git+https://github.com/ucam-eo/geotessera@main geotessera-registry scan /path/to/v1
# 3. List generated registry files
uvx --from git+https://github.com/ucam-eo/geotessera@main geotessera-registry list /path/to/v1
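Step 2 groups individual checksum entries into the 5×5 degree block files. The sketch below shows that grouping for embedding entries only, and it rests on two assumptions:
- The grid directory name encodes longitude then latitude (grid_<lon>_<lat>). This matches grid_0.15_52.05 corresponding to lat=52.05, lon=0.15 in the Python API.
- Blocks are keyed by their south-west corner.

The function group_into_blocks and its input format are hypothetical, and this is not the geotessera-registry source.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches ".../<year>/grid_<lon>_<lat>/<file>" (assumed layout).
ENTRY_RE = re.compile(r"(?:^|/)(\d{4})/grid_(-?\d+(?:\.\d+)?)_(-?\d+(?:\.\d+)?)/[^/]+$")


def group_into_blocks(entries: Iterable[Tuple[str, str]], block: int = 5) -> Dict[str, List[str]]:
    """Bucket 'path checksum' lines by the embeddings block file they belong to."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for path, checksum in entries:
        match = ENTRY_RE.search(path)
        if match is None:
            raise ValueError(f"unrecognised embedding path: {path}")
        year = match.group(1)
        lon, lat = float(match.group(2)), float(match.group(3))
        block_lon = math.floor(lon / block) * block
        block_lat = math.floor(lat / block) * block
        name = f"embeddings_{year}_lon{block_lon}_lat{block_lat}.txt"
        blocks[name].append(f"{path} {checksum}")
    # Sort lines so regenerated registries are stable and diff cleanly.
    return {name: sorted(lines) for name, lines in sorted(blocks.items())}
```

Writing each list to registry/embeddings/<name> would produce files in the Pooch-compatible format described earlier.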
Expected Data Structure
The registry tools expect this directory structure:
v1/
├── global_0.1_degree_representation/ # Embedding .npy files by year
│ ├── 2024/
│ │ ├── grid_0.15_52.05/
│ │ │ ├── embedding.npy
│ │ │ ├── embedding_scales.npy
│ │ │ └── SHA256 # Generated by hash command
│ │ └── ...
│ └── ...
└── global_0.1_degree_tiff_all/ # Landmask .tiff files
├── grid_0.15_52.05.tiff
├── SHA256SUM # Generated by hash command
└── ...
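Grid directory names such as grid_0.15_52.05 identify 0.1-degree tiles by their centre, longitude first. The sketch below assumes tiles are aligned to a 0.1-degree grid with centres at odd multiples of 0.05. That assumption matches the example names here but is not stated by the source. Under it, an arbitrary point can be mapped to its tile directory as follows. The helpers tile_center and grid_name are hypothetical.

```python
import math
from typing import Tuple

TILE_SIZE = 0.1  # degrees, from the global_0.1_degree_* directory names


def tile_center(lon: float, lat: float) -> Tuple[float, float]:
    """Snap a point to the centre of the 0.1-degree tile that contains it."""

    def snap(value: float) -> float:
        # Work in tenths of a degree (rounded to absorb float error) so that
        # points exactly on a tile edge fall into the tile to their north/east.
        index = math.floor(round(value / TILE_SIZE, 9))
        return round(index * TILE_SIZE + TILE_SIZE / 2, 2)

    return snap(lon), snap(lat)


def grid_name(lon: float, lat: float) -> str:
    """Directory name for the tile containing (lon, lat), e.g. grid_0.15_52.05."""
    center_lon, center_lat = tile_center(lon, lat)
    return f"grid_{center_lon:.2f}_{center_lat:.2f}"
```

For example, the Cambridge point (52.2053, 0.1218) used in the Python API section would fall in grid_0.15_52.25 under this assumption.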
Registry Commands
# Generate SHA256 checksums (parallel processing)
geotessera-registry hash /path/to/data
# - Creates SHA256 files in each grid subdirectory
# - Creates SHA256SUM file for TIFF files using chunked processing
# Generate block-based pooch registries from checksums
geotessera-registry scan /path/to/data [--registry-dir /output/path]
# - Reads SHA256 files and creates registry/embeddings/ files
# - Reads SHA256SUM and creates registry/landmasks/ files
# - Organizes into 5×5 degree blocks for efficient loading
# List existing registry files with entry counts
geotessera-registry list /path/to/registry
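The hash step can be approximated with the standard library. The sketch below writes one SHA256 file per grid directory, reading files in chunks so large .npy arrays are never held in memory at once. Several details are assumptions, and the real geotessera-registry output may differ:
- the sha256sum-style line format (digest, two spaces, file name)
- the function names
- the use of a thread pool for the parallel processing

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def sha256_file(path: Path) -> str:
    """Hash a file incrementally so memory use stays bounded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_grid_checksums(grid_dir: Path) -> Path:
    """Write <grid_dir>/SHA256 listing every .npy file in sha256sum format."""
    lines = [f"{sha256_file(p)}  {p.name}" for p in sorted(grid_dir.glob("*.npy"))]
    out = grid_dir / "SHA256"
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


def hash_year(year_dir: Path, workers: int = 8) -> List[Path]:
    """Process every grid_* directory under one year directory in parallel."""
    grid_dirs = sorted(d for d in year_dir.glob("grid_*") if d.is_dir())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_grid_checksums, grid_dirs))
```

Threads are a reasonable fit here because hashlib releases the GIL while hashing large buffers, so several files can be digested concurrently.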
Error Handling
GeoTessera fails loudly rather than silently in order to protect data integrity:
- Fatal Errors: Missing embedding tiles raise exceptions instead of merely emitting warnings
- Registry Validation: Missing registry files cause fatal errors during initialization
- Checksum Verification: All downloaded files are verified against SHA256 checksums
- Clear Error Messages: Descriptive errors help diagnose configuration issues
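The checksum rule above can be illustrated with a small function that re-hashes a downloaded file and raises an exception on a mismatch, rather than warning. This is a sketch under assumptions. ChecksumMismatchError and verify_download are hypothetical names. Pooch, which the library uses for caching downloads, performs a similar check when it fetches a file.

```python
import hashlib
from pathlib import Path


class ChecksumMismatchError(RuntimeError):
    """Raised when a file's SHA256 digest differs from the registry entry."""


def verify_download(path: Path, expected_sha256: str, chunk_size: int = 1 << 20) -> None:
    """Re-hash path in chunks and fail loudly if it does not match."""
    digest = hashlib.sha256()
    # A missing file surfaces as FileNotFoundError here, which is also fatal.
    with Path(path).open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        # Fatal by design: a corrupt tile must never be used silently.
        raise ChecksumMismatchError(f"{path}: expected {expected_sha256}, got {actual}")
```

Callers that want to retry a download can catch ChecksumMismatchError explicitly. Otherwise the error stops processing, which matches the fail-fast policy described above.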
About Tessera
Tessera is a foundation model for Earth observation developed by the University of Cambridge. It learns temporal-spectral features from multi-source satellite data to enable advanced geospatial analysis including land classification and canopy height prediction.
For more information about the Tessera project, visit: https://github.com/ucam-eo/tessera
Download files
File details
Details for the file geotessera-0.2.0.tar.gz.
File metadata
- Download URL: geotessera-0.2.0.tar.gz
- Size: 59.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f570f7acf7515b4f1721591b41346824788d397e58f0990de9c5931fba613f0 |
| MD5 | 82302ac3f56781603b5c6fd29527b5b7 |
| BLAKE2b-256 | 92e91bd4fab45926a23fcb3e45fdff9d07d49ccb8f91edb2c322f92504efbd2a |
File details
Details for the file geotessera-0.2.0-py3-none-any.whl.
File metadata
- Download URL: geotessera-0.2.0-py3-none-any.whl
- Size: 60.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | adb97626395c69a966a91362ec3783db4cf638913b39044fae32bfbd7fc69f74 |
| MD5 | b96c4f67cc226b88737afef2996684a9 |
| BLAKE2b-256 | aeecc2f369417c94a4d73b168264179e94a841f8877300f7d72f2e842882df69 |