Geospatial image format codecs (NITF, GeoTIFF) for the OversightML ecosystem
OversightML Imagery IO
Flexible read/write for NITF, GeoTIFF, JPEG 2000, DTED, and more. Performant cloud-native tile access with no complex dependencies. Built in Rust for speed with Python APIs for easy integration with the latest ML frameworks and data science environments.
Why This Library
- `pip install` and go: self-contained wheels with a Rust core and bundled codecs. No system libraries, no C toolchain, no conda. Minimal dependencies also mean a smaller attack surface and fewer packages to patch, a real consideration for production and security-sensitive deployments.
- Specification-compliant NITF read/write: supports all four IMODE interleave modes (B, P, R, S) and compression types including uncompressed, JPEG 2000, HTJ2K, and JPEG DCT, with masked variants for sparse imagery. An extensible, data-driven TRE parser ships with definitions for all publicly available Tagged Record Extensions, and Data Extension Segments are first-class objects you can read, create, and modify. SICD and SIDD SAR data are supported through the NITF implementation.
- Cloud-native tile access that works with compressed data: existing Zarr codecs cannot decode compressed tiles from TIFF or JPEG 2000 files because they lack support for format-specific context like TIFF predictor metadata and JPEG 2000 multi-tile-part reassembly. This library provides custom Zarr v3 codecs that solve these problems, along with VirtualiZarr parsers and a scatter-gather filesystem for non-contiguous byte ranges, making the promise of virtual Zarr access work with real compressed geospatial imagery.
- Simple when you want it, deep when you need it: `imread`/`imsave`/`tiles` for common tasks; a full low-level API for format-specific control over segments, metadata, tiling, masks, and compression parameters.
What This Library is Not
This is not a library of image operations or photogrammetry routines — there are no orthorectification pipelines, pan-sharpening filters, or coordinate transforms here. The goal is to get pixels from geospatial imagery formats into a NumPy array as efficiently as possible so you can feed them into your ML framework, image processing toolkit, or analysis pipeline of choice.
Quick Start
pip install osml-imagery-io
from aws.osml.io import imread, imsave, iminfo, tiles
# Inspect metadata without reading pixels
info = iminfo("image.ntf")
print(info.metadata["RPC00B"]) # Rational polynomial coefficients
print(info.metadata["STDIDC"]) # Acquisition context
# Read an image as a NumPy array. Careful: this loads the whole image into memory.
pixels = imread("image.ntf") # shape: (bands, height, width)
# Read a single windowed region, much more practical
chip = imread("image.ntf", window=(100, 200, 256, 256)) # (x, y, width, height)
# Save to any supported format — inferred from extension
imsave("output.tif", chip)
# Iterate over tiles for memory-efficient processing of large images
for tile in tiles("large_image.tif", tile_size=(256, 256)):
    process(tile.data)
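The `(x, y, width, height)` window convention and the `(bands, height, width)` array layout used above can be related with plain NumPy slicing. The sketch below does not call the library: `window_to_slices` and `tile_windows` are hypothetical helpers, and clipping partial tiles at the image edge is an assumption about how a tiler might behave, not a description of what `tiles` does.

```python
import numpy as np


def window_to_slices(window):
    """Convert an (x, y, width, height) window into (row, column) slices."""
    x, y, width, height = window
    return slice(y, y + height), slice(x, x + width)


def tile_windows(image_width, image_height, tile_size=(256, 256)):
    """Yield (x, y, width, height) windows covering the image, clipped at the edges.

    tile_size is treated as (width, height) here, which is an assumption.
    """
    tile_w, tile_h = tile_size
    for y in range(0, image_height, tile_h):
        for x in range(0, image_width, tile_w):
            yield (x, y, min(tile_w, image_width - x), min(tile_h, image_height - y))


# Stand-in for a decoded image with shape (bands, height, width).
pixels = np.zeros((3, 600, 1000), dtype=np.uint8)

rows, cols = window_to_slices((100, 200, 256, 256))
chip = pixels[:, rows, cols]
print(chip.shape)

windows = list(tile_windows(image_width=1000, image_height=600))
print(len(windows), windows[-1])
```

Note that x maps to the last array axis and y to the middle one, which is the usual source of off-by-axis mistakes when mixing window tuples with NumPy indexing.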
Full Format Access
The convenience functions cover common tasks. When you need to work with the format itself — multi-segment NITF files, TRE fields, block masks, resolution levels — the full API is there.
from aws.osml.io import IO
with IO.open("satellite_scene.ntf", "r") as dataset:
    # Navigate all segments in the file
    for key in dataset.get_asset_keys():
        print(key)  # "image:0", "image:1", "text:0", "data:0", ...
    image = dataset.get_asset("image:0")
    meta = image.metadata.as_dict()
    # Rational polynomial coefficients for geopositioning
    rpc = meta["RPC00B"]
    # TODO: Use coefficients to construct sensor model ...
    # Acquisition context: mission, pass, date
    stdidc = meta["STDIDC"]
    acq_date = stdidc["ACQ_DATE"]
    mission = stdidc["MISSION"]
    # Exploitation usability: GSD, sun angles, obliquity
    use00a = meta["USE00A"]
    gsd = use00a["MEAN_GSD"]
    sun_el = use00a["SUN_EL"]
    # Read a specific block at a reduced resolution level
    block = image.get_block(4, 7, resolution_level=2)
The dataset model is inspired by the SpatioTemporal Asset Catalog (STAC) specification — each dataset maps to a STAC Item, assets map to STAC Assets with typed roles, and the structural alignment makes it straightforward to publish datasets as STAC Items.
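To make the dataset-to-STAC mapping concrete, here is a minimal sketch that turns a list of asset keys into a STAC Item dictionary. It is not the library's export path: `to_stac_item`, the `ROLE_BY_PREFIX` table, the `href`, and the placeholder datetime are all hypothetical, and a real item would also carry a footprint geometry and bounding box.

```python
import json

# Hypothetical mapping from asset-key prefixes to STAC asset roles.
ROLE_BY_PREFIX = {"image": ["data"], "text": ["metadata"], "data": ["metadata"]}


def to_stac_item(item_id, href, asset_keys, datetime_utc):
    """Build a minimal STAC Item dict with one STAC Asset per dataset asset key."""
    assets = {}
    for key in asset_keys:
        prefix = key.split(":", 1)[0]
        assets[key] = {
            "href": href,
            "title": key,
            "roles": ROLE_BY_PREFIX.get(prefix, ["data"]),
        }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # a real item would carry the image footprint
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": assets,
    }


item = to_stac_item(
    "satellite_scene",
    "s3://my-bucket/imagery/satellite_scene.ntf",
    ["image:0", "image:1", "text:0", "data:0"],
    "2024-01-01T00:00:00Z",  # placeholder, not read from any file
)
print(json.dumps(item, indent=2))
```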
Format Specific Features
NITF / NSIF
| Capability | Details |
|---|---|
| Versions | NITF 2.1, NSIF 1.0 |
| Compression Options (IC) | NC, NM (uncompressed/masked), C8, M8 (JPEG 2000), CD, MD (HTJ2K), C3, M3, I1 (JPEG DCT) |
| Interleave (IMODE) | B (band), P (pixel), R (row), S (sequential) |
| TRE parser | Data-driven with definitions for all publicly available TREs |
| Data Extensions | Read and write DES payloads (SICD/SIDD XML, TRE overflow, etc.) |
| Pixel types | 8/16/32-bit unsigned, 16/32-bit signed, 32/64-bit float, 32/64-bit complex |
| Block masks | Sparse imagery via masked compression modes (NM, M8, MD, M3) |
| Multi-segment | Multiple image, text, graphic, and data segments per file |
| Multi-file pyramids | R-set resolution pyramids across separate files |
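The four IMODE values describe how band samples are ordered inside the stored data. As a rough illustration of what interleave conversion involves for one uncompressed block, the NumPy sketch below reorders a raw buffer into a `(bands, rows, cols)` array. `block_to_bands_first` is a hypothetical helper, not the library's decoder, and for IMODE S it assumes the caller has already gathered this block's per-band data into one buffer.

```python
import numpy as np


def block_to_bands_first(raw, imode, bands, rows, cols, dtype=">u2"):
    """Reorder one uncompressed block into a (bands, rows, cols) array.

    NITF stores uncompressed multi-byte samples big-endian, hence the ">"
    in the default dtype.
    """
    samples = np.frombuffer(raw, dtype=dtype)
    if imode == "P":  # band interleaved by pixel: rows, cols, bands
        block = samples.reshape(rows, cols, bands).transpose(2, 0, 1)
    elif imode == "R":  # band interleaved by row: rows, bands, cols
        block = samples.reshape(rows, bands, cols).transpose(1, 0, 2)
    elif imode in ("B", "S"):  # bands are contiguous within this buffer
        block = samples.reshape(bands, rows, cols)
    else:
        raise ValueError(f"unknown IMODE {imode!r}")
    # Copy into the native byte order for downstream NumPy code.
    return block.astype(block.dtype.newbyteorder("="))


# Reference data laid out as (bands, rows, cols), then serialized two ways.
reference = np.arange(2 * 2 * 3, dtype=np.uint16).reshape(2, 2, 3)
pixel_bytes = reference.transpose(1, 2, 0).astype(">u2").tobytes()
row_bytes = reference.transpose(1, 0, 2).astype(">u2").tobytes()

from_p = block_to_bands_first(pixel_bytes, "P", bands=2, rows=2, cols=3)
from_r = block_to_bands_first(row_bytes, "R", bands=2, rows=2, cols=3)
print(np.array_equal(from_p, reference), np.array_equal(from_r, reference))
```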
TIFF / GeoTIFF / COG
| Capability | Details |
|---|---|
| Compression Options | Uncompressed, Deflate, and LZW, with horizontal differencing predictor |
| Tiling | Configurable tile dimensions (multiples of 16) |
| GeoKeys | OGC GeoTIFF 1.1 — CRS, pixel scale, tiepoints, affine transforms |
| COG | Cloud Optimized GeoTIFF with overview IFDs and correct NewSubfileType |
| Pixel types | 8/16/32-bit unsigned, 16/32-bit signed, 32/64-bit float |
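The horizontal differencing predictor (TIFF `Predictor` tag value 2) stores each sample as its difference from the sample to its left, so a decoder has to run a cumulative sum along each row after decompression. The sketch below shows that round trip in NumPy for an 8-bit, pixel-interleaved tile. It illustrates the technique only; the function names are hypothetical and this is not the library's codec.

```python
import numpy as np


def apply_horizontal_predictor(tile):
    """Encode: replace each sample with its difference from the sample to its left."""
    encoded = tile.copy()
    encoded[:, 1:] = tile[:, 1:] - tile[:, :-1]  # unsigned arithmetic wraps
    return encoded


def reverse_horizontal_predictor(encoded):
    """Decode: running sum along each row, wrapping in the sample's own dtype."""
    return np.cumsum(encoded, axis=1, dtype=encoded.dtype)


# A (height, width, samples) tile; differencing runs along the width axis,
# independently for each sample position.
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(4, 8, 3), dtype=np.uint8)
encoded = apply_horizontal_predictor(tile)
decoded = reverse_horizontal_predictor(encoded)
print(np.array_equal(decoded, tile))
```

The decoder needs the tile width, the sample width, and the number of samples per pixel to do this, and none of those are stored in the compressed tile bytes, which is why a codec that only sees the tile payload cannot reverse the predictor on its own.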
DTED (Digital Terrain Elevation Data)
| Capability | Details |
|---|---|
| Levels | 0, 1, 2, ... |
| Pixel type | Single-band Int16 (signed-magnitude encoding) |
| Extensions | .dt0–.dt5, .avg, .min, .max |
| Datum | WGS84 horizontal, MSL (EGM96) vertical |
| Zarr codec | Overlap-aware edge trimming for seamless multi-cell mosaics |
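Two of the rows above lend themselves to a short illustration: signed-magnitude decoding and dropping the duplicated boundary posts between adjacent cells. The sketch below is a simplified NumPy version with hypothetical helper names. It assumes the elevation samples are big-endian 16-bit words and that each cell has already been arranged as a `(rows, cols)` array with columns running west to east.

```python
import numpy as np


def decode_signed_magnitude(raw):
    """Decode big-endian 16-bit signed-magnitude samples to two's-complement int16."""
    words = np.frombuffer(raw, dtype=">u2")
    magnitude = (words & 0x7FFF).astype(np.int16)
    negative = (words & 0x8000) != 0
    return np.where(negative, -magnitude, magnitude).astype(np.int16)


def mosaic_west_east(west, east):
    """Join two horizontally adjacent cells, dropping the duplicated boundary column."""
    return np.concatenate([west[:, :-1], east], axis=1)


elevations = decode_signed_magnitude(bytes([0x00, 0x64, 0x80, 0x01, 0xFF, 0xFF]))
print(elevations)

# The last column of the west cell repeats the first column of the east cell.
west = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
east = np.array([[3, 7, 8], [6, 9, 10]], dtype=np.int16)
print(mosaic_west_east(west, east))
```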
Other Formats
JP2, JPEG, and PNG file formats are also supported for read and write. These lack robust metadata, but they appear frequently as interchange formats for tiles and quick-look products alongside NITF and GeoTIFF imagery.
Cloud-Native Access
Access tiles from NITF, TIFF, and JPEG 2000 files in S3 as virtual Zarr arrays — no format conversion needed. Generate a lightweight tile index once, and the Zarr / xarray / Dask ecosystem treats the file as a native chunked array.
The library provides three components that together make this work for real compressed geospatial data:
- VirtualiZarr parsers that scan NITF, TIFF, and JPEG 2000 files to build multi-resolution tile indexes (Kerchunk JSON/Parquet)
- Format-aware Zarr v3 codecs that handle the decoding problems standard codecs cannot — NITF endian swap and interleave conversion, JPEG 2000 tile-part reconstruction from non-contiguous byte ranges, TIFF predictor reversal using metadata from outside the tile data, and DTED boundary-post trimming for seamless multi-cell elevation mosaics without preprocessing
- MultiReferenceFileSystem — an fsspec extension that adds scatter-gather I/O for JPEG 2000 codestreams where a single tile's data is scattered across multiple non-contiguous locations in the file (common with RLCP/RPCL progression orders in satellite imagery)
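The scatter-gather idea in the last item can be sketched independently of fsspec: given several `(offset, length)` ranges for one tile, read each range and concatenate the pieces in order. `gather_ranges` below is a hypothetical helper operating on an in-memory file, not the `MultiReferenceFileSystem` API, and it ignores any header reconstruction a real JPEG 2000 decoder would also need.

```python
import io


def gather_ranges(fileobj, ranges):
    """Read (offset, length) ranges in order and concatenate them into one buffer."""
    parts = []
    for offset, length in ranges:
        fileobj.seek(offset)
        part = fileobj.read(length)
        if len(part) != length:
            raise EOFError(f"short read at offset {offset}")
        parts.append(part)
    return b"".join(parts)


# A toy "file" where tile A's data is split into two parts around tile B.
blob = io.BytesIO(b"HEAD" + b"tileA-part1" + b"tileB" + b"tileA-part2")
tile_a_ranges = [(4, 11), (20, 11)]
tile_a = gather_ranges(blob, tile_a_ranges)
print(tile_a)
```

Against object storage, each range would become a ranged GET, and the requests can be issued concurrently because the ranges are known up front from the tile index.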
Multi-resolution pyramids follow the GeoZarr multiscales convention being developed by the OGC GeoZarr Standards Working Group.
from aws.osml.io.virtualizarr_parsers import OversightMLParser, write_tile_index
# Build a tile index (works for NITF, TIFF, JPEG 2000)
parser = OversightMLParser(local_paths="image.ntf")
store = parser(url="s3://my-bucket/imagery/image.ntf")
write_tile_index(store, "image.ntf.tile_index.json")
import zarr
from aws.osml.io.multi_reference_fs import MultiReferenceFileSystem
from zarr.storage._fsspec import FsspecStore
# Open tiles directly from S3 — only fetches the bytes you need
fs = MultiReferenceFileSystem(
    fo="s3://my-bucket/imagery/image.ntf.tile_index.json",
    asynchronous=True,
    remote_options={"asynchronous": True},
    skip_instance_cache=True,
)
store = FsspecStore(fs=fs, read_only=True, path="")
root = zarr.open_group(store, mode="r", zarr_format=2)
tile = root["0/data"][0:3, 768:1024, 1024:1280]
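Why a slice like this fetches only a small part of the file comes down to chunk-grid arithmetic: each axis range is divided by the chunk size to find which chunks it touches. The sketch below works that out for the selection above. The chunk shape `(3, 256, 256)` is an assumption for illustration; the real chunk shape comes from the tile index, and `chunks_for_selection` is a hypothetical helper.

```python
def chunks_for_selection(selection, chunk_shape):
    """Return the chunk-grid index ranges touched by a tuple of slices."""
    touched = []
    for sl, size in zip(selection, chunk_shape):
        first = sl.start // size
        last = (sl.stop - 1) // size
        touched.append(range(first, last + 1))
    return touched


# Hypothetical chunk shape: all 3 bands per chunk, 256 x 256 pixels.
chunk_shape = (3, 256, 256)

aligned = (slice(0, 3), slice(768, 1024), slice(1024, 1280))
print([list(r) for r in chunks_for_selection(aligned, chunk_shape)])

# Shifting the row range off the chunk grid touches two rows of chunks.
unaligned = (slice(0, 3), slice(700, 956), slice(1024, 1280))
print([list(r) for r in chunks_for_selection(unaligned, chunk_shape)])
```

Under that assumed chunk shape, the aligned request maps to a single chunk, while the unaligned one needs two, so aligning reads to the tile grid halves the bytes fetched in this case.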
See the Cloud Imagery Access guide for the full workflow.
Documentation
Full documentation is published at awslabs.github.io/osml-imagery-io.
- Getting Started — install, read, write, tile in a few lines
- User Guide — datasets, metadata, block access, writing, cloud access
- API Reference — full Python API documentation
- Design Documents — architecture and design decisions
Development
conda env create -f environment.yml
conda activate osml-imagery-io-dev
source scripts/setup-dev-env.sh
maturin develop
pytest
cargo test
See CONTRIBUTING.md for details.
Security
Please do not open a public GitHub issue to report security concerns. Follow the reporting mechanisms described in SECURITY.
License
This project is licensed under the Apache 2.0 License. See the LICENSE file.