A parser intended for use with VirtualiZarr to create virtual Zarr stores from TIFFs

Virtual TIFF

Turn TIFF and COG archives into Zarr stores without copying any data.

Virtual TIFF emits a VirtualiZarr-compatible Zarr v3 store backed by byte-range references into the original TIFFs. Persist it with Icechunk and you've published a coherent datacube — readable in any language with a Zarr+Icechunk client — without copying any pixels.

What this lets you do:

  • Curate what's exposed. Pick which bands, overviews, and AOIs land in the published store; consumers see one datacube, not hundreds of files.
  • Detect source drift. Icechunk records ETags, so analyses can verify the source TIFFs haven't changed since the manifest was built.
  • Open non-COG TIFFs without rewriting them. Internally tiled TIFFs that aren't quite COG-compliant still get fast cloud-native access through the virtual store.
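The drift check in the second bullet amounts to comparing a fingerprint stored at build time with one taken at read time. The sketch below only illustrates that idea: it uses a local file and a SHA-256 digest as a stand-in for an object-store ETag, and `record_fingerprint` and `has_drifted` are hypothetical names, not part of Virtual TIFF or Icechunk.

```python
import hashlib
import tempfile
from pathlib import Path


def record_fingerprint(path: Path) -> str:
    """Return a content digest; stands in for the ETag an object store reports."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def has_drifted(path: Path, recorded: str) -> bool:
    """True if the file no longer matches the fingerprint stored at build time."""
    return record_fingerprint(path) != recorded


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "scene.tif"
    source.write_bytes(b"original pixel bytes")
    recorded = record_fingerprint(source)      # stored alongside the manifest

    unchanged = has_drifted(source, recorded)  # source is as it was at build time
    source.write_bytes(b"rewritten pixel bytes")
    changed = has_drifted(source, recorded)    # source was modified afterwards
```

In a real deployment the comparison would presumably be made by the store against the ETag the object store returns, rather than by rehashing file contents.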

When to use Virtual TIFF

  • You're building a datacube product over a TIFF/COG archive that should outlive any single Python session.
  • You need non-Python clients (zarrs, zarrita.js, zarr-layer) to read the archive without knowing it's TIFF underneath.
  • You want Icechunk-versioned access to the archive: snapshots, transactions, time-travel as new acquisitions land.
  • The archive is queried many times, and amortizing per-file IFD discovery across all those queries actually matters.
  • You want to expose overviews as a native Zarr multiscale group, so downstream tools (visualization, fast analytics) can use them directly.

Virtual TIFF stitches; it doesn't mosaic. Combining files into a single array requires a structured grid: matching CRS and resolution, or resolution that varies systematically along an axis (e.g. via rectilinear chunking). Heterogeneous TIFFs can still coexist as separate arrays in a DataTree, but you lose the unified-cube benefit. Pixel-level mosaicking and reprojection happen downstream in numpy, dask, or rioxarray; Virtual TIFF itself performs no computation on pixel values.
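A pre-flight check for the simplest structured-grid case (every file shares one CRS and one resolution) could look like the sketch below. `GridInfo` and `can_stitch` are hypothetical helpers written for illustration, not Virtual TIFF types, and the sketch does not cover the case where resolution varies systematically along an axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridInfo:
    """Hypothetical per-file summary of the georeferencing that matters here."""
    crs: str                          # e.g. "EPSG:32610"
    resolution: tuple[float, float]   # (x, y) pixel size in CRS units
    shape: tuple[int, int]            # (rows, cols)


def can_stitch(files: list[GridInfo]) -> bool:
    """True if every file shares the first file's CRS and resolution."""
    if not files:
        return False
    first = files[0]
    return all(
        f.crs == first.crs and f.resolution == first.resolution for f in files
    )


same_grid = [
    GridInfo("EPSG:32610", (10.0, 10.0), (10980, 10980)),
    GridInfo("EPSG:32610", (10.0, 10.0), (10980, 10980)),
]
mixed_crs = [
    GridInfo("EPSG:32610", (10.0, 10.0), (10980, 10980)),
    GridInfo("EPSG:32611", (10.0, 10.0), (10980, 10980)),
]

stitchable = can_stitch(same_grid)      # one array is possible
not_stitchable = can_stitch(mixed_crs)  # keep as separate arrays in a DataTree
```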

When not to use Virtual TIFF

If your workflow is "open a STAC search, get an xarray DataArray, do analysis," you probably don't need a virtual store. Reach for one of:

  • lazycogs — STAC + async-geotiff with on-the-fly reprojection, for dynamic queries and heterogeneous-CRS data.
  • stackstac / odc-stac — established STAC-to-DataArray loaders for analyst workflows.
  • async-tiff / async-geotiff directly — when you just want a fast async TIFF reader and don't need a Zarr surface at all.

Virtual TIFF shares the same async-tiff I/O layer as lazycogs and async-geotiff; stackstac and odc-stac sit on rasterio/GDAL instead. The bigger split is what gets produced: a runtime DataArray versus a publishable virtual Zarr store. Pick the one that matches your output.

How it fits

The point of Virtual TIFF is that it's not in the read path. It runs once, when the manifest is built. After that, every consumer goes straight from their Zarr client to the manifest to the TIFF byte ranges.

Build-time (once, by the data publisher)

   TIFFs / COGs in S3, GCS, Azure, …
              │
              │  byte-range GETs for IFD metadata
              ▼
   async-tiff + obstore
              │
              ▼
   Virtual TIFF  ── VirtualiZarr parser, run once
              │
              ▼
   manifest committed to an Icechunk repo
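The first arrow in the diagram above, the byte-range GETs for IFD metadata, starts from the TIFF header, which says where the first IFD lives. In Virtual TIFF that work is done by async-tiff; the sketch below is only a standard-library illustration of the header layout (byte-order mark, magic number 42 for classic TIFF or 43 for BigTIFF, then the first-IFD offset), and `parse_tiff_header` is a hypothetical name.

```python
import struct


def parse_tiff_header(header: bytes) -> dict:
    """Decode byte order, variant, and first-IFD offset from the header bytes."""
    order = header[:2]
    if order == b"II":
        endian = "<"   # little-endian
    elif order == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")

    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:
        # Classic TIFF: 32-bit offset to the first IFD.
        (first_ifd,) = struct.unpack(endian + "I", header[4:8])
        return {"byte_order": order.decode(), "bigtiff": False, "first_ifd": first_ifd}
    if magic == 43:
        # BigTIFF: offset size (always 8), reserved zero, then a 64-bit offset.
        offset_size, reserved, first_ifd = struct.unpack(endian + "HHQ", header[4:16])
        if offset_size != 8 or reserved != 0:
            raise ValueError("malformed BigTIFF header")
        return {"byte_order": order.decode(), "bigtiff": True, "first_ifd": first_ifd}
    raise ValueError(f"not a TIFF: unexpected magic number {magic}")


# Synthetic headers, built here so the example needs no real file.
classic = parse_tiff_header(struct.pack("<2sHI", b"II", 42, 8))
big = parse_tiff_header(struct.pack(">2sHHHQ", b"MM", 43, 8, 0, 16))
```

Against a remote file, the same bytes would come from a single small range request for the start of the object.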

Read-time (every time, in any session)

   Zarr v3 client  +  Icechunk store driver
   (e.g. zarr-python + icechunk-python,
         zarrs + icechunk-rs, …)
              │
              │  Zarr reads issued through the Icechunk Store
              ▼
   Icechunk repo  ── snapshot + manifest
              │
              │  Icechunk resolves chunk keys to
              │  (file_url, offset, length) per chunk
              ▼
   TIFFs / COGs in S3, GCS, Azure, …
              │
              │  parallel byte-range GETs
              ▼
   decoded chunks via the Zarr codec pipeline

Note the absence of virtual-tiff and async-tiff from the read-time path. They're build-time tools; once the manifest exists, consumers reach the source bytes through Icechunk alone.
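The read-time resolution step, chunk key to (file_url, offset, length), can be pictured with a toy manifest. The sketch below uses a plain dict and a local file in place of Icechunk and an object store; `Manifest` and `read_chunk` are hypothetical names, and the "tiles" are placeholder bytes rather than encoded pixels.

```python
import tempfile
from pathlib import Path

# Hypothetical manifest: chunk key -> (file location, byte offset, byte length).
Manifest = dict[str, tuple[str, int, int]]


def read_chunk(manifest: Manifest, key: str) -> bytes:
    """Resolve a chunk key and read only the referenced byte range."""
    location, offset, length = manifest[key]
    with open(location, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a TIFF: 8 header bytes followed by two 4-byte "tiles".
    tiff = Path(tmp) / "scene.tif"
    tiff.write_bytes(b"HEADER__" + b"AAAA" + b"BBBB")

    manifest: Manifest = {
        "c/0/0": (str(tiff), 8, 4),
        "c/0/1": (str(tiff), 12, 4),
    }
    first = read_chunk(manifest, "c/0/0")
    second = read_chunk(manifest, "c/0/1")
```

Nothing in `read_chunk` knows the file is a TIFF, which is the point of the read path: the bytes it returns would go straight to the Zarr codec pipeline for decoding.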

Quick start

python -m pip install virtual-tiff

Open a single TIFF as a Zarr-backed xarray dataset

import obstore
import xarray as xr
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF

# Public Sentinel-2 COG on AWS.
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# skip_signature=True sends unsigned (anonymous) requests.
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
# The registry maps URL prefixes to the store that serves them.
registry = ObjectStoreRegistry({bucket_url: s3_store})

# ifd=0 selects the first image file directory in the file.
parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)

The same pattern works for GCS, Azure, or any other obstore-supported backend; only the store construction changes.

Build a virtual dataset for use with VirtualiZarr

# Reuses file_url and registry from the previous example.
from virtualizarr import open_virtual_dataset
from virtual_tiff import VirtualTIFF

ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0),
)

What's supported

TIFF feature | Supported | Notes
Strips |  | Image height must be evenly divisible by rows-per-strip
Tiles |  |
Multiple IFDs |  |
Nested pages / IFDs |  |
Compressions: Uncompressed, PackBits, Zlib, LZMA, Lerc, PNG, Deflate, LZW, JPEGXL, JPEG8, WebP |  |
JPEG |  | Quantization tables (the JPEGTables tag) are not yet supported, which excludes nearly all JPEG-encoded TIFFs in practice.
CMYK |  |
YCbCr / CIE L*a*b* / Palette-color |  |
Grayscale, RGB |  |
PlanarConfiguration (chunky and planar) |  |
Both byte orders (II & MM) |  |
BigTIFF (64-bit offsets) |  |
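The note on strips can be checked before building a manifest. The sketch below is a minimal pre-flight check, assuming you already have the ImageLength and RowsPerStrip tag values in hand; `strip_count` is a hypothetical helper, not a Virtual TIFF function. One plausible reading of the rule, stated here as an assumption, is that each strip maps to one chunk, so a short final strip would not fill a regular chunk grid.

```python
def strip_count(image_height: int, rows_per_strip: int) -> int:
    """Return the number of strips, or raise if the last strip would be short."""
    if image_height <= 0 or rows_per_strip <= 0:
        raise ValueError("image_height and rows_per_strip must be positive")
    strips, remainder = divmod(image_height, rows_per_strip)
    if remainder:
        raise ValueError(
            f"image height {image_height} is not evenly divisible by "
            f"rows-per-strip {rows_per_strip} (last strip has {remainder} rows)"
        )
    return strips


n_strips = strip_count(1024, 128)   # 1024 rows split into full 128-row strips

try:
    strip_count(1000, 128)          # 1000 = 7 * 128 + 104, so the last strip is short
    short_strip_rejected = False
except ValueError:
    short_strip_rejected = True
```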

Contributing

  1. git clone https://github.com/virtual-zarr/virtual-tiff.git
  2. pixi run -e test download-test-images (downloads ~1.4 GB of test TIFFs)
  3. pixi run -e test run-tests — note: some tests are expected to fail while the implementation is in progress.
  4. pixi run -e test zsh for a dev shell.

Test data is populated from three upstream sources via sync scripts:

  • uv run scripts/sync_gdal_tiffs.py — GDAL autotest TIFFs
  • uv run scripts/sync_external_tiffs.py — external TIFFs from various URLs
  • uv run scripts/sync_geotiff_test_data.py — fixtures from geotiff-test-data

License

virtual-tiff is distributed under the terms of the MIT license.

Download files

Source Distribution

virtual_tiff-0.5.0.tar.gz (138.2 kB)

Uploaded Source

Built Distribution

virtual_tiff-0.5.0-py3-none-any.whl (19.7 kB)

Uploaded Python 3

File details

Details for the file virtual_tiff-0.5.0.tar.gz.

File metadata

  • Download URL: virtual_tiff-0.5.0.tar.gz
  • Upload date:
  • Size: 138.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for virtual_tiff-0.5.0.tar.gz
Algorithm Hash digest
SHA256 90ac9da18fbfe8b93d76ea5f878bc5fc39f3329de8d793f68de02d5c3fbc0cc7
MD5 140ea2ee9d8a4ff9456c541a6b37b6e5
BLAKE2b-256 f7af23a58c44abd2aa83614033f4a519eea036f0d84cea54e7e16643e7079faf

Provenance

The following attestation bundles were made for virtual_tiff-0.5.0.tar.gz:

Publisher: release.yml on virtual-zarr/virtual-tiff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file virtual_tiff-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: virtual_tiff-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for virtual_tiff-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eb70feab1124e4b6b99bbc1f87f68cf062d2601ad4d11bccaeca1a0e39c80a8f
MD5 278ef9935549ec9e2f01c09d7bf9f27f
BLAKE2b-256 c912a8aecc40296cb30e8011e1bb50c9916bb3b46a8bef89e3f3223cc9715477

Provenance

The following attestation bundles were made for virtual_tiff-0.5.0-py3-none-any.whl:

Publisher: release.yml on virtual-zarr/virtual-tiff
