A parser intended for use with VirtualiZarr to create virtual Zarr stores from TIFFs
Virtual TIFF
Turn TIFF and COG archives into Zarr stores without copying any data.
Virtual TIFF emits a VirtualiZarr-compatible Zarr v3 store backed by byte-range references into the original TIFFs. Persist it with Icechunk and you've published a coherent datacube — readable in any language with a Zarr+Icechunk client — without copying any pixels.
What this lets you do:
- Curate what's exposed. Pick which bands, overviews, and AOIs land in the published store; consumers see one datacube, not hundreds of files.
- Detect source drift. Icechunk records ETags, so analyses can verify the source TIFFs haven't changed since the manifest was built.
- Open non-COG TIFFs without rewriting them. Internally tiled TIFFs that aren't quite COG-compliant still get fast cloud-native access through the virtual store.
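To make the drift check concrete, here is a minimal sketch of the idea: compare the ETags recorded when the manifest was built against the ETags observed now. The `find_drifted_sources` helper, the dictionary inputs, and the example URLs are all hypothetical; they are not part of Virtual TIFF or Icechunk and only illustrate the comparison.

```python
def find_drifted_sources(recorded: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the URLs whose current ETag differs from the recorded one.

    A URL that is missing from `current` is also reported as drifted.
    Both mappings are hypothetical stand-ins for {file_url: etag}.
    """
    return sorted(url for url, etag in recorded.items() if current.get(url) != etag)


# ETags captured at manifest build time (illustrative values).
recorded = {
    "s3://example-bucket/a.tif": '"abc"',
    "s3://example-bucket/b.tif": '"def"',
}
# ETags observed later; b.tif has been overwritten.
current = {
    "s3://example-bucket/a.tif": '"abc"',
    "s3://example-bucket/b.tif": '"xyz"',
}

drifted = find_drifted_sources(recorded, current)
print(drifted)
```

An empty result means every source object still matches what the manifest was built against.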
When to use Virtual TIFF
- You're building a datacube product over a TIFF/COG archive that should outlive any single Python session.
- You need non-Python clients (zarrs, zarrita.js, zarr-layer) to read the archive without knowing it's TIFF underneath.
- You want Icechunk-versioned access to the archive: snapshots, transactions, time-travel as new acquisitions land.
- The archive is queried many times, so amortizing per-file IFD discovery across all those queries pays off.
- You want to expose overviews as a native Zarr multiscale group, so downstream tools (visualization, fast analytics) can use them directly.
Virtual TIFF stitches, it doesn't mosaic. Combining files into a single array requires a structured grid — matching CRS and resolution, or resolution that varies systematically along an axis (e.g. via rectilinear chunking). Heterogeneous TIFFs can still coexist as separate arrays in a DataTree, but you lose the unified-cube benefit. Pixel-level mosaicking and reprojection happen downstream in numpy, dask, or rioxarray — Virtual TIFF doesn't do math.
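The simplest case of the structured-grid requirement (matching CRS and resolution) can be sketched as a pre-flight check before trying to combine files. The `GridInfo` record and `share_grid` function below are hypothetical helpers, not part of Virtual TIFF, and the sketch deliberately ignores the case where resolution varies systematically along an axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridInfo:
    """Hypothetical per-file summary: CRS identifier and (x, y) pixel size."""
    crs: str
    resolution: tuple[float, float]


def share_grid(grids: list[GridInfo]) -> bool:
    """True when every file reports the same CRS and the same resolution.

    Uses exact equality; real data may need a tolerance on the resolution.
    """
    return len(set(grids)) <= 1


same = [GridInfo("EPSG:32610", (10.0, 10.0)), GridInfo("EPSG:32610", (10.0, 10.0))]
mixed = [GridInfo("EPSG:32610", (10.0, 10.0)), GridInfo("EPSG:32611", (10.0, 10.0))]

print(share_grid(same), share_grid(mixed))
```

Files that fail a check like this are the ones that would land as separate arrays in a DataTree rather than in one combined array.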
When not to use Virtual TIFF
If your workflow is "open a STAC search, get an xarray DataArray, do analysis," you probably don't need a virtual store. Reach for one of:
- lazycogs — STAC + async-geotiff with on-the-fly reprojection, for dynamic queries and heterogeneous-CRS data.
- stackstac / odc-stac — established STAC-to-DataArray loaders for analyst workflows.
- async-tiff / async-geotiff directly — when you just want a fast async TIFF reader and don't need a Zarr surface at all.
Virtual TIFF shares the same async-tiff I/O layer as lazycogs and async-geotiff; stackstac and odc-stac sit on rasterio/GDAL instead. The bigger split is what gets produced: a runtime DataArray versus a publishable virtual Zarr store. Pick the one that matches your output.
How it fits
The point of Virtual TIFF is that it's not in the read path. It runs once, when the manifest is built. After that, every consumer goes straight from their Zarr client to the manifest to the TIFF byte ranges.
Build-time (once, by the data publisher)
TIFFs / COGs in S3, GCS, Azure, …
│
│ byte-range GETs for IFD metadata
▼
async-tiff + obstore
│
▼
Virtual TIFF ── VirtualiZarr parser, run once
│
▼
manifest committed to an Icechunk repo
Read-time (every time, in any session)
Zarr v3 client + Icechunk store driver
(e.g. zarr-python + icechunk-python,
zarrs + icechunk-rs, …)
│
│ Zarr reads issued through the Icechunk Store
▼
Icechunk repo ── snapshot + manifest
│
│ Icechunk resolves chunk keys to
│ (file_url, offset, length) per chunk
▼
TIFFs / COGs in S3, GCS, Azure, …
│
│ parallel byte-range GETs
▼
decoded chunks via the Zarr codec pipeline
Note the absence of virtual-tiff and async-tiff from the read-time path. They're build-time tools; once the manifest exists, consumers reach the source bytes through Icechunk alone.
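The read-time step where a chunk key resolves to (file_url, offset, length) can be illustrated with a small sketch. The `ChunkRef` type, the `range_header` helper, the chunk key, and the URL are hypothetical and only mirror the triple shown in the diagram; this is not Icechunk's actual data model or API.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical stand-in for one manifest entry."""
    file_url: str
    offset: int  # byte position where the chunk starts in the TIFF
    length: int  # number of bytes in the compressed chunk


def range_header(ref: ChunkRef) -> dict[str, str]:
    """Build the HTTP Range header for one chunk.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    last_byte = ref.offset + ref.length - 1
    return {"Range": f"bytes={ref.offset}-{last_byte}"}


# A toy manifest: Zarr chunk key -> byte range in a source TIFF.
manifest = {
    "B04/c/0/0": ChunkRef("s3://example-bucket/scene/B04.tif", 4096, 1024),
}

ref = manifest["B04/c/0/0"]
print(ref.file_url, range_header(ref))
```

Each chunk read therefore costs one lookup plus one ranged GET against the original object, with no TIFF parsing on the consumer side.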
Quick start
python -m pip install virtual-tiff
Open a single TIFF as a Zarr-backed xarray dataset
import obstore
import xarray as xr
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})
parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
Works equally for GCS, Azure, or any obstore-supported backend — swap the store factory.
Build a virtual dataset for use with VirtualiZarr
from virtualizarr import open_virtual_dataset
from virtual_tiff import VirtualTIFF
ds = open_virtual_dataset(
url=file_url,
registry=registry,
parser=VirtualTIFF(ifd=0),
)
What's supported
| TIFF feature | Supported | Notes |
|---|---|---|
| Strips | ✅ | Image height must be evenly divisible by rows-per-strip |
| Tiles | ✅ | |
| Multiple IFDs | ✅ | |
| Nested pages / IFDs | ❌ | |
| Compressions: Uncompressed, PackBits, Zlib, LZMA, Lerc, PNG, Deflate, LZW, JPEGXL, JPEG8, WebP | ✅ | |
| JPEG | ❌ | Quantization tables (the JPEGTables tag) are not yet supported, which excludes nearly all JPEG-encoded TIFFs in practice. |
| CMYK | ✅ | |
| YCbCr / CIE L*a*b* / Palette-color | ❌ | |
| Grayscale, RGB | ✅ | |
| PlanarConfiguration (chunky and planar) | ✅ | |
| Both byte orders (II & MM) | ✅ | |
| BigTIFF (64-bit offsets) | ✅ | |
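The strip constraint in the table can be checked ahead of time. The sketch below assumes that each strip maps to one chunk and that every chunk must hold the same number of rows, so a short final strip would break the layout. That rationale is an assumption, not something the table states. `strip_layout` and `strips_are_uniform` are hypothetical helpers, not part of Virtual TIFF.

```python
def strip_layout(image_height: int, rows_per_strip: int) -> tuple[int, int]:
    """Return (number of strips, rows in the final strip)."""
    n_strips = -(-image_height // rows_per_strip)  # ceiling division
    last_rows = image_height - (n_strips - 1) * rows_per_strip
    return n_strips, last_rows


def strips_are_uniform(image_height: int, rows_per_strip: int) -> bool:
    """True when the image height is evenly divisible by rows-per-strip."""
    return image_height % rows_per_strip == 0


print(strip_layout(1024, 256), strips_are_uniform(1024, 256))
print(strip_layout(1000, 256), strips_are_uniform(1000, 256))
```

In the second case the final strip holds only 232 rows, which is the situation the table's note excludes.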
Contributing
- git clone https://github.com/virtual-zarr/virtual-tiff.git
- pixi run -e test download-test-images (downloads ~1.4 GB of test TIFFs)
- pixi run -e test run-tests (note: some tests are expected to fail while the implementation is in progress)
- pixi run -e test zsh (opens a dev shell)
Test data is populated from three upstream sources via sync scripts:
- uv run scripts/sync_gdal_tiffs.py: GDAL autotest TIFFs
- uv run scripts/sync_external_tiffs.py: external TIFFs from various URLs
- uv run scripts/sync_geotiff_test_data.py: fixtures from geotiff-test-data
License
virtual-tiff is distributed under the terms of the MIT license.