A parser intended for use with VirtualiZarr to create virtual Zarr stores from TIFFs

Virtual TIFF

A parser for creating virtual Zarr stores from TIFF files using VirtualiZarr 2.0 and async-tiff.

Background

First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:

  1. Provide faster access to non-cloud-optimized GeoTIFFs that have some form of internal tiling, without duplicating any data (see notebook #1).
  2. Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python.
  3. Allow loading a stack of GeoTIFFs/COGs into a data cube with fewer GET requests than stackstac/odc-stac, thereby decreasing cost and increasing performance.
  4. Give users a lazily loaded DataTree containing both the data and the overviews, so scientists can use the overviews not only for tile-based visualization but also for quickly iterating on analytics.
  5. Include ETags in the virtualized datasets to support reproducibility.
  6. A motivation that's less clear to me, but may be possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (https://github.com/zarr-developers/geozarr-spec/issues/53).
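To make the "no data duplication" idea concrete, here is a minimal, standard-library sketch of how a tiled TIFF can be described as chunk references. It is not the virtual-tiff implementation. ChunkRef, build_manifest, the "row.col" key format, and all numeric values are hypothetical and chosen only for illustration. The one thing taken from the TIFF format itself is that the TileOffsets and TileByteCounts tags list tiles left to right, top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Byte range of one compressed tile inside the original TIFF (hypothetical type)."""

    path: str
    offset: int
    length: int


def build_manifest(path, tile_offsets, tile_byte_counts, tiles_across):
    """Map illustrative "row.col" chunk keys to byte ranges in the TIFF file.

    tile_offsets and tile_byte_counts mirror the TIFF TileOffsets and
    TileByteCounts tags, which list tiles in row-major order.
    """
    if len(tile_offsets) != len(tile_byte_counts):
        raise ValueError("TileOffsets and TileByteCounts must have equal length")
    manifest = {}
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # Convert the flat tile index into a (row, col) position in the tile grid.
        row, col = divmod(index, tiles_across)
        manifest[f"{row}.{col}"] = ChunkRef(path, offset, length)
    return manifest


# Made-up tag values for a 2 x 2 grid of tiles.
manifest = build_manifest(
    "s3://example-bucket/image.tif",
    tile_offsets=[1024, 5120, 9216, 13312],
    tile_byte_counts=[4096, 4096, 4096, 4000],
    tiles_across=2,
)
ref = manifest["1.0"]
print(ref.path, ref.offset, ref.length)
```

A reader that holds such a manifest can fetch a single tile with one ranged GET against the original file, so the pixel data never needs to be copied into a new store.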

Getting started

The library can be installed from PyPI:

python -m pip install virtual-tiff

You can use Virtual TIFF to load data directly:

import obstore
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
import xarray as xr

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
ds.load()

It also works with Google Cloud Storage:

import obstore
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
import xarray as xr

# Configuration
bucket_url = "gs://gcp-public-data-landsat/"
file_url = f"{bucket_url}LC08/01/044/034/LC08_L1TP_044034_20131228_20170307_01_T1/LC08_L1TP_044034_20131228_20170307_01_T1_B3.TIF"

# Setup and open dataset
gcs_store = obstore.store.from_url(bucket_url, skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: gcs_store})

parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
ds.load()

You can also create a virtual dataset with VirtualiZarr:

import obstore
from virtualizarr import open_virtual_dataset
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0)
)
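The examples above hard-code bucket_url next to file_url. If you only have the full object URL, one possible way to derive the bucket prefix used as the registry key is sketched below with the standard library. split_object_url is a hypothetical helper, not part of virtual-tiff, obstore, or obspec_utils, and it assumes URLs of the form scheme://bucket/key.

```python
from urllib.parse import urlsplit


def split_object_url(file_url):
    """Split "scheme://bucket/key" into ("scheme://bucket/", "key").

    Hypothetical helper for illustration; assumes the bucket name is the
    network-location part of the URL, as in the s3:// and gs:// examples.
    """
    parts = urlsplit(file_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Expected a URL like scheme://bucket/key, got {file_url!r}")
    bucket_url = f"{parts.scheme}://{parts.netloc}/"
    key = parts.path.lstrip("/")
    return bucket_url, key


file_url = (
    "s3://e84-earth-search-sentinel-data/"
    "sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"
)
bucket_url, key = split_object_url(file_url)
print(bucket_url)
print(key)
```

The returned bucket_url has the same shape as the bucket_url values used in the examples above, so it can be passed to the store constructor and used as the registry key in the same way.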

Contributing

  1. Clone the repository: git clone https://github.com/virtual-zarr/virtual-tiff.git.
  2. Download test data from S3: pixi run -e test download-test-images. WARNING: This will download ~1.4 GB of TIFFs for testing to your machine.
  3. Run the test suite: pixi run -e test run-tests. WARNING: Some tests will fail because the implementation is incomplete.
  4. If needed, start a shell in the development environment: pixi run -e test zsh.

Test data is populated from three upstream sources via sync scripts (run these before upload_test_data.py to refresh S3):

  • uv run scripts/sync_gdal_tiffs.py — GDAL autotest TIFFs
  • uv run scripts/sync_external_tiffs.py — external TIFFs from various URLs
  • uv run scripts/sync_geotiff_test_data.py — synthetic and real-world fixtures from geotiff-test-data

License

virtual-tiff is distributed under the terms of the MIT license.
