A parser intended for use with VirtualiZarr to create virtual Zarr stores from TIFFs

Virtual TIFF

A parser for creating virtual Zarr stores from TIFF files using VirtualiZarr 2.0 and async-tiff.

Background

First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:

  1. Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1).
  2. Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python.
  3. Allow loading a stack of GeoTIFFs/COGs into a data cube while minimizing the number of GET requests relative to using stackstac/odc-stac, thereby decreasing cost and increasing performance.
  4. Provide users access to a lazily loaded DataTree containing both the data and the overviews, allowing scientists to use the overviews not only for tile-based visualization but also for quickly iterating on analytics.
  5. Include ETags in the virtualized datasets to support reproducibility.
  6. A motivation that's less clear to me, but maybe possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (https://github.com/zarr-developers/geozarr-spec/issues/53).
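To make the first point concrete, here is a minimal sketch of why internal tiling allows virtualization without copying data: each tile already has a recorded byte offset and length (the TIFF TileOffsets and TileByteCounts tags), so it can be referenced in place as a chunk. This is an illustration under stated assumptions, not the library's implementation. The function name `tile_manifest`, the example path, and the numbers are hypothetical, the entry layout is only loosely modelled on a chunk manifest, and the sketch assumes a single-band image whose tiles are stored in row-major order.

```python
from math import ceil


def tile_manifest(image_height, image_width, tile_height, tile_width,
                  tile_offsets, tile_byte_counts, path):
    """Map each internal TIFF tile to a chunk key and a byte range in the file."""
    rows = ceil(image_height / tile_height)
    cols = ceil(image_width / tile_width)
    if len(tile_offsets) != rows * cols or len(tile_byte_counts) != rows * cols:
        raise ValueError("expected one offset and one byte count per tile")
    manifest = {}
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # TIFF orders tiles left to right, then top to bottom.
        row, col = divmod(index, cols)
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return manifest


# Hypothetical 512 x 768 image split into 256 x 256 tiles (2 rows, 3 columns).
manifest = tile_manifest(
    image_height=512, image_width=768, tile_height=256, tile_width=256,
    tile_offsets=[1000, 3000, 5000, 7000, 9000, 11000],
    tile_byte_counts=[2000, 2000, 2000, 2000, 2000, 1500],
    path="s3://example-bucket/example.tif",  # hypothetical location
)
print(manifest["1.2"])
```

A reader that holds such a mapping can fetch any chunk with one ranged GET request against the original file, which is why no second copy of the pixel data is needed.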

Getting started

The library can be installed from PyPI:

python -m pip install virtual-tiff

You can use Virtual TIFF to load data directly:

import obstore
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
import xarray as xr

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
ds.load()

or create a virtual dataset:

import obstore
from virtualizarr import open_virtual_dataset
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0)
)
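Both examples pass ifd=0, which selects the first image file directory (IFD) in the TIFF. A TIFF can contain a chain of IFDs, and COGs typically store reduced-resolution overviews in the later ones. The sketch below walks that chain using only the standard library to show what the index refers to. It is not how Virtual TIFF reads files (the project relies on async-tiff for that); the helper names `ifd_offsets` and `make_ifd` are hypothetical, the sample bytes are a synthetic two-IFD file with no pixel data, and only classic little- or big-endian TIFF (not BigTIFF) is handled.

```python
import struct


def ifd_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every IFD in a classic (non-BigTIFF) TIFF."""
    byte_order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    offsets = []
    while offset != 0:  # a next-IFD offset of 0 marks the end of the chain
        offsets.append(offset)
        (entry_count,) = struct.unpack_from(byte_order + "H", data, offset)
        # Each entry is 12 bytes; the 4-byte offset of the next IFD follows them.
        (offset,) = struct.unpack_from(byte_order + "I", data, offset + 2 + 12 * entry_count)
    return offsets


def make_ifd(width: int, next_offset: int) -> bytes:
    """Build an 18-byte IFD holding one entry: ImageWidth (tag 256, type LONG)."""
    return struct.pack("<HHHIII", 1, 256, 4, 1, width, next_offset)


# Synthetic file: 8-byte header, full-resolution IFD at byte 8, overview IFD at byte 26.
header = struct.pack("<2sHI", b"II", 42, 8)
data = header + make_ifd(1024, 26) + make_ifd(512, 0)

offsets = ifd_offsets(data)
print(offsets)  # index 0 is the first IFD, index 1 the second
```

In this synthetic file, index 0 corresponds to the IFD at byte 8 and index 1 to the IFD at byte 26. Real files may also interleave mask IFDs with overviews, so the meaning of each index should be checked per file.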

Contributing

  1. Clone the repository: git clone https://github.com/virtual-zarr/virtual-tiff.git.
  2. Download test data from S3: pixi run -e test download-test-images. WARNING: This will download ~1.4GB of TIFFs for testing to your machine.
  3. Run the test suite: pixi run -e test run-tests. WARNING: Some tests will fail because the implementation is incomplete.
  4. If needed, start a shell in the development environment: pixi run -e test zsh.

Test data is populated from three upstream sources via sync scripts (run these before upload_test_data.py to refresh S3):

  • uv run scripts/sync_gdal_tiffs.py — GDAL autotest TIFFs
  • uv run scripts/sync_external_tiffs.py — external TIFFs from various URLs
  • uv run scripts/sync_geotiff_test_data.py — synthetic and real-world fixtures from geotiff-test-data

License

virtual-tiff is distributed under the terms of the MIT license.
