A parser intended for use with VirtualiZarr to create virtual Zarr stores from TIFFs

Virtual TIFF

A parser for creating virtual Zarr stores from TIFF files using VirtualiZarr 2.0 and async-tiff.

Background

First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:

  1. Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1).
  2. Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python.
  3. Allow loading a stack of GeoTIFFs/COGs into a data cube while minimizing the number of GET requests relative to using stackstac/odc-stac, thereby decreasing cost and increasing performance.
  4. Provide users access to a lazily loaded DataTree containing both the data and the overviews, allowing scientists to use the overviews not only for tile-based visualization but also for quickly iterating on analytics.
  5. Include ETags in the virtualized datasets to support reproducibility.
  6. A motivation that's less clear to me, but maybe possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (https://github.com/zarr-developers/geozarr-spec/issues/53).
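
To make the first point concrete, here is a minimal sketch of how internal tiling can be exposed as Zarr chunks without copying data. It is not the virtual-tiff implementation: `build_chunk_manifest` and the example offsets are hypothetical, and the sketch assumes a single-band tiled TIFF whose tiles are listed in row-major order (as the TileOffsets and TileByteCounts tags do). Each tile becomes one chunk entry that records only where its bytes live in the original file.

```python
from math import ceil


def build_chunk_manifest(path, image_height, image_width,
                         tile_height, tile_width,
                         tile_offsets, tile_byte_counts):
    """Map chunk keys ("row.col") to byte ranges of tiles in a TIFF file.

    Hypothetical helper for illustration only; virtual-tiff's internals
    may differ.
    """
    # Number of tiles along each axis; edge tiles are padded to full size.
    rows = ceil(image_height / tile_height)
    cols = ceil(image_width / tile_width)
    expected = rows * cols
    if len(tile_offsets) != expected or len(tile_byte_counts) != expected:
        raise ValueError(f"expected {expected} tiles for a {rows}x{cols} grid")

    manifest = {}
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # Tiles are stored left-to-right, then top-to-bottom.
        row, col = divmod(index, cols)
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return manifest


# Made-up values: a 1000x1000 image with 512x512 tiles gives a 2x2 chunk grid.
manifest = build_chunk_manifest(
    path="s3://example-bucket/example.tif",  # hypothetical location
    image_height=1000, image_width=1000,
    tile_height=512, tile_width=512,
    tile_offsets=[8, 1008, 2008, 3008],
    tile_byte_counts=[1000, 1000, 1000, 1000],
)
```

A reader that holds such a manifest can fetch any chunk with a single byte-range request against the original file, which is why no data needs to be duplicated.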

Getting started

The library can be installed from PyPI:

python -m pip install virtual-tiff

You can use Virtual TIFF to load data directly:

import obstore
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF
import xarray as xr

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)
ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
ds.load()

or create a virtual dataset:

import obstore
from virtualizarr import open_virtual_dataset
from virtualizarr.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF

# Configuration
bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Setup and open dataset
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket_url: s3_store})

ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0)
)
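
Both examples above key the registry on the bucket prefix of the file URL. When working with many files, it can help to derive that prefix from each URL rather than hard-coding it. The following is a small sketch using only the standard library; `bucket_prefix` and `group_by_bucket` are hypothetical helpers, not part of virtual-tiff or VirtualiZarr, and the trailing slash simply mirrors the `bucket_url` used above.

```python
from urllib.parse import urlparse


def bucket_prefix(file_url: str) -> str:
    """Return the "scheme://bucket/" prefix of an object URL."""
    parsed = urlparse(file_url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a fully qualified object URL: {file_url!r}")
    return f"{parsed.scheme}://{parsed.netloc}/"


def group_by_bucket(file_urls):
    """Group URLs by bucket prefix so one store can be created per bucket."""
    groups = {}
    for url in file_urls:
        groups.setdefault(bucket_prefix(url), []).append(url)
    return groups


file_url = (
    "s3://e84-earth-search-sentinel-data/"
    "sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"
)
prefix = bucket_prefix(file_url)
groups = group_by_bucket([file_url])
```

Each key of `groups` could then be passed to `obstore.store.from_url` and used as a registry key, in the same way `bucket_url` is used in the examples above.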

Contributing

  1. Clone the repository: git clone https://github.com/virtual-zarr/virtual-tiff.git.
  2. Pull baseline image data from the DVC remote: pixi run -e test download-test-images. WARNING: This will download ~1.4GB of TIFFs for testing to your machine.
  3. Run the test suite: pixi run -e test run-tests. WARNING: Some tests will fail because the implementation is incomplete.
  4. Start a shell if needed in the development environment using pixi run -e test zsh.

License

virtual-tiff is distributed under the terms of the MIT license.
