
Python package that supports reading and writing multidimensional GeoTIFF files

mrio

mrio is a library for reading and writing multidimensional GeoTIFF files, extending rasterio to support multidimensional arrays.

What is a Multidimensional GeoTIFF?

A Multidimensional Geographic Tagged Image File Format (mGeoTIFF) extends the traditional GeoTIFF format by supporting N-dimensional arrays, similar to formats like NetCDF, HDF5, or Zarr. It keeps the simplicity and compatibility of GeoTIFF: access is fast, and the file can be opened by any GIS software or library that supports the GeoTIFF format.

What is a Temporal GeoTIFF?

The Temporal GeoTIFF builds on the mGeoTIFF by enforcing a stricter convention for the temporal dimension. It requires a four-dimensional structure with dimensions ordered as (time, band, x, y). The temporal dimension follows the STAC specification, which defines a start_datetime and an optional end_datetime. Both are strings formatted according to RFC 3339, section 5.6, and use the Gregorian calendar as the reference system for time. For additional details, please refer to the [Specification](SPECIFICATION.md).
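
As an illustration of the datetime convention described above, the following minimal sketch formats a start and an end datetime as RFC 3339 strings using only the Python standard library. The helper name `to_rfc3339` and the dictionary layout are hypothetical and are not part of mrio; the sketch also assumes that naive datetimes are already in UTC.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 string in UTC, e.g. 2021-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=5)

# Hypothetical container for the two STAC-style fields named in the text.
temporal_extent = {
    "start_datetime": to_rfc3339(start),
    "end_datetime": to_rfc3339(end),  # optional
}
print(temporal_extent)
```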

When to use it?

Multidimensional or Temporal GeoTIFF (mGeoTIFF or GeoTIFF)

Perfect for small datacubes (< 1 GB).

NetCDF, HDF5, or Zarr

Best for larger or complex datacubes (> 1 GB).

| Feature | HDF5 | Zarr | mGeoTIFF | Explanation |
|---|---|---|---|---|
| Portability | 2 | 3 | 1 | mGeoTIFF is widely supported and easily shareable. |
| Compression | 2 | 2 | 1 | Advanced compression options thanks to GDAL. |
| Coordinates definition | 3 | 3 | 1 | Strict spatial and temporal definitions. |
| GIS compatibility | 2 | 2 | 1 | Because a mGeoTIFF is still a GeoTIFF. |
| Chunking | 1 | 1 | 3 | mGeoTIFF only supports spatial chunking. |
| Nested Groups | 1 | 1 | 3 | mGeoTIFF does not support hierarchical data organization. |
| Diverse Array Shapes Support | 1 | 1 | 3 | mGeoTIFF only supports regular arrays. |
| Rich Metadata | 1 | 1 | 2 | mGeoTIFF only supports JSON-encoded metadata. |

*1: Excellent, 2: Okay, 3: Poor*
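
To apply the 1 GB rule of thumb from this section, a rough sketch like the one below estimates the uncompressed size of a datacube from its shape and bytes per value. The function names are hypothetical, and using the uncompressed size with a decimal gigabyte is an assumption, since the text does not say how the threshold is measured.

```python
import math

GB = 10**9  # assumption: decimal gigabyte; the text does not specify GB vs GiB


def uncompressed_size_bytes(shape, bytes_per_value):
    """Number of values in the cube times the size of one value."""
    return math.prod(shape) * bytes_per_value


def suggest_format(shape, bytes_per_value, threshold=GB):
    """Apply the rule of thumb: small cubes -> mGeoTIFF, large cubes -> other formats."""
    size = uncompressed_size_bytes(shape, bytes_per_value)
    return "mGeoTIFF" if size < threshold else "NetCDF, HDF5, or Zarr"


# A small float32 cube (4 bytes per value) laid out as (simulation, time, band, lat, lon)
toy_shape = (3, 5, 7, 128, 128)
print(uncompressed_size_bytes(toy_shape, 4))
print(suggest_format(toy_shape, 4))

# A year of daily 12-band uint16 scenes at 10980 x 10980 pixels
print(suggest_format((365, 12, 10980, 10980), 2))
```

Compression usually shrinks the file on disk, so this estimate is conservative for the mGeoTIFF case.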

Proof of Concept

We successfully transformed a massive 4 TB dataset of 200,000 Zarr files into a streamlined, cloud-native 700 GB dataset of just a few files. This optimization was achieved using TACO and Temporal GeoTIFFs.

🌟 Explore the dataset here!

Installation

pip install mrio

How to use it?

Writing a Multidimensional GeoTIFF

import mrio
import numpy as np
import xarray as xr
import pandas as pd

# 1. Define a 5-dimensional dataset (toy example)
data = np.random.rand(3, 5, 7, 128, 128)
time = list(pd.date_range("2021-01-01", periods=5).strftime("%Y%m%d"))
bands = ["B02", "B03", "B04", "B08", "B11", "B12", "B8A"]
simulations = ["historical", "rcp45", "rcp85"]

datacube = xr.DataArray(
    data=data,
    dims=["simulation", "time", "band", "lat", "lon"],
    coords={
        "simulation": simulations,
        "time": time,
        "band": bands,
    },
)

# 2. Define the parameters
params = {
    "driver": "GTiff",
    "dtype": "float32",
    "width": datacube.shape[-1],
    "height": datacube.shape[-2],
    "interleave": "pixel",
    "crs": "EPSG:4326",
    "transform": mrio.transform.from_bounds(-76.2, 4.31, -76.1, 4.32, 128, 128),
    "md:pattern": "simulation time band lat lon -> (simulation band time) lat lon",
    "md:coordinates": {
        "time": time,
        "band": bands,
        "simulation": simulations,
    },
    "md:attributes": {  # OPTIONAL: additional attributes to include in the file
        "hello": "world",
    },
}

# 3. Write the data
with mrio.open("image.tif", mode="w", **params) as src:
    src.write(datacube.values)

# 4. Read the data and its metadata
with mrio.open("image.tif") as src:
    data_r = src.read()
    md_meta = src.md_meta  # keep the metadata before the file is closed

# 5. Convert the data back to an xarray DataArray
datacube_r = xr.DataArray(
    data=data_r,
    dims=md_meta["md:dimensions"],
    coords=md_meta["md:coordinates"],
    attrs=md_meta["md:attributes"],
)

# 6. Assert that the data is the same
assert np.allclose(datacube, datacube_r)
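
The pattern string in step 2 describes how the five dimensions are flattened into the band axis of a regular GeoTIFF. The NumPy-only sketch below shows one way to read "simulation time band lat lon -> (simulation band time) lat lon", assuming einops-style semantics in which the leftmost name inside the parentheses varies slowest. It illustrates the layout and does not use or reproduce mrio's internal code.

```python
import numpy as np

# Toy sizes: 3 simulations, 5 time steps, 7 bands, 4 x 4 pixels
S, T, B, H, W = 3, 5, 7, 4, 4
cube = np.arange(S * T * B * H * W, dtype="float32").reshape(S, T, B, H, W)

# Forward: reorder the axes to (simulation, band, time, lat, lon),
# then merge the first three axes into a single band axis.
flat = cube.transpose(0, 2, 1, 3, 4).reshape(S * B * T, H, W)

# Inverse: split the band axis again and restore the original axis order.
restored = flat.reshape(S, B, T, H, W).transpose(0, 2, 1, 3, 4)
assert np.array_equal(cube, restored)

# 0-based position of (simulation s, band b, time t) on the flattened band axis
s, t, b = 1, 2, 3
idx = (s * B + b) * T + t
assert np.array_equal(flat[idx], cube[s, t, b])
print(flat.shape, idx)
```

Under this reading, the file holds S * B * T ordinary bands, which is why generic GIS software can still open it.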

Writing a Temporal GeoTIFF

A reproducible and real-world example of a Temporal GeoTIFF: using this format reduces file size by 60% compared to storing different time steps in separate files.

from datetime import datetime
import pandas as pd
import tacoreader
import numpy as np
import xarray as xr  # needed in step 7
import mrio

# 1. Load a dataset that contains images from the same location
cloudsen12_l1c = tacoreader.load("tacofoundation:cloudsen12-l1c")

# 2. Select any location
roi_idx = "ROI_00103"
s2_roi_images = cloudsen12_l1c[cloudsen12_l1c["roi_id"] == roi_idx]
s2_roi_images.reset_index(drop=True, inplace=True)

# 3. Load the temporal mini-cube
temporal_stack = np.zeros((len(s2_roi_images), 3, 512, 512), dtype=np.uint16)
for index, row in s2_roi_images.iterrows():
    
    print("Downloading image %s" % row["tortilla:id"])

    # Get the snippet of the Sentinel-2 image
    s2_str = row.read(0).read(0)

    # Load the image
    with mrio.open(s2_str) as src:
        s2_image = src.read([4, 3, 2])

    # Store the image in the temporal stack
    temporal_stack[index] = s2_image
    
# 4. Set the Coordinates
time_coord = [datetime.fromtimestamp(t).strftime('%Y%m%d') for t in s2_roi_images["stac:time_start"].tolist()]
band_coord = ["B04", "B03", "B02"]

# 5. Set the writing parameters
params = {
    'driver': 'GTiff',
    'dtype': 'uint16',
    'width': temporal_stack.shape[-1],
    'height': temporal_stack.shape[-2],
    'compress': 'zstd',
    'zstd_level': 22,
    'predictor': 2,
    'tiled': True,
    'num_threads': 10,
    'interleave': 'pixel',
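    # `row` is a single-row Series, so take the shared CRS and geotransform from the DataFrame
    'crs': s2_roi_images["stac:crs"].iloc[0],
    'transform': mrio.Affine.from_gdal(*s2_roi_images["stac:geotransform"].iloc[0]),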
    'mrio:pattern': 'time band lat lon -> (band time) lat lon',
    'mrio:coordinates': {
        "time": time_coord,
        "band": band_coord
    },
    'mrio:attributes': {
        'time_start': s2_roi_images["stac:time_start"].tolist(),
        'time_end': s2_roi_images["stac:time_end"].tolist(),
    }
}

# 6. Write the temporal stack
with mrio.open("temporal_stack.tif", mode = "w", **params) as src:
    src.write(temporal_stack)

# 7. Read the data
with mrio.open("temporal_stack.tif") as src:
    data_r = src.read()
    attributes = src.attributes()
    temporal_stack = xr.DataArray(
        data=data_r,
        dims=attributes["mrio:dimensions"],
        coords=attributes["mrio:coordinates"],
        attrs=attributes["mrio:attributes"]
    )
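
When the file written above is opened in generic GIS software, it appears as a flat list of bands. The sketch below maps a 1-based band number back to its (band, time) labels for the pattern 'time band lat lon -> (band time) lat lon', assuming the leftmost name in the parentheses varies slowest. The helper `describe_band` and the dates are hypothetical and are not part of mrio.

```python
band_coord = ["B04", "B03", "B02"]
time_coord = ["20190101", "20190106", "20190111", "20190116"]  # hypothetical dates


def describe_band(band_number: int) -> dict:
    """Map a 1-based GeoTIFF band number to its (band, time) labels."""
    total = len(band_coord) * len(time_coord)
    if not 1 <= band_number <= total:
        raise ValueError(f"band_number must be between 1 and {total}")
    # "(band time)": band varies slowest, time varies fastest
    b, t = divmod(band_number - 1, len(time_coord))
    return {"band": band_coord[b], "time": time_coord[t]}


print(describe_band(1))
print(describe_band(6))
```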

Documentation

See documentation for more details: https://tacofoundation.github.io/mrio/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changes

See the CHANGELOG file for details.
