Python package for reading and writing multidimensional GeoTIFF files
mrio
mrio is a library for reading and writing multidimensional GeoTIFF files, extending rasterio to support multidimensional arrays.
- GitHub Repository: https://github.com/tacofoundation/mrio
- Documentation: https://tacofoundation.github.io/mrio/
- Specification: https://tacofoundation.github.io/mrio/
- Best Practices: https://tacofoundation.github.io/mrio/
- Examples: https://tacofoundation.github.io/mrio/
What is a Multidimensional GeoTIFF?
A Multidimensional Geo Tagged Image File Format (mGeoTIFF) extends the traditional GeoTIFF format by supporting N-dimensional arrays, similar to formats like NetCDF, HDF5, or Zarr. It keeps the simplicity and compatibility of GeoTIFF: access is fast, and any GIS software or library that supports the GeoTIFF format can open it.
What is a Temporal GeoTIFF?
The Temporal GeoTIFF builds upon the mGeoTIFF by implementing a stricter convention for defining the temporal dimension. It requires a four-dimensional structure with dimensions ordered as follows: (time, band, x, y). The temporal dimension adheres to the STAC specification, which includes a start_datetime and an optional end_datetime. Both start_datetime and end_datetime are strings formatted according to RFC 3339, section 5.6, and use the Gregorian calendar as the reference system for time. For additional details, please refer to the Specification (SPECIFICATION.md).
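As an illustration of the datetime convention described above, the following minimal sketch formats Python datetimes as RFC 3339 strings in UTC. The helper name `to_rfc3339` and the example acquisition window are hypothetical and are not part of mrio; treating naive datetimes as UTC is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 string in UTC, e.g. 2021-01-01T10:30:00Z."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical acquisition window for a single time step
start_datetime = to_rfc3339(datetime(2021, 1, 1, 10, 30, 0))
end_datetime = to_rfc3339(datetime(2021, 1, 1, 10, 45, 0))

# A timezone-aware datetime (UTC+2) is converted to UTC before formatting
local_noon = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
local_noon_utc = to_rfc3339(local_noon)

print(start_datetime, end_datetime, local_noon_utc)
```

Normalizing everything to UTC with a trailing `Z` keeps the strings unambiguous and lexicographically sortable, which is convenient when they are used as coordinate values.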
When to use it?
Multidimensional GeoTIFF (mGeoTIFF) or Temporal GeoTIFF
Perfect for small datacubes (< 1 GB).
NetCDF, HDF5, or Zarr
Best for larger or complex datacubes (> 1 GB).
| Feature | HDF5 | Zarr | mGeoTIFF | Explanation |
|---|---|---|---|---|
| Portability | 2 | 3 | 1 | mGeoTIFF is widely supported and easily shareable. |
| Compression | 2 | 2 | 1 | Advanced compression options thanks to GDAL. |
| Coordinates definition | 3 | 3 | 1 | Strict spatial and temporal definitions. |
| GIS compatibility | 2 | 2 | 1 | Because an mGeoTIFF is still a GeoTIFF. |
| Chunking | 1 | 1 | 3 | mGeoTIFF only supports spatial chunking. |
| Nested Groups | 1 | 1 | 3 | mGeoTIFF does not support hierarchical data organization. |
| Diverse Array Shapes Support | 1 | 1 | 3 | mGeoTIFF only supports regular arrays. |
| Rich Metadata | 1 | 1 | 2 | mGeoTIFF only supports JSON-encoded metadata. |
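To apply the 1 GB rule of thumb above, it can help to estimate the size of a datacube before choosing a format. The sketch below computes the uncompressed in-memory size from a shape and a dtype. The function names are hypothetical, the threshold is taken as a decimal gigabyte, and the size on disk after compression will usually be smaller.

```python
import numpy as np

GB = 1_000_000_000  # assumption: decimal gigabyte


def datacube_nbytes(shape, dtype) -> int:
    """Uncompressed size in bytes of an array with the given shape and dtype."""
    n_items = int(np.prod(shape, dtype=np.int64))
    return n_items * np.dtype(dtype).itemsize


def suggest_format(shape, dtype, threshold_bytes=GB) -> str:
    """Apply the rule of thumb: small cubes -> mGeoTIFF, larger cubes -> other formats."""
    if datacube_nbytes(shape, dtype) < threshold_bytes:
        return "mGeoTIFF"
    return "NetCDF, HDF5, or Zarr"


# A small 5-dimensional cube: (simulation, time, band, lat, lon)
toy_shape = (3, 5, 7, 128, 128)
toy_size = datacube_nbytes(toy_shape, "float32")
toy_choice = suggest_format(toy_shape, "float32")

# A hypothetical large cube: one year of daily, 12-band, 10980 x 10980 images
big_choice = suggest_format((365, 12, 10980, 10980), "uint16")

print(toy_size, toy_choice, big_choice)
```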
Proof of Concept
We successfully transformed a massive 4 terabyte dataset of 200,000 Zarr files into a streamlined, cloud-native 700 GB dataset of just a few files. This optimization was achieved using TACO and Temporal GeoTIFFs.
Installation
pip install mrio
How to use it?
Writing a Multidimensional GeoTIFF
import mrio
import numpy as np
import xarray as xr
import pandas as pd
# 1. Define a 5-dimensional dataset (toy example)
data = np.random.rand(3, 5, 7, 128, 128)
time = list(pd.date_range("2021-01-01", periods=5).strftime("%Y%m%d"))
bands = ["B02", "B03", "B04", "B08", "B11", "B12", "B8A"]
simulations = ["historical", "rcp45", "rcp85"]
datacube = xr.DataArray(
    data=data,
    dims=["simulation", "time", "band", "lat", "lon"],
    coords={
        "simulation": simulations,
        "time": time,
        "band": bands,
    },
)
# 2. Define the parameters
params = {
    "driver": "GTiff",
    "dtype": "float32",
    "width": datacube.shape[-1],
    "height": datacube.shape[-2],
    "interleave": "pixel",
    "crs": "EPSG:4326",
    "transform": mrio.transform.from_bounds(-76.2, 4.31, -76.1, 4.32, 128, 128),
    "md:pattern": "simulation time band lat lon -> (simulation band time) lat lon",
    "md:coordinates": {
        "time": time,
        "band": bands,
        "simulation": simulations,
    },
    "md:attributes": {  # OPTIONAL: additional attributes to include in the file
        "hello": "world",
    },
}
# 3. Write the data
with mrio.open("image.tif", mode="w", **params) as src:
    src.write(datacube.values)
# 4. Read the data
with mrio.open("image.tif") as src:
    data_r = src.read()
    # 5. Convert the data back to an xarray DataArray
    datacube_r = xr.DataArray(
        data=data_r,
        dims=src.md_meta["md:dimensions"],
        coords=src.md_meta["md:coordinates"],
        attrs=src.md_meta["md:attributes"],
    )
# 6. Assert that the data is the same
assert np.allclose(datacube, datacube_r)
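The `md:pattern` string in the example above describes how the N-dimensional array is folded into the band axis of a regular GeoTIFF. The following NumPy-only sketch shows what such a pattern means, assuming it follows einops-style semantics: the axes named inside the parentheses are reordered and merged in row-major order. It does not call mrio and is not a description of its internals.

```python
import numpy as np

# Toy cube with dimensions (simulation, time, band, lat, lon)
data = np.random.rand(3, 5, 7, 16, 16)
n_sim, n_time, n_band, height, width = data.shape

# Forward: "simulation time band lat lon -> (simulation band time) lat lon"
# 1) reorder axes to (simulation, band, time, lat, lon)
# 2) merge the first three axes into a single band-like axis
flat = data.transpose(0, 2, 1, 3, 4).reshape(n_sim * n_band * n_time, height, width)

# Inverse: split the merged axis again and restore the original axis order
restored = flat.reshape(n_sim, n_band, n_time, height, width).transpose(0, 2, 1, 3, 4)

# Position of (simulation=1, band=2, time=3) along the merged axis
s, b, t = 1, 2, 3
flat_index = (s * n_band + b) * n_time + t

print(flat.shape, flat_index)
```

Because the merge is a pure reordering, the round trip is lossless as long as the dimension sizes and order are stored alongside the file, which is the role of the coordinates metadata.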
Writing a Temporal GeoTIFF
A reproducible, real-world example of a Temporal GeoTIFF. Using this format reduces file size by 60% compared to storing the time steps in separate files.
from datetime import datetime
import pandas as pd
import tacoreader
import numpy as np
import xarray as xr
import mrio
# 1. Load a dataset that contains images from the same location
cloudsen12_l1c = tacoreader.load("tacofoundation:cloudsen12-l1c")
# 2. Select any location
roi_idx = "ROI_00103"
s2_roi_images = cloudsen12_l1c[cloudsen12_l1c["roi_id"] == roi_idx]
s2_roi_images.reset_index(drop=True, inplace=True)
# 3. Load the temporal mini-cube
temporal_stack = np.zeros((len(s2_roi_images), 3, 512, 512), dtype=np.uint16)
for index, row in s2_roi_images.iterrows():
    print("Downloading image %s" % row["tortilla:id"])
    # Get the snippet of the Sentinel-2 image
    s2_str = row.read(0).read(0)
    # Load the image
    with mrio.open(s2_str) as src:
        s2_image = src.read([4, 3, 2])
    # Store the image in the temporal stack
    temporal_stack[index] = s2_image
# 4. Set the Coordinates
time_coord = [datetime.fromtimestamp(t).strftime('%Y%m%d') for t in s2_roi_images["stac:time_start"].tolist()]
band_coord = ["B04", "B03", "B02"]
# 5. Set the writing parameters
params = {
    'driver': 'GTiff',
    'dtype': 'uint16',
    'width': temporal_stack.shape[-1],
    'height': temporal_stack.shape[-2],
    'compress': 'zstd',
    'zstd_level': 22,
    'predictor': 2,
    'tiled': True,
    'num_threads': 10,
    'interleave': 'pixel',
    # CRS and geotransform are taken from the first image of the ROI
    # (assumed identical for every image at this location)
    'crs': s2_roi_images["stac:crs"].iloc[0],
    'transform': mrio.Affine.from_gdal(*s2_roi_images["stac:geotransform"].iloc[0]),
    'mrio:pattern': 'time band lat lon -> (band time) lat lon',
    'mrio:coordinates': {
        "time": time_coord,
        "band": band_coord
    },
    'mrio:attributes': {
        'time_start': s2_roi_images["stac:time_start"].tolist(),
        'time_end': s2_roi_images["stac:time_end"].tolist(),
    }
}
# 6. Write the temporal stack
with mrio.open("temporal_stack.tif", mode="w", **params) as src:
    src.write(temporal_stack)
# 7. Read the data
with mrio.open("temporal_stack.tif") as src:
    data_r = src.read()
    attributes = src.attributes()
temporal_stack = xr.DataArray(
    data=data_r,
    dims=attributes["mrio:dimensions"],
    coords=attributes["mrio:coordinates"],
    attrs=attributes["mrio:attributes"]
)
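Since a Temporal GeoTIFF is still a GeoTIFF, a generic reader sees the merged `(band time)` axis as ordinary bands. The sketch below computes which 1-based band number corresponds to a given band name and date, assuming the merged axis is laid out in row-major order (band varies slowest, time fastest) as in einops-style patterns. The coordinate values and the helper `gdal_band_number` are hypothetical and not part of mrio.

```python
import numpy as np

# Hypothetical coordinates for a stack written with the pattern
# "time band lat lon -> (band time) lat lon"
time_coord = ["20210101", "20210106", "20210111"]
band_coord = ["B04", "B03", "B02"]


def gdal_band_number(band: str, time: str) -> int:
    """Return the 1-based band number of (band, time) on the merged axis."""
    flat_index = np.ravel_multi_index(
        (band_coord.index(band), time_coord.index(time)),
        (len(band_coord), len(time_coord)),
    )
    # GDAL and rasterio number bands starting from 1
    return int(flat_index) + 1


first = gdal_band_number("B04", "20210101")
middle = gdal_band_number("B03", "20210111")
last = gdal_band_number("B02", "20210111")
print(first, middle, last)
```

With such a lookup, a single time step of a single band could be read with any GeoTIFF-aware tool by passing the resulting band number, without decoding the full stack.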
Documentation
See documentation for more details: https://tacofoundation.github.io/mrio/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changes
See the CHANGELOG file for details.