Create valid STAC Collections, Items and Assets from existing raster datasets

Raster-to-STAC

This component allows the creation of STAC Collections with Items and Assets starting from different kinds of raster datasets. It also enables automatic upload of the resulting files to an Amazon S3 Bucket, making them publicly accessible worldwide. The goal is to make datasets easily accessible, interoperable, and shareable.

Approaches

Depending on your requirements, different output formats are available:

1. Via COGs (Cloud Optimized GeoTIFFs)

This approach reads the input dataset and generates multiple Cloud Optimized GeoTIFFs on local disk. This provides high interoperability with third-party libraries for data reading and visualization.

2. Via ZARR

This approach converts the data to ZARR, a format optimized for cloud storage and efficient chunked access to large datasets. It writes a STAC Collection containing a single Zarr object.

3. Via netCDF

This approach keeps data in netCDF format while generating STAC metadata for discovery and access.
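
The usage examples further down call one method per approach: generate_cog_stac, generate_zarr_stac and generate_netcdf_stac. As a minimal sketch, the choice of approach could be made configurable with a small dispatcher. The helper name run_approach and the short keys "cog", "zarr" and "netcdf" are assumptions made for illustration; they are not part of raster2stac.

```python
# Method names are the ones used in this README's usage examples.
# The short keys are hypothetical labels chosen for this sketch.
APPROACH_TO_METHOD = {
    "cog": "generate_cog_stac",
    "zarr": "generate_zarr_stac",
    "netcdf": "generate_netcdf_stac",
}


def run_approach(converter, approach, **method_kwargs):
    """Call the generate_* method matching `approach` on a Raster2STAC-like object.

    `run_approach` is a hypothetical helper, not part of raster2stac.
    """
    try:
        method_name = APPROACH_TO_METHOD[approach.lower()]
    except KeyError:
        expected = sorted(APPROACH_TO_METHOD)
        raise ValueError(f"Unknown approach {approach!r}; expected one of {expected}") from None
    # Look the method up by name and forward any extra keyword arguments.
    return getattr(converter, method_name)(**method_kwargs)
```

With a configured Raster2STAC instance named converter, run_approach(converter, "zarr", item_id="MY_ITEM") would then be equivalent to converter.generate_zarr_stac(item_id="MY_ITEM").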

Installation

Prerequisites

When installing from source, first set a PEP 440-compliant version in the pyproject.toml file by replacing the SEMANTIC_VERSION placeholder. The recommended format is Calendar Versioning (CalVer), for example 2025.10.1:

VERSION=2025.10.1
sed -i "s/SEMANTIC_VERSION/$VERSION/g" pyproject.toml

Then install the package:

pip install .
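
Where sed is not available, the version substitution from the Prerequisites step could be done in Python instead. The following is only a sketch: the helper names calver_for and set_version are hypothetical, and the regular expression is a simplified check for release-only CalVer strings such as 2025.10.1, which is narrower than the full PEP 440 grammar.

```python
import re
from datetime import date
from pathlib import Path

# Simplified pattern for YYYY.MM.PATCH versions such as "2025.10.1".
# PEP 440 also allows pre-, post- and dev-releases; this sketch ignores them.
CALVER_PATTERN = re.compile(r"\d{4}\.(0?[1-9]|1[0-2])\.\d+")


def calver_for(day, patch=1):
    """Build a CalVer string (year.month.patch) from a date."""
    return f"{day.year}.{day.month}.{patch}"


def set_version(pyproject_path, version):
    """Replace the SEMANTIC_VERSION placeholder in a pyproject.toml file."""
    if not CALVER_PATTERN.fullmatch(version):
        raise ValueError(f"{version!r} does not look like YYYY.MM.PATCH")
    path = Path(pyproject_path)
    text = path.read_text(encoding="utf-8")
    if "SEMANTIC_VERSION" not in text:
        raise ValueError("placeholder SEMANTIC_VERSION not found")
    path.write_text(text.replace("SEMANTIC_VERSION", version), encoding="utf-8")
```

For example, set_version("pyproject.toml", calver_for(date.today())) stamps the file with the current year and month before running pip install.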

Quick Installation

You can also install directly using pip:

pip install raster2stac

Quickstart

This section provides a quick overview of how to use raster2stac to convert your raster data into STAC-compliant assets.

Get Sample Data

You can download sample data to test the package:

wget https://github.com/euracresearch/raster2stac/raw/main/tests/data/S2_L2A_sample.nc
wget https://github.com/Open-EO/openeo-localprocessing-data/raw/main/sample_netcdf/S2_L2A_sample.nc

Usage Examples

Basic Usage

The main class Raster2STAC provides different methods to generate STAC items and collections for various data formats.

Generate COG STAC from netCDF file

Convert a netCDF file to Cloud Optimized GeoTIFFs (COGs) and generate STAC items:

from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",  # The netCDF which will be converted into COGs
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC_COG"
).generate_cog_stac()
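
After a run, it can help to check what was written to the output folder. The sketch below is not part of raster2stac. It assumes only that the output folder contains STAC JSON documents (an items/ subfolder appears in the loading examples of this README) and relies on the STAC convention that Items carry "type": "Feature" and Collections carry "type": "Collection". The helper name summarize_stac_folder is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_stac_folder(output_folder):
    """Count the JSON documents under `output_folder` by their "type" field."""
    counts = Counter()
    for json_path in sorted(Path(output_folder).rglob("*.json")):
        with json_path.open(encoding="utf-8") as fh:
            document = json.load(fh)
        # JSON files that are not objects, or lack "type", are counted as "unknown".
        if isinstance(document, dict):
            stac_type = document.get("type", "unknown")
        else:
            stac_type = "unknown"
        counts[stac_type] += 1
    return dict(counts)
```

Calling summarize_stac_folder("SENTINEL2_L2A_SAMPLE_STAC_COG") returns a dictionary mapping each type to the number of files found, which makes a missing Collection or an empty items folder easy to spot.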

Generate STAC from netCDF file (keep netCDF format)

Generate STAC items while keeping the data in netCDF format:

from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",  # The netCDF file to describe (kept in netCDF format)
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC_NETCDF"
).generate_netcdf_stac()

You can then load the STAC item using pystac_client and odc.stac:

import pystac_client
import json
import odc.stac

# Adjust the folder name to the output_folder passed to Raster2STAC
item_path = "./SENTINEL2_L2A_SAMPLE_STAC/items/20220630000000.json"
stac_api = pystac_client.stac_api_io.StacApiIO()
stac_dict = json.loads(stac_api.read_text(item_path))
item = stac_api.stac_object_from_dict(stac_dict)

ds_stac = odc.stac.load([item])
print(ds_stac)

> <xarray.Dataset> Size: 13MB
> Dimensions:      (y: 705, x: 935, time: 1)
> Coordinates:
>   * y            (y) float64 6kB 5.155e+06 5.155e+06 ... 5.148e+06 5.148e+06
>   * x            (x) float64 7kB 6.75e+05 6.75e+05 ... 6.843e+05 6.843e+05
>     spatial_ref  int32 4B 32632
>   * time         (time) datetime64[ns] 8B 2022-06-30
> Data variables:
>     B04          (time, y, x) float32 3MB 278.0 302.0 274.0 ... 306.0 236.0
>     B03          (time, y, x) float32 3MB 506.0 520.0 456.0 ... 378.0 367.0
>     B02          (time, y, x) float32 3MB 237.0 240.0 249.0 ... 246.0 212.0
>     B08          (time, y, x) float32 3MB 3.128e+03 2.958e+03 ... 1.854e+03
>     SCL          (time, y, x) float32 3MB 4.0 4.0 4.0 4.0 ... 4.0 4.0 4.0 4.0
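
Before loading the pixels, the item file can also be inspected with the standard library alone. This is a minimal sketch that assumes the file is a regular STAC Item, that is, a JSON object with "type": "Feature" and an "assets" mapping whose entries each have an "href". The helper name list_item_assets is hypothetical and not part of raster2stac, pystac_client or odc.stac.

```python
import json


def list_item_assets(item_path):
    """Return {asset_key: href} for the STAC Item stored at `item_path`."""
    with open(item_path, encoding="utf-8") as fh:
        item = json.load(fh)
    if item.get("type") != "Feature":
        raise ValueError(f"{item_path} is not a STAC Item (type={item.get('type')!r})")
    # Every STAC asset object has a required "href" field.
    return {key: asset["href"] for key, asset in item.get("assets", {}).items()}
```

Printing the result for the item path used above shows where each asset points, which is useful for checking that the hrefs are reachable from the machine doing the loading.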

Generate ZARR STAC from netCDF file

Convert a netCDF file to ZARR format and generate STAC items with additional metadata:

from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="S2_L2A_sample.nc",
    collection_id="R2S_TEST_COLLECTION",
    collection_url="https://10.8.244.74:8082/collections/",
    item_prefix="R2S_TEST",
    output_folder="S2_L2A_sample_ZARR",
    description="Test Collection",
    title="Raster2STAC Test Collection",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    stac_version="1.0.0",
    s3_upload=False,
    license="CC-BY-4.0",
    sci_citation="Test citation",
).generate_zarr_stac(item_id="S2_L2A_sample_ZARR")
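
The providers argument above follows the shape of the STAC Provider Object, where name is required and roles may contain licensor, producer, processor or host. As a hedged sketch, a small check run before constructing Raster2STAC can catch typos early. Whether raster2stac performs its own validation is not stated here, and check_providers is a hypothetical helper name.

```python
# Roles defined for the Provider Object in the STAC specification.
ALLOWED_ROLES = {"licensor", "producer", "processor", "host"}


def check_providers(providers):
    """Return a list of human-readable problems found in `providers`."""
    problems = []
    for index, provider in enumerate(providers):
        # "name" is the only required field of a provider.
        if not provider.get("name"):
            problems.append(f"providers[{index}]: missing required 'name'")
        unknown = set(provider.get("roles", [])) - ALLOWED_ROLES
        if unknown:
            problems.append(f"providers[{index}]: unknown roles {sorted(unknown)}")
    return problems
```

An empty list means nothing suspicious was found, so a call such as check_providers([{"url": "https://www.eurac.edu", "name": "Eurac Research", "roles": ["producer"]}]) returns [].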

Generate ZARR STAC from a 5-dimensional dataset

Create a Zarr-based STAC Collection from a 5-dimensional dataset:

  1. Get the sample netCDF file:

wget https://github.com/Open-EO/openeo-localprocessing-data/raw/refs/heads/main/sample_netcdf/sample_5D.nc

  2. Call raster2stac:

import xarray as xr
from raster2stac import Raster2STAC
import rioxarray  # registers the .rio accessor used below

# Open the dataset and attach the CRS (EPSG:4326)
ds = xr.open_dataset("sample_5D.nc").rio.write_crs(4326, inplace=True)

rs2stac = Raster2STAC(
    data=ds,
    collection_id="DATA_5D",
    collection_url="https://stac.eurac.edu/collections/",
    item_prefix="R2S_TEST",
    output_folder="DATA_5D",
    description="Test Collection with 5 dimensional data",
    title="Raster2STAC Test Collection 5D",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    links=[{
        "rel": "license",
        "href": "https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf",
        "title": "License to use Copernicus Products"
    }],
    stac_version="1.0.0",
    s3_upload=False,
    license="proprietary",
    sci_doi='https://doi.org/10.24381/cds.622a565a',
    # Adjacent string literals are concatenated, so no indentation leaks into the citation
    sci_citation=(
        "Schimanke S., Ridal M., Le Moigne P., Berggren L., Undén P., Randriamampianina R., Andrea U., "
        "Bazile E., Bertelsen A., Brousseau P., Dahlgren P., Edvinsson L., El Said A., Glinton M., Hopsch S., "
        "Isaksson L., Mladek R., Olsson E., Verrelle A., Wang Z.Q., (2021): CERRA sub-daily regional reanalysis "
        "data for Europe on single levels from 1984 to present. Copernicus Climate Change Service (C3S) Climate "
        "Data Store (CDS), DOI: 10.24381/cds.622a565a (Accessed on 15-02-2024)"
    ),
).generate_zarr_stac(item_id="DATA_5D")

You can then load the 5D dataset using openEO local processing:

from openeo.local import LocalConnection

conn = LocalConnection("")
ds = conn.load_stac("DATA_5D/items/DATA_5D.json").execute()
print(ds)

> <xarray.DataArray (bands: 1, time: 216, level: 2, y: 96, x: 161, number: 25)> Size: 167MB
> dask.array<stack, shape=(1, 216, 2, 96, 161, 25), dtype=int8, chunksize=(1, 54, 1, 24, 81, 13), chunktype=numpy.ndarray>
> Coordinates:
>   * level        (level) int32 8B 500 850
>   * number       (number) int64 200B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23 24
>     spatial_ref  int64 8B ...
>   * time         (time) datetime64[ns] 2kB 2016-01-01 2016-01-02 ... 2016-08-03
>   * x            (x) float64 1kB 5.084 5.151 5.218 5.285 ... 15.69 15.76 15.82
>   * y            (y) float64 768B 43.62 43.69 43.75 43.82 ... 49.86 49.93 50.0
>   * bands        (bands) object 8B 'z'
> Attributes:
>     CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
>     CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...
>     Conventions:  CF-1.6
>     history:      Tue Feb 27 09:39:09 2024: cdo remapbil,/mnt/CEPH_PROJECTS/I...

Generate STAC from xarray Dataset (netCDF)

Use an existing xarray Dataset to generate STAC items in netCDF format:

import xarray as xr
from raster2stac import Raster2STAC

ds = xr.open_dataset("S2_L2A_sample.nc")

rs2stac = Raster2STAC(
    data=ds,  # The xarray Dataset which will be converted
    collection_id="SENTINEL2_L2A_SAMPLE",  # The Collection id we want to set
    collection_url="https://stac.eurac.edu/collections/",  # The URL where the collection will be exposed
    output_folder="SENTINEL2_L2A_SAMPLE_STAC"
).generate_netcdf_stac()

Generate ZARR STAC from xarray Dataset

Use an existing xarray Dataset to generate STAC items in ZARR format:

import xarray as xr
from raster2stac import Raster2STAC

# Open the dataset that is passed as `data` below
ds = xr.open_dataset("S2_L2A_sample.nc")

rs2stac = Raster2STAC(
    data=ds,
    collection_id="R2S_TEST_COLLECTION",
    collection_url="https://10.8.244.74:8082/collections/",
    item_prefix="R2S_TEST",
    output_folder="S2_L2A_sample_ZARR_dataset",
    description="Test Collection",
    title="Raster2STAC Test Collection",
    keywords=["test", "stac", "collection"],
    providers=[
        {
            "url": "https://www.eurac.edu",
            "name": "Eurac Research",
            "roles": ["producer"],
        }
    ],
    stac_version="1.0.0",
    s3_upload=False,
    license="CC-BY-4.0",
    sci_citation="Test citation",
).generate_zarr_stac(item_id="S2_L2A_sample_ZARR")

Key Features

  • Publish netCDF data as COG, netCDF, or ZARR assets
  • Generate STAC-compliant items and collections
  • Support for both file paths and xarray Dataset objects
  • Process multiple netCDF files as a list
  • Flexible metadata configuration
  • Multiple output format options
  • Easy data loading using pystac_client and odc.stac
  • Integration with STAC APIs and clients
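
For the feature "Process multiple netCDF files as a list", the input list has to be assembled first. The sketch below only builds that list with the standard library. Passing it as the data argument of Raster2STAC is an assumption based on the feature description above, not a documented signature, and collect_netcdf_files is a hypothetical helper name.

```python
from pathlib import Path


def collect_netcdf_files(folder):
    """Return the paths of all .nc files directly inside `folder`, sorted by name."""
    return [str(path) for path in sorted(Path(folder).glob("*.nc"))]


# Assumed usage, not verified against the raster2stac API:
# files = collect_netcdf_files("my_netcdf_folder")
# Raster2STAC(data=files, collection_id="...", collection_url="...", output_folder="...")
```

Sorting by name keeps the order stable between runs, which matters when file names encode acquisition dates.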

Common Workflow

A typical workflow with raster2stac involves:

  1. Convert your raster data to STAC-compliant assets using Raster2STAC
  2. Generate STAC collection and items in your desired format (COG/netCDF/ZARR)
  3. Publish or serve the STAC metadata
  4. Load and use the data through STAC clients using pystac_client and odc.stac
  5. Integrate with STAC catalogs and APIs for discovery and access

License

This project is distributed under the MIT License. See the LICENSE file for details.
