
Open-source provenance SDK and specification for verifiable EO and climate data workflows


trazaeo


trazaeo is a Python-first package for adding verifiable provenance to Earth observation and climate data workflows. It gives you fast content hashing, signed provenance envelopes, and workflow helpers you can drop into an existing pipeline without replacing your current scheduler, storage layer, or transform code.

The V1 protocol is designed as a C2PA-compatible EO provenance profile. C2PA manifests provide the interoperable content-authenticity carrier, while trazaeo keeps BLAKE3 roots, optional Bao range proofs, checkpoint manifests, and proof-log commitments as fast EO-specific verification layers.

Use it when you want to:

  • hash outputs from an existing batch or streaming job
  • attach provenance to a dataset publish step
  • verify that a delivered artifact still matches the published record
  • add provenance checks around netCDF, Zarr, or Icechunk workflows
  • publish C2PA-compatible hard bindings and EO provenance assertions alongside STAC or OGC catalog metadata

Install

For most users:

pip install trazaeo

If you also want the optional netCDF, xarray, and Zarr helpers used by the example workflows:

pip install 'trazaeo[python-examples]'

Published wheels are built as CPython abi3 artifacts from Python 3.12, so one wheel per platform covers every CPython 3.12+ release on the supported platforms below. If a prebuilt wheel is not available for your platform, pip will fall back to building from source.

Published wheel contract

The package metadata supports Python 3.12+.

The verified published wheel matrix is:

  • CPython abi3 built from Python 3.12
  • Linux manylinux_2_28_x86_64
  • macOS x86_64
  • macOS arm64
  • Windows x86_64

The verification checks are:

  • import trazaeo
  • import trazaeo_workflows.dataset_provenance
  • from trazaeo import PublicRpcSolanaProofLogAdaptor
  • trazaeo-icechunk --help
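
To check ahead of time whether pip is likely to find a prebuilt wheel, the matrix above can be encoded as a small predicate. This is a minimal sketch using only the standard library. has_prebuilt_wheel is a hypothetical helper, not part of trazaeo, and it does not check the glibc or macOS minimum versions.

```python
import platform
import sys

# (OS, architecture) pairs taken from the published wheel matrix.
_WHEEL_TARGETS = {
    ("linux", "x86_64"),
    ("darwin", "x86_64"),
    ("darwin", "arm64"),
    ("win32", "x86_64"),
}


def has_prebuilt_wheel(version_info=None, sys_platform=None, machine=None,
                       implementation=None) -> bool:
    """Hypothetical helper: True if this interpreter falls inside the wheel matrix."""
    version_info = sys.version_info if version_info is None else version_info
    sys_platform = sys.platform if sys_platform is None else sys_platform
    machine = platform.machine() if machine is None else machine
    if implementation is None:
        implementation = platform.python_implementation()
    # Windows reports AMD64 for x86_64; Linux ARM builds report aarch64.
    arch = {"amd64": "x86_64", "aarch64": "arm64"}.get(machine.lower(), machine.lower())
    os_name = "linux" if sys_platform.startswith("linux") else sys_platform
    # abi3 wheels built from 3.12 need CPython 3.12 or newer.
    return (
        implementation == "CPython"
        and tuple(version_info[:2]) >= (3, 12)
        and (os_name, arch) in _WHEEL_TARGETS
    )
```

Calling has_prebuilt_wheel() with no arguments inspects the running interpreter. A False result only means pip will probably take the source-build path described next.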

Source-build fallback

If you install on a platform outside that wheel matrix, pip will build from source. In Debian/Ubuntu-style environments, install a C/Rust build toolchain and Python development headers for the interpreter you are using:

apt-get update
apt-get install -y build-essential curl pkg-config python3-dev
curl https://sh.rustup.rs -sSf | sh -s -- -y

Then restart the shell (or source "$HOME/.cargo/env") so that cargo is on your PATH, and rerun:

pip install trazaeo

Use It Inside Your Existing Pipeline

trazaeo is designed to sit at the boundaries of work you already do.

Typical places to add it:

  • after a transform job writes a file, hash the artifact and store the content root with your job metadata
  • before publishing a dataset, build and sign provenance for the output and its source inputs
  • during delivery or audit, verify that the local artifact still matches the published checkpoint

You do not need to adopt a new pipeline framework. The package works well as:

  • a Python helper inside an Airflow, Prefect, Dagster, or Argo task
  • a provenance step called from an existing batch job or notebook
  • a verification step in a release or data publication workflow

Quick Start

The normal integration point is the Python API. A common first step is to hash an artifact right after your pipeline writes it:

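from trazaeo import blake3_content_root


def register_pipeline_output(path: str) -> dict[str, str]:
    """Hash a freshly written artifact and return a record for your job metadata."""
    # The two positional values follow the upstream example; they are assumed
    # here to be the chunk size in bytes and the worker thread count.
    content_root = blake3_content_root(path, 4096, 4).hex()
    return {
        "artifact_path": path,
        "content_root_hash": content_root,
    }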

Content descriptors also carry a C2PA-compatible sha256 hard binding when the descriptor is built from a local artifact path. The BLAKE3 root remains the trazaeo fast-path commitment; the SHA-256 binding is for C2PA validators.
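
For an independent cross-check of that SHA-256 binding, the digest of a local artifact can be recomputed with the standard library alone. This is a minimal sketch. sha256_file is a hypothetical helper, and the descriptor field that holds the binding is not shown here, so compare the result against whatever hex value your descriptor exposes.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: str | Path, block_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large EO artifacts never load fully into memory.
        while block := handle.read(block_size):
            digest.update(block)
    return digest.hexdigest()
```

This does not replace the BLAKE3 content root. It only reproduces the plain SHA-256 digest that a C2PA validator would compute over the same bytes.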

That works well in an Airflow task, a Prefect flow, a Dagster asset, or a plain Python batch job. You keep your existing transform code and add one provenance step after the file is produced.

For in-memory content:

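from trazaeo import blake3_hash, blake3_hash_mt

# Single-threaded hash of an in-memory buffer.
single = blake3_hash(b"hello world").hex()
# Multi-threaded variant; the second argument is assumed to be the thread count.
parallel = blake3_hash_mt(b"hello world", 4).hex()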

Artifact Verification In Process

If your pipeline publishes an artifact and later needs to verify what was delivered, you can build a proof package for the local file:

from trazaeo_workflows import build_local_artifact_full_root_proof_package


def build_local_artifact_proof(path: str) -> dict:
    return build_local_artifact_full_root_proof_package(
        path,
        chunk_size=1 << 20,
        threads=4,
    )

When you already have a delivery proof package from an upstream publish step, verify it against the artifact path:

from trazaeo_workflows import verify_dataset_delivery_proof_report


def verify_delivery(path: str, delivery_proof_package: dict) -> dict:
    return verify_dataset_delivery_proof_report(
        delivery_proof_package,
        artifact_path=path,
    )

This fits naturally in a downstream validation, QA, or publication check step.

Dataset Publish Workflows

The higher-level trazaeo_workflows helpers are for pipelines that already track their source files, transform job ids, output artifact refs, signer ids, and verification policy. In that case, you pass your existing metadata into trazaeo and let it build the provenance bundle around work your pipeline already performed.

The main Python workflow entrypoints are:

  • trazaeo_workflows.build_dataset_bootstrap_bundle
  • trazaeo_workflows.build_dataset_incremental_bundle
  • trazaeo_workflows.build_dataset_delivery_proof_package
  • trazaeo_workflows.verify_dataset_delivery_proof_report

Those helpers are used by the example netCDF and Icechunk flows in examples/python_netcdf/.

A typical pattern is:

  1. Your pipeline reads or transforms source files.
  2. Your pipeline writes the dataset artifact.
  3. You hash the artifact with trazaeo.
  4. You pass the source metadata, output metadata, signer, and trust policy into a dataset workflow helper.
  5. You store or publish the returned provenance bundle beside the dataset.
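
The five steps above can be sketched as one function that wires your existing callables together. Everything here is hypothetical scaffolding: publish_with_provenance, hash_artifact, and build_bundle are stand-in names, and the bundle file name is an assumption. In a real pipeline the last two callables would wrap blake3_content_root and one of the trazaeo_workflows helpers with whatever arguments they actually take, including your signer and trust policy.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def publish_with_provenance(
    write_dataset: Callable[[], Path],
    hash_artifact: Callable[[Path], str],
    build_bundle: Callable[[dict[str, Any]], dict[str, Any]],
    source_metadata: dict[str, Any],
) -> Path:
    """Hypothetical glue: run the transform, hash its output, store the bundle."""
    # Steps 1-2: your pipeline reads its sources and writes the dataset artifact.
    artifact = Path(write_dataset())
    # Step 3: hash the artifact (stand-in for the trazaeo hashing call).
    content_root = hash_artifact(artifact)
    # Step 4: hand metadata the pipeline already tracks to a bundle builder.
    bundle = build_bundle(
        {
            "sources": source_metadata,
            "artifact_path": str(artifact),
            "content_root_hash": content_root,
        }
    )
    # Step 5: store the bundle beside the dataset (file name is an assumption).
    bundle_path = artifact.with_name(artifact.name + ".provenance.json")
    bundle_path.write_text(json.dumps(bundle, indent=2, sort_keys=True))
    return bundle_path
```

Passing the hashing and bundle-building steps in as callables keeps the transform code untouched and makes the provenance step easy to stub out in tests.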

To run an EO fixture that exercises the compiled Python extension, netCDF input files, Zarr materialization, C2PA hard bindings, signed dataset provenance, and deterministic source metadata, use:

python -m trazaeo_workflows eo-provenance-demo \
  --output-dir /tmp/trazaeo-eo-demo \
  --signing-key-hex 4f3edf983ac636a65a842ce7c78d9aa706d3b113bce036f9a4f5762b76f70f18

The demo fixes issued_at, source URIs, and observed mtimes so repeated runs can be compared across local directories. Its JSON output includes source content descriptors with sha256 C2PA hard bindings and a publish verification report with signature, trust, lineage, and C2PA interop states. The signing key shown above is published in this README, so treat it as demo-only and use your own key for real signing.
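
Because the demo is deterministic, two output directories can be compared file by file. This is a minimal sketch that assumes the demo writes its results as JSON files. The exact file names are not specified here, so it compares every *.json file it finds. diff_json_outputs is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def _canonical(path: Path) -> str:
    # Re-serialize with sorted keys so key order and whitespace do not matter.
    return json.dumps(json.loads(path.read_text()), sort_keys=True, separators=(",", ":"))


def diff_json_outputs(run_a: str | Path, run_b: str | Path) -> list[str]:
    """Return relative paths of JSON files that are missing or differ between two runs."""
    run_a, run_b = Path(run_a), Path(run_b)
    names_a = {p.relative_to(run_a).as_posix() for p in run_a.rglob("*.json")}
    names_b = {p.relative_to(run_b).as_posix() for p in run_b.rglob("*.json")}
    # Files present in only one of the two runs.
    differing = sorted(names_a ^ names_b)
    # Files present in both runs whose canonical content differs.
    for name in sorted(names_a & names_b):
        if _canonical(run_a / name) != _canonical(run_b / name):
            differing.append(name)
    return differing
```

An empty list means the two runs match. If an output file embeds the absolute output directory, that file will show up as differing even when the provenance content is otherwise identical.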

Documentation

  • Project docs: https://endcorp-hq.github.io/provenance
  • Python workflow examples: examples/python_netcdf/README.md
  • Protocol spec: TRAZAEO_V1_SPEC.md
  • Architecture boundary: docs/contracts/architecture.md
  • Quality gates: docs/contracts/quality-gates.md
  • Rust crate overview: trazaeo/README.md
  • Solana proof-log program: trazaeo/programs/proof-log/README.md

Development

Most users only need pip install trazaeo. If you are contributing to this repository, see CONTRIBUTING.md for local build, test, extension, and docs workflows.

Download files

Source distribution

  • trazaeo-0.5.6.tar.gz (128.0 kB)

Built distributions

  • trazaeo-0.5.6-cp312-abi3-win_amd64.whl (1.9 MB): CPython 3.12+, Windows x86-64
  • trazaeo-0.5.6-cp312-abi3-manylinux_2_28_x86_64.whl (2.1 MB): CPython 3.12+, manylinux (glibc 2.28+) x86-64
  • trazaeo-0.5.6-cp312-abi3-macosx_11_0_arm64.whl (1.9 MB): CPython 3.12+, macOS 11.0+ ARM64
  • trazaeo-0.5.6-cp312-abi3-macosx_10_12_x86_64.whl (2.0 MB): CPython 3.12+, macOS 10.12+ x86-64
