
Open-source provenance SDK and specification for verifiable EO and climate data workflows

Project description

trazaeo


trazaeo is a Python-first package for adding verifiable provenance to Earth observation and climate data workflows. It gives you fast content hashing, signed provenance envelopes, and workflow helpers you can drop into an existing pipeline without replacing your current scheduler, storage layer, or transform code.

The V1 protocol is designed as a C2PA-compatible EO provenance profile. C2PA manifests serve as the interoperable content-authenticity carrier, while trazaeo keeps BLAKE3 roots, optional Bao range proofs, checkpoint manifests, and proof-log commitments as fast, EO-specific verification layers.

Use it when you want to:

  • hash outputs from an existing batch or streaming job
  • attach provenance to a dataset publish step
  • verify that a delivered artifact still matches the published record
  • add provenance checks around netCDF, Zarr, or Icechunk workflows
  • publish C2PA-compatible hard bindings and EO provenance assertions alongside STAC or OGC catalog metadata

Install

For most users:

pip install trazaeo

If you also want the optional netCDF, xarray, and Zarr helpers used by the example workflows:

pip install 'trazaeo[python-examples]'

Published wheels are built as CPython abi3 artifacts from Python 3.12, so a single wheel works across Python 3.12+ on the supported platforms below. If a prebuilt wheel is not available for your platform, pip will fall back to building from source.

Published wheel contract

The package metadata supports Python 3.12+.

The verified published wheel matrix is:

  • CPython abi3 built from Python 3.12
  • Linux manylinux_2_28_x86_64
  • macOS x86_64
  • macOS arm64
  • Windows x86_64

Verification checks for the published wheels:

  • import trazaeo
  • import trazaeo_workflows.dataset_provenance
  • from trazaeo import PublicRpcSolanaProofLogAdaptor
  • trazaeo-icechunk --help
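A minimal sketch for checking, before installing, whether the current interpreter and platform fall inside the published wheel matrix. The helper name and the machine-name aliases are assumptions for illustration, not part of trazaeo. It does not check the glibc 2.28 floor on Linux or the minimum macOS version encoded in the wheel tags, so treat a True result as a hint rather than a guarantee.

```python
import platform
import sys

# (platform.system(), normalized machine) pairs from the published wheel matrix.
_WHEEL_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Darwin", "x86_64"),
    ("Darwin", "arm64"),
    ("Windows", "x86_64"),
}

# platform.machine() spells the same architecture differently across OSes.
_MACHINE_ALIASES = {"amd64": "x86_64", "x86_64": "x86_64", "aarch64": "arm64", "arm64": "arm64"}


def expects_prebuilt_wheel(version=None, system=None, machine=None, implementation=None) -> bool:
    """Return True if a published abi3 wheel should match this interpreter (hypothetical helper)."""
    version = tuple(version) if version is not None else sys.version_info[:2]
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    implementation = implementation or sys.implementation.name
    if implementation != "cpython":  # abi3 wheels target the CPython stable ABI
        return False
    if version < (3, 12):  # wheels are built against the 3.12 stable ABI
        return False
    return (system, _MACHINE_ALIASES.get(machine, machine)) in _WHEEL_PLATFORMS


if __name__ == "__main__":
    print("prebuilt wheel expected:", expects_prebuilt_wheel())
```

A False result means pip will most likely take the source-build path described next.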

Source-build fallback

If you install on a platform outside that wheel matrix, pip will build from source. In Debian/Ubuntu-style environments, install a C/Rust build toolchain and Python development headers for the interpreter you are using:

apt-get update
apt-get install -y build-essential curl pkg-config python3-dev
curl https://sh.rustup.rs -sSf | sh -s -- -y

Then restart the shell and rerun:

pip install trazaeo
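Before rerunning pip, a quick preflight can confirm that the tools installed above are visible to the interpreter you will install with. This is a hypothetical helper, not part of trazaeo; it only checks that executables are on PATH and that Python.h exists, not which versions are installed.

```python
import shutil
import sysconfig
from pathlib import Path


def source_build_preflight() -> dict[str, bool]:
    """Report which pieces of a source-build toolchain this interpreter can see."""
    include_dir = sysconfig.get_paths().get("include", "")
    return {
        # Rust toolchain installed by rustup; cargo and rustc must be on PATH.
        "cargo": shutil.which("cargo") is not None,
        "rustc": shutil.which("rustc") is not None,
        # A C compiler, provided by build-essential on Debian/Ubuntu.
        "c_compiler": any(shutil.which(cc) for cc in ("cc", "gcc", "clang")),
        "pkg-config": shutil.which("pkg-config") is not None,
        # Python.h for this interpreter, provided by python3-dev on Debian/Ubuntu.
        "python_headers": bool(include_dir) and (Path(include_dir) / "Python.h").is_file(),
    }


if __name__ == "__main__":
    missing = [name for name, ok in source_build_preflight().items() if not ok]
    print("missing:", ", ".join(missing) if missing else "nothing")
```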

Use It Inside Your Existing Pipeline

trazaeo is designed to sit at the boundaries of work you already do.

Typical places to add it:

  • after a transform job writes a file, hash the artifact and store the content root with your job metadata
  • before publishing a dataset, build and sign provenance for the output and its source inputs
  • during delivery or audit, verify that the local artifact still matches the published checkpoint
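The first and last boundaries above can be sketched with the standard library alone. This is a stand-in under stated assumptions: it uses streaming SHA-256 from hashlib in place of trazaeo's BLAKE3 content root, and the helper names and the `<artifact>.provenance.json` sidecar convention are hypothetical. In a real pipeline you would swap `file_digest_hex` for the `blake3_content_root` call shown in Quick Start.

```python
import hashlib
import json
from pathlib import Path


def file_digest_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 (a stand-in for trazaeo's BLAKE3 content root)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def _sidecar(path: str) -> Path:
    return Path(path + ".provenance.json")  # hypothetical sidecar naming convention


def record_output(path: str, job_id: str) -> Path:
    """After a transform writes `path`, store its root next to the job metadata."""
    sidecar = _sidecar(path)
    sidecar.write_text(json.dumps({
        "artifact_path": path,
        "job_id": job_id,
        "content_root_hash": file_digest_hex(path),
    }, indent=2))
    return sidecar


def verify_output(path: str) -> bool:
    """During delivery or audit, recompute the root and compare it with the record."""
    record = json.loads(_sidecar(path).read_text())
    return record["content_root_hash"] == file_digest_hex(path)
```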

You do not need to adopt a new pipeline framework. The package works well as:

  • a Python helper inside an Airflow, Prefect, Dagster, or Argo task
  • a provenance step called from an existing batch job or notebook
  • a verification step in a release or data publication workflow

Quick Start

The normal integration point is the Python API. A common first step is to hash an artifact right after your pipeline writes it:

from trazaeo import blake3_content_root


def register_pipeline_output(path: str) -> dict[str, str]:
    content_root = blake3_content_root(path, 4096, 4).hex()
    return {
        "artifact_path": path,
        "content_root_hash": content_root,
    }

Content descriptors built from a local artifact path also carry a C2PA-compatible SHA-256 hard binding. The BLAKE3 root remains trazaeo's fast-path commitment; the SHA-256 binding exists for C2PA validators.

That works well in an Airflow task, a Prefect flow, a Dagster asset, or a plain Python batch job. You keep your existing transform code and add one provenance step after the file is produced.

For in-memory content:

from trazaeo import blake3_hash, blake3_hash_mt

single = blake3_hash(b"hello world").hex()
parallel = blake3_hash_mt(b"hello world", 4).hex()

Artifact Verification In Process

If your pipeline publishes an artifact and later needs to verify what was delivered, you can build a proof package for the local file:

from trazaeo_workflows import build_local_artifact_full_root_proof_package


def build_local_artifact_proof(path: str) -> dict:
    return build_local_artifact_full_root_proof_package(
        path,
        chunk_size=1 << 20,
        threads=4,
    )

And when you already have a delivery proof package from an upstream publish step, verify it against the artifact path:

from trazaeo_workflows import verify_dataset_delivery_proof_report


def verify_delivery(path: str, delivery_proof_package: dict) -> dict:
    return verify_dataset_delivery_proof_report(
        delivery_proof_package,
        artifact_path=path,
    )

This fits naturally in a downstream validation, QA, or publication check step.

Dataset Publish Workflows

The higher-level trazaeo_workflows helpers are for pipelines that already track their source files, transform job ids, output artifact refs, signer ids, and verification policy. In that case, you pass your existing metadata into trazaeo and let it build the provenance bundle around work your pipeline already performed.

The main Python workflow entrypoints are:

  • trazaeo_workflows.build_dataset_bootstrap_bundle
  • trazaeo_workflows.build_dataset_incremental_bundle
  • trazaeo_workflows.build_dataset_delivery_proof_package
  • trazaeo_workflows.verify_dataset_delivery_proof_report

Those helpers are used by the example netCDF and Icechunk flows in examples/python_netcdf/.

A typical pattern is:

  1. Your pipeline reads or transforms source files.
  2. Your pipeline writes the dataset artifact.
  3. You hash the artifact with trazaeo.
  4. You pass the source metadata, output metadata, signer, and trust policy into a dataset workflow helper.
  5. You store or publish the returned provenance bundle beside the dataset.
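The five steps above can be sketched end to end without trazaeo. This is a hypothetical stand-in, not the `build_dataset_*` helpers: it hashes inputs and output with SHA-256, serializes the record as canonical JSON, and uses an HMAC-SHA256 tag in place of a real signature, because the actual helpers take a signer and trust policy whose formats are not shown here. The function names and the `.bundle.json` file name are assumptions.

```python
import hashlib
import hmac
import json
from pathlib import Path


def _sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _canonical(record: dict) -> bytes:
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def publish_with_provenance(sources: list[Path], output: Path, job_id: str,
                            signer_id: str, key: bytes) -> Path:
    """Steps 3-5 for an artifact your pipeline already wrote (steps 1-2)."""
    # Step 3: hash the output artifact and its source inputs.
    record = {
        "job_id": job_id,
        "signer_id": signer_id,
        "sources": [{"uri": p.resolve().as_uri(), "sha256": _sha256_file(p)} for p in sources],
        "output": {"uri": output.resolve().as_uri(), "sha256": _sha256_file(output)},
    }
    # Step 4: authenticate the record. HMAC-SHA256 is only a stand-in for a real
    # signature that a verifier would check against a trust policy.
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    # Step 5: store the bundle beside the dataset.
    bundle_path = output.with_name(output.name + ".bundle.json")
    bundle_path.write_text(json.dumps({"record": record, "hmac_sha256": tag}, indent=2))
    return bundle_path


def verify_bundle(bundle_path: Path, key: bytes) -> bool:
    """Recompute the tag over the stored record and compare in constant time."""
    bundle = json.loads(bundle_path.read_text())
    expected = hmac.new(key, _canonical(bundle["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["hmac_sha256"])
```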

The following command runs an EO fixture that exercises the compiled Python extension, netCDF input files, Zarr materialization, C2PA hard bindings, signed dataset provenance, and deterministic source metadata:

python -m trazaeo_workflows eo-provenance-demo \
  --output-dir /tmp/trazaeo-eo-demo \
  --signing-key-hex 4f3edf983ac636a65a842ce7c78d9aa706d3b113bce036f9a4f5762b76f70f18

The demo fixes issued_at, source URIs, and observed mtimes so repeated runs can be compared across local directories. Its JSON output includes source content descriptors with SHA-256 C2PA hard bindings and a publish verification report with signature, trust, lineage, and C2PA interop states.
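Pinning these fields matters because any digest or signature over a provenance record covers them too: a fresh timestamp or a machine-specific path changes the bytes being hashed on every run. A minimal sketch, assuming a canonical JSON encoding (sorted keys, compact separators); the field names and pinned values are hypothetical and not the demo's actual schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_digest(record: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, no insignificant whitespace."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# Hypothetical pinned values, standing in for what the demo fixes.
PINNED = {
    "issued_at": "2025-01-01T00:00:00Z",
    "source_uri": "file:///fixtures/source.nc",
    "source_mtime": 1735689600,
}

if __name__ == "__main__":
    # Two runs whose records emit the same fields in a different key order.
    run_a = {**PINNED, "output_name": "dataset.zarr"}
    run_b = {"output_name": "dataset.zarr", **PINNED}
    print(canonical_digest(run_a) == canonical_digest(run_b))  # True

    # Without pinning, a fresh timestamp changes the digest on every run.
    run_c = {**run_a, "issued_at": datetime.now(timezone.utc).isoformat()}
    print(canonical_digest(run_a) == canonical_digest(run_c))  # False
```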

Documentation

  • Project docs: https://endcorp-hq.github.io/provenance
  • Python workflow examples: examples/python_netcdf/README.md
  • Protocol spec: TRAZAEO_V1_SPEC.md
  • Architecture boundary: docs/contracts/architecture.md
  • Quality gates: docs/contracts/quality-gates.md
  • Rust crate overview: trazaeo/README.md
  • Solana proof-log program: trazaeo/programs/proof-log/README.md

Development

Most users only need pip install trazaeo. If you are contributing to this repository, see CONTRIBUTING.md for local build, test, extension, and docs workflows.


Download files

Download the file for your platform.

Source Distribution

trazaeo-0.5.7.tar.gz (136.0 kB)

Uploaded Source

Built Distributions


trazaeo-0.5.7-cp312-abi3-win_amd64.whl (1.9 MB)

Uploaded CPython 3.12+, Windows x86-64

trazaeo-0.5.7-cp312-abi3-manylinux_2_28_x86_64.whl (2.2 MB)

Uploaded CPython 3.12+, manylinux: glibc 2.28+ x86-64

trazaeo-0.5.7-cp312-abi3-macosx_11_0_arm64.whl (1.9 MB)

Uploaded CPython 3.12+, macOS 11.0+ ARM64

trazaeo-0.5.7-cp312-abi3-macosx_10_12_x86_64.whl (2.0 MB)

Uploaded CPython 3.12+, macOS 10.12+ x86-64

File details

Details for the file trazaeo-0.5.7.tar.gz.

File metadata

  • Download URL: trazaeo-0.5.7.tar.gz
  • Size: 136.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for trazaeo-0.5.7.tar.gz
Algorithm Hash digest
SHA256 50f246b07719779a30ca8b1ccfd30e18bddc1fd10f777095c8603b6cf2c81c4b
MD5 90ed6016ab76a5409720c8726f1cf697
BLAKE2b-256 b23170c945ca2e1d403673f4988a82e865acfad3b30ba5da9ad4da9fd7830e60


File details

Details for the file trazaeo-0.5.7-cp312-abi3-win_amd64.whl.

File metadata

  • Download URL: trazaeo-0.5.7-cp312-abi3-win_amd64.whl
  • Size: 1.9 MB
  • Tags: CPython 3.12+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for trazaeo-0.5.7-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 945756ab14a1c883a59e58222f48cd8b59fe80914cc58de839c5a55b046c8225
MD5 11503379b98c8de561ef89f4aa25a776
BLAKE2b-256 3da8861154ef1f46c20384997f0100bbdf0e5f83f84a9f2a6ccd5869e38492c0


File details

Details for the file trazaeo-0.5.7-cp312-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for trazaeo-0.5.7-cp312-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 af44bfc90e31d17f9d2aaf11bf14212ce7ab9e1a860ff9ebe69390771c72274b
MD5 63dcebfc84135f5ea8216425a996c4f3
BLAKE2b-256 3e2d571ed55183022ca433471fde12cdc0a020f8d67f1ef9bf3262e7458ac2db


File details

Details for the file trazaeo-0.5.7-cp312-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for trazaeo-0.5.7-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 90bce9b51654fe96c0f8d5741060d9a4106bd861bfcf7bd3234cbea30e1041d2
MD5 7a914f387a7a3ecc8331fef2dd1957fe
BLAKE2b-256 db488a60436e3ac05a290567523a481a03322407a4a7199c81428a83ae559f75


File details

Details for the file trazaeo-0.5.7-cp312-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for trazaeo-0.5.7-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ea2cce37638cc6ab0bbb2b574ff1c186e02c1bc8a47d910226451da99b137cff
MD5 5cfd134575e7d0f1a6c3d98dde3ce0ec
BLAKE2b-256 0a9373918a83e3a0d623cabc9272e0a9c02694d270b750318c9ee2a37afb5ed3
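To check a downloaded distribution against the SHA256 digests published above, you can hash it locally. A minimal sketch; the helper name is hypothetical, and the digests are copied from the file details for trazaeo 0.5.7.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digests copied from the file details above (trazaeo 0.5.7).
PUBLISHED_SHA256 = {
    "trazaeo-0.5.7.tar.gz":
        "50f246b07719779a30ca8b1ccfd30e18bddc1fd10f777095c8603b6cf2c81c4b",
    "trazaeo-0.5.7-cp312-abi3-win_amd64.whl":
        "945756ab14a1c883a59e58222f48cd8b59fe80914cc58de839c5a55b046c8225",
    "trazaeo-0.5.7-cp312-abi3-manylinux_2_28_x86_64.whl":
        "af44bfc90e31d17f9d2aaf11bf14212ce7ab9e1a860ff9ebe69390771c72274b",
    "trazaeo-0.5.7-cp312-abi3-macosx_11_0_arm64.whl":
        "90bce9b51654fe96c0f8d5741060d9a4106bd861bfcf7bd3234cbea30e1041d2",
    "trazaeo-0.5.7-cp312-abi3-macosx_10_12_x86_64.whl":
        "ea2cce37638cc6ab0bbb2b574ff1c186e02c1bc8a47d910226451da99b137cff",
}


def verify_download(path: str) -> bool:
    """Return True if a downloaded trazaeo 0.5.7 file matches its published SHA256."""
    name = Path(path).name
    if name not in PUBLISHED_SHA256:
        raise KeyError(f"no published digest for {name}")
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return hmac.compare_digest(h.hexdigest(), PUBLISHED_SHA256[name])
```

pip can enforce the same check itself in hash-checking mode, for example with a requirements line such as `trazaeo==0.5.7 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`. In that mode every requirement, including dependencies, must be pinned with a hash.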

