Open-source provenance SDK and specification for verifiable EO and climate data workflows
trazaeo
trazaeo is a Python-first package for adding verifiable provenance to Earth
observation and climate data workflows. It gives you fast content hashing,
signed provenance envelopes, and workflow helpers you can drop into an existing
pipeline without replacing your current scheduler, storage layer, or transform
code.
The V1 protocol is now designed as a C2PA-compatible EO provenance profile. C2PA manifests provide the interoperable content-authenticity carrier, while trazaeo keeps BLAKE3 roots, optional Bao range proofs, checkpoint manifests, and proof-log commitments as fast, EO-specific verification layers.
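A chunked content root is what makes range proofs possible. Here is a minimal sketch of a binary Merkle tree over fixed-size chunks with a single-chunk inclusion proof. It illustrates the idea under stated assumptions and is not trazaeo's construction. It uses the standard library's BLAKE2b in place of BLAKE3, and it does not reproduce BLAKE3's internal tree or Bao's proof encoding. The names `build_levels`, `inclusion_proof`, and `verify_chunk` are hypothetical.

```python
import hashlib


def _leaf(chunk: bytes) -> bytes:
    # Prefix 0x00 marks a leaf so it can never be confused with a parent node.
    return hashlib.blake2b(b"\x00" + chunk, digest_size=32).digest()


def _parent(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an internal node.
    return hashlib.blake2b(b"\x01" + left + right, digest_size=32).digest()


def build_levels(data: bytes, chunk_size: int) -> list[list[bytes]]:
    """Hash fixed-size chunks, then pair hashes upward until one root remains."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    levels = [[_leaf(c) for c in chunks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(_parent(prev[i], prev[i + 1]))
            else:
                nxt.append(prev[i])  # an odd node is promoted unchanged
        levels.append(nxt)
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bool, bytes]]:
    """Collect sibling hashes from leaf to root as (sibling_is_left, hash) pairs."""
    if not 0 <= index < len(levels[0]):
        raise IndexError("chunk index out of range")
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means the node was promoted
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path


def verify_chunk(chunk: bytes, path: list[tuple[bool, bytes]], root: bytes) -> bool:
    """Recompute the root from one chunk and its sibling path."""
    node = _leaf(chunk)
    for sibling_is_left, sibling in path:
        node = _parent(sibling, node) if sibling_is_left else _parent(node, sibling)
    return node == root
```

The design point is that a verifier holding only the root can check one chunk plus a logarithmic number of sibling hashes. For large EO files, that is far cheaper than rehashing the whole artifact.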
Use it when you want to:
- hash outputs from an existing batch or streaming job
- attach provenance to a dataset publish step
- verify that a delivered artifact still matches the published record
- add provenance checks around netCDF, Zarr, or Icechunk workflows
- publish C2PA-compatible hard bindings and EO provenance assertions alongside STAC or OGC catalog metadata
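The sketch below shows one way the catalog side could look. It records a SHA-256 digest on a STAC asset using the STAC File Info extension's `file:checksum` field (a hex-encoded multihash) and its `file:size` field. This is an assumption about how you might expose the hard binding in a catalog, not trazaeo's own STAC or OGC output. A real item would also list the File Info extension's schema URI in `stac_extensions`.

```python
import hashlib

# Multihash prefix for SHA2-256: function code 0x12, digest length 0x20 (32 bytes).
SHA2_256_MULTIHASH_PREFIX = "1220"


def sha256_multihash_hex(data: bytes) -> str:
    """Return a hex-encoded SHA2-256 multihash, the format file:checksum expects."""
    return SHA2_256_MULTIHASH_PREFIX + hashlib.sha256(data).hexdigest()


def stac_asset_with_checksum(href: str, data: bytes, media_type: str) -> dict:
    """Build an illustrative STAC asset dict carrying File Info extension fields."""
    return {
        "href": href,
        "type": media_type,
        "file:checksum": sha256_multihash_hex(data),
        "file:size": len(data),
    }
```

For large artifacts you would stream the file into the hasher instead of passing all bytes at once.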
Install
For most users:
pip install trazaeo
If you also want the optional netCDF, xarray, and Zarr helpers used by the example workflows:
pip install 'trazaeo[python-examples]'
Published wheels are built as CPython abi3 artifacts from Python 3.12, so a
single wheel works across Python 3.12+ on the supported platforms below. If a
prebuilt wheel is not available for your platform, pip will fall back to
building from source.
Published wheel contract
The package metadata supports Python 3.12+.
The verified published wheel matrix is:
- CPython abi3, built from Python 3.12
- Linux manylinux_2_28_x86_64
- macOS x86_64
- macOS arm64
- Windows x86_64
The contract covers these imports and commands:
- import trazaeo
- import trazaeo_workflows.dataset_provenance
- from trazaeo import PublicRpcSolanaProofLogAdaptor
- trazaeo-icechunk --help
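You can reproduce checks like these in your own CI with a small script. The helper below is hypothetical and not part of trazaeo. It imports modules, looks up attributes, and runs each command with `--help`.

```python
import importlib
import shutil
import subprocess


def smoke_check(modules, attributes, commands):
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"import {name}: {exc}")
    for module_name, attr in attributes:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            failures.append(f"import {module_name}: {exc}")
            continue
        if not hasattr(module, attr):
            failures.append(f"{module_name} has no attribute {attr}")
    for command in commands:
        executable = shutil.which(command)
        if executable is None:
            failures.append(f"{command}: not found on PATH")
            continue
        # Only checks that the CLI starts and exits cleanly with --help.
        result = subprocess.run([executable, "--help"], capture_output=True, timeout=60)
        if result.returncode != 0:
            failures.append(f"{command} --help exited with {result.returncode}")
    return failures
```

To cover the entries listed above, call it with `modules=["trazaeo", "trazaeo_workflows.dataset_provenance"]`, `attributes=[("trazaeo", "PublicRpcSolanaProofLogAdaptor")]`, and `commands=["trazaeo-icechunk"]`.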
Source-build fallback
If you install on a platform outside that wheel matrix, pip will build from
source. In Debian/Ubuntu-style environments, install a C/Rust build toolchain
and Python development headers for the interpreter you are using:
apt-get update
apt-get install -y build-essential curl pkg-config python3-dev
curl https://sh.rustup.rs -sSf | sh -s -- -y
Then restart the shell and rerun:
pip install trazaeo
Use It Inside Your Existing Pipeline
trazaeo is designed to sit at the boundaries of work you already do.
Typical places to add it:
- after a transform job writes a file, hash the artifact and store the content root with your job metadata
- before publishing a dataset, build and sign provenance for the output and its source inputs
- during delivery or audit, verify that the local artifact still matches the published checkpoint
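Because the hook sits at a boundary, one low-friction pattern is to wrap an existing task that returns its output path. The decorator below is a hypothetical sketch. It uses the standard library's SHA-256 as a placeholder hash function so it runs without trazaeo. In a real pipeline you would pass a function built on trazaeo's hashing API instead.

```python
import functools
import hashlib
from typing import Callable


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 (placeholder for a trazaeo content root)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def records_content_root(hash_file: Callable[[str], str] = file_sha256):
    """Wrap a task that returns an output path so it returns job metadata instead."""
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            output_path = task(*args, **kwargs)  # existing transform code runs unchanged
            return {
                "task": task.__name__,
                "artifact_path": str(output_path),
                "content_root_hash": hash_file(output_path),
            }
        return wrapper
    return decorator
```

For example, `hash_file=lambda p: blake3_content_root(p, 4096, 4).hex()` would reuse the call from the Quick Start, assuming those arguments suit your artifacts.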
You do not need to adopt a new pipeline framework. The package works well as:
- a Python helper inside an Airflow, Prefect, Dagster, or Argo task
- a provenance step called from an existing batch job or notebook
- a verification step in a release or data publication workflow
Quick Start
The normal integration point is the Python API. A common first step is to hash an artifact right after your pipeline writes it:
from trazaeo import blake3_content_root
def register_pipeline_output(path: str) -> dict[str, str]:
    content_root = blake3_content_root(path, 4096, 4).hex()
    return {
        "artifact_path": path,
        "content_root_hash": content_root,
    }
That works well in an Airflow task, a Prefect flow, a Dagster asset, or a plain Python batch job: you keep your existing transform code and add one provenance step after the file is produced.
Content descriptors also carry a C2PA-compatible SHA-256 hard binding when the descriptor is built from a local artifact path. The BLAKE3 root remains the trazaeo fast-path commitment; the SHA-256 binding is for C2PA validators.
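Computing both a fast root and a SHA-256 hard binding does not require reading a large artifact twice. The sketch below feeds each block to two hashers in one pass. It assumes a simple descriptor shape with hypothetical field names. It uses BLAKE2b only as a stand-in for the BLAKE3 root, which trazaeo computes with its own API (for example `blake3_content_root`), so the stand-in value will not match a trazaeo root.

```python
import hashlib


def dual_digest(path: str, chunk_size: int = 1 << 20) -> dict:
    """Read the file once and feed every block to both hashers."""
    sha256 = hashlib.sha256()  # C2PA-style hard binding
    fast = hashlib.blake2b(digest_size=32)  # stand-in for trazaeo's BLAKE3 root
    size = 0
    with open(path, "rb") as handle:
        while True:
            block = handle.read(chunk_size)
            if not block:
                break
            sha256.update(block)
            fast.update(block)
            size += len(block)
    return {
        "size": size,
        "sha256_hard_binding": sha256.hexdigest(),
        "fast_root_stand_in": fast.hexdigest(),
    }
```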
For in-memory content:
from trazaeo import blake3_hash, blake3_hash_mt
single = blake3_hash(b"hello world").hex()
parallel = blake3_hash_mt(b"hello world", 4).hex()
Artifact Verification In Process
If your pipeline publishes an artifact and later needs to verify what was delivered, you can build a proof package for the local file:
from trazaeo_workflows import build_local_artifact_full_root_proof_package
def build_local_artifact_proof(path: str) -> dict:
    return build_local_artifact_full_root_proof_package(
        path,
        chunk_size=1 << 20,
        threads=4,
    )
And when you already have a delivery proof package from an upstream publish step, verify it against the artifact path:
from trazaeo_workflows import verify_dataset_delivery_proof_report
def verify_delivery(path: str, delivery_proof_package: dict) -> dict:
    return verify_dataset_delivery_proof_report(
        delivery_proof_package,
        artifact_path=path,
    )
This fits naturally in a downstream validation, QA, or publication check step.
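Conceptually, delivery verification comes down to recomputing a commitment over the local bytes and comparing it with the published one. This hypothetical sketch does not use trazaeo's report format. It checks file size first as a cheap early exit, then compares a streamed SHA-256 digest against a published `{size, sha256}` record.

```python
import hashlib
import os


def verify_against_record(path: str, record: dict, chunk_size: int = 1 << 20) -> dict:
    """Compare a local artifact with a published {"size", "sha256"} record."""
    actual_size = os.path.getsize(path)
    report = {"artifact_path": path, "size_matches": actual_size == record["size"]}
    if not report["size_matches"]:
        # A size mismatch already proves the bytes differ; skip hashing.
        report["digest_matches"] = False
        report["status"] = "mismatch"
        return report
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    report["digest_matches"] = digest.hexdigest() == record["sha256"]
    report["status"] = "verified" if report["digest_matches"] else "mismatch"
    return report
```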
Dataset Publish Workflows
The higher-level trazaeo_workflows helpers are for pipelines that already
track their source files, transform job ids, output artifact refs, signer ids,
and verification policy. In that case, you pass your existing metadata into
trazaeo and let it build the provenance bundle around work your pipeline
already performed.
The main Python workflow entrypoints are:
- trazaeo_workflows.build_dataset_bootstrap_bundle
- trazaeo_workflows.build_dataset_incremental_bundle
- trazaeo_workflows.build_dataset_delivery_proof_package
- trazaeo_workflows.verify_dataset_delivery_proof_report
Those helpers are used by the example netCDF and Icechunk flows in
examples/python_netcdf/.
A typical pattern is:
- Your pipeline reads or transforms source files.
- Your pipeline writes the dataset artifact.
- You hash the artifact with trazaeo.
- You pass the source metadata, output metadata, signer, and trust policy into a dataset workflow helper.
- You store or publish the returned provenance bundle beside the dataset.
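The pattern of hashing inputs and outputs, binding them into one payload, and signing it can be sketched end to end with the standard library. This sketch rests on loud assumptions. The bundle layout is hypothetical, and HMAC-SHA256 stands in for a real signature. HMAC is a shared-secret MAC, so anyone who can verify a tag can also forge one. A publishable provenance bundle needs an asymmetric signature, and trazaeo's actual signing scheme is not shown here.

```python
import hashlib
import hmac
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and compact separators so signed bytes are stable."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_signed_bundle(sources: list[dict], output: dict, signer_id: str, key: bytes) -> dict:
    """sources/output are dicts with at least "uri" and "sha256" (hypothetical shape)."""
    payload = {
        "signer_id": signer_id,
        # Copy and sort inputs so the payload does not depend on listing order.
        "inputs": [dict(s) for s in sorted(sources, key=lambda s: s["uri"])],
        "output": dict(output),
    }
    tag = hmac.new(key, canonical_json(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def verify_signed_bundle(bundle: dict, key: bytes) -> bool:
    """Recompute the HMAC over the canonical payload and compare in constant time."""
    expected = hmac.new(key, canonical_json(bundle["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

Sorting the inputs and serializing with sorted keys keeps the signed bytes independent of the order in which your pipeline happened to list its sources.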
For a runnable EO fixture that exercises the compiled Python extension, netCDF input files, Zarr materialization, C2PA hard bindings, signed dataset provenance, and deterministic source metadata:
python -m trazaeo_workflows eo-provenance-demo \
--output-dir /tmp/trazaeo-eo-demo \
--signing-key-hex 4f3edf983ac636a65a842ce7c78d9aa706d3b113bce036f9a4f5762b76f70f18
The demo fixes issued_at, source URIs, and observed mtimes so repeated runs
can be compared across local directories. Its JSON output includes source
content descriptors with sha256 C2PA hard bindings and a publish verification
report with signature, trust, lineage, and C2PA interop states.
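Pinning those fields matters because any changed byte changes a digest. Here is a minimal sketch of comparing two runs, assuming each report is plain JSON-serializable data. It first overwrites volatile fields with fixed values, then fingerprints each report over canonical JSON (sorted keys, compact separators) so key order does not matter. The helper names are hypothetical and not part of trazaeo_workflows.

```python
import hashlib
import json


def with_pinned_fields(report: dict, **pinned) -> dict:
    """Return a copy of report with volatile top-level fields set to fixed values."""
    return {**report, **pinned}


def report_fingerprint(report: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, no insignificant whitespace."""
    encoded = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```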
Documentation
- Project docs: https://endcorp-hq.github.io/provenance
- Python workflow examples: examples/python_netcdf/README.md
- Protocol spec: TRAZAEO_V1_SPEC.md
- Architecture boundary: docs/contracts/architecture.md
- Quality gates: docs/contracts/quality-gates.md
- Rust crate overview: trazaeo/README.md
- Solana proof-log program: trazaeo/programs/proof-log/README.md
Development
Most users only need pip install trazaeo. If you are contributing to this
repository, see CONTRIBUTING.md for local build, test, extension, and docs
workflows.
Download files
File details
Details for the file trazaeo-0.5.6.tar.gz.
File metadata
- Download URL: trazaeo-0.5.6.tar.gz
- Upload date:
- Size: 128.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8270c56b762b1b054e4715c041017d0771cbb97987a36910c07af8751c206b2 |
| MD5 | ee7adf508627691f7451153d35c65f22 |
| BLAKE2b-256 | 0607f40ef03b4938cce191ed35e8ad39ae52f7d12893cc30c5700ea1df247d43 |
File details
Details for the file trazaeo-0.5.6-cp312-abi3-win_amd64.whl.
File metadata
- Download URL: trazaeo-0.5.6-cp312-abi3-win_amd64.whl
- Upload date:
- Size: 1.9 MB
- Tags: CPython 3.12+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6124ca4ddf7aa9f4414c76755fe224b1d421874469389b65556a6dd21f16914f |
| MD5 | 2d7d8f539555cf7cbb4499c27394ea92 |
| BLAKE2b-256 | 303cbbffe8b46de945dbcaed09f7bf1d479d7084a368c678ca15015c863e41cd |
File details
Details for the file trazaeo-0.5.6-cp312-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: trazaeo-0.5.6-cp312-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 2.1 MB
- Tags: CPython 3.12+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a95ea094a648545ec97198f5c785bfb6b6a2768feea60f2e90a2d930217c5060 |
| MD5 | 2d952a76e41bc765952285000934986c |
| BLAKE2b-256 | fd119aff227c64051ed0f555017eb05abd09ae357d3fcefe303b45e89b47f096 |
File details
Details for the file trazaeo-0.5.6-cp312-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: trazaeo-0.5.6-cp312-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.9 MB
- Tags: CPython 3.12+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fb0362bdf009213414db1fbf0b64147b55b0fb67eb28f1c676ec7ba8b4b3b18 |
| MD5 | 2c2d13d35c776b9707714890b09a16cc |
| BLAKE2b-256 | 358c6a0408c2d43e130c268566ffde8b778cc494f68d1a16eec7d34521852755 |
File details
Details for the file trazaeo-0.5.6-cp312-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: trazaeo-0.5.6-cp312-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 2.0 MB
- Tags: CPython 3.12+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a82246fc973a523f47f423d09494f54b12bf1782b680810d994badaf1ed1d54 |
| MD5 | ddd0cbd8e9ea23ef9b073988ddc812c7 |
| BLAKE2b-256 | 893ec788fa5f65b59c46c02e90f5e9f7ef9fdcb0b53a97dd6b93894d93b0a83f |