Open-source provenance SDK and specification for verifiable EO and climate data workflows

trazaeo V1

This repository contains the trazaeo Rust crate and its Python bindings for verifiable provenance in Earth observation (EO) and climate data workflows. The project includes hashing, provenance records, proof-logging adaptors, and examples of netCDF (NC) to Zarr or Icechunk verification flows.

The V1 protocol covers three primary use cases:

  • source-device capture, where a sensor or edge device signs captured bytes
  • transport receipt, where a ground station or relay attests to received bytes or helper processing
  • dataset transforms and publication, where one or more inputs are turned into derived artifacts and checkpointed for audit

V1 envelope schemas live in trazaeo/schemas/.

The current release is V1: a stable core verification model with optional adaptor-backed assurance for storage binding and proof logging.

Repository contracts:

  • Compatibility matrix: docs/contracts/compatibility.md
  • Quality gates: docs/contracts/quality-gates.md
  • Python example boundaries: docs/contracts/architecture.md
  • Merkle/Bao replacement proposal: docs/proposals/merkle-bao-replacement.md
  • Roadmap: ROADMAP.md
  • Documentation site source: website/ (Vocs)

Building

Build the crate with the standard Rust toolchain. From the repository root, run:

cargo build --release

This places the compiled trazaeo library under target/release.

To run the unit tests:

cargo test

To run the full local quality gates (lint, type checks, and tests) from the repository root:

make ci

To run coverage locally (Rust LCOV + Python coverage XML), run:

make coverage

To run Rust fuzz targets locally, install cargo-fuzz and run a target from trazaeo/fuzz/:

cargo install cargo-fuzz
cargo fuzz run decode_range_proof_package --manifest-path trazaeo/fuzz/Cargo.toml

To install local commit-time checks that mirror CI through pre-commit:

make precommit-install

The exact gate policy is documented in docs/contracts/quality-gates.md.

To run the streaming BLAKE3 performance harness:

cargo run --example perf_hashing -- <path-to-file> [chunk_size_bytes] [threads]
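
For readers who want to see what a streaming hash measurement involves, here is a minimal Python sketch. It is not the perf_hashing example itself. BLAKE2b from the standard library stands in for BLAKE3 (which the standard library does not provide), the sketch is single-threaded, and the names hash_file_streaming and measure_throughput are hypothetical.

```python
import hashlib
import time


def hash_file_streaming(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so memory use stays near chunk_size."""
    # BLAKE2b is a stand-in here: BLAKE3 is not in the Python standard library.
    hasher = hashlib.blake2b(digest_size=32)
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            total += len(chunk)
    return hasher.hexdigest(), total


def measure_throughput(path, chunk_size=1 << 20):
    """Return (hex digest, bytes hashed, MiB per second) for one pass."""
    start = time.perf_counter()
    digest, total = hash_file_streaming(path, chunk_size)
    elapsed = time.perf_counter() - start
    mib_per_s = (total / (1 << 20)) / elapsed if elapsed > 0 else float("inf")
    return digest, total, mib_per_s
```

The digest does not depend on the chunk size, so varying chunk_size changes only the timing, which is the point of a harness like this.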

Reliability examples (source, transport, and transform to reward readiness)

These examples support reliability validation for the V1 flow described in TRAZAEO_V1_SPEC.md sections 15, 8, and 12.

Rust retry + idempotency demo:

cargo run --example reliability_demo
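
The general pattern behind a retry + idempotency demo can be sketched in a few lines of Python. This is an illustration of the technique only, not a port of reliability_demo. The IdempotentStore class, the with_retry helper, and the record shape are all hypothetical.

```python
import time


class IdempotentStore:
    """In-memory stand-in for a service that deduplicates by idempotency key."""

    def __init__(self):
        self.results = {}
        self.applied = 0  # how many submissions actually took effect

    def submit(self, key, payload):
        # A repeated key returns the stored result instead of applying twice.
        if key in self.results:
            return self.results[key]
        self.applied += 1
        result = {"key": key, "size": len(payload)}
        self.results[key] = result
        return result


def with_retry(operation, attempts=3, base_delay=0.01):
    """Call operation until it succeeds, backing off exponentially."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Retrying alone can apply a write twice when the first attempt succeeded but its response was lost. Keying each submission makes the second attempt a harmless lookup.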

Python file-root reliability check (after building Python bindings):

python -m trazaeo_workflows reliability-check <path-to-file> --chunk 1048576 --threads 4

Python netCDF content-root check:

python -m trazaeo_workflows hash-netcdf <path-to-file> --chunk 4096 --threads 4

Python source-device capture demo:

python -m trazaeo_workflows capture-source \
  --subject-id capture-source-1 \
  --capture-actor-id sensor-1 \
  --capture-system-id sensor-pipeline-1 \
  --output-ref obj://raw/1 \
  --segment-id frame-1 \
  --payload-text telemetry

Python transport-receipt capture demo:

python -m trazaeo_workflows capture-transport \
  --subject-id capture-transport-1 \
  --capture-actor-id ground-station-1 \
  --capture-system-id rx-1 \
  --input-ref uplink://pass-1 \
  --output-ref obj://relay/1 \
  --segment-id seg-transport-1 \
  --payload-text downlink-frame

Python publish+verify envelope demo:

TRUST_POLICY_JSON='{"allowed_keys":["18e6a97db14c236f52bb13ee7c843ee077ae77c43a37d2f8c548abd79036e599"],"revoked_keys":[],"audit_log":[{"action":"allow","key_id":"18e6a97db14c236f52bb13ee7c843ee077ae77c43a37d2f8c548abd79036e599","reason":"local demo trust policy","effective_at":"2026-01-01T00:00:00Z"}]}'
python -m trazaeo_workflows publish-demo --mode sampled --trust-policy-json "$TRUST_POLICY_JSON"

publish-demo prints one JSON object with publish_input, publish_envelope, and verification_report.
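
Before passing a trust policy to the CLI, it can help to sanity-check the JSON locally. The sketch below uses only the field names visible in the demo policy above (allowed_keys, revoked_keys, audit_log). The rule that a key is trusted when it is allowed and not revoked is an assumption about the policy semantics, and both helper names are hypothetical.

```python
import json


def load_trust_policy(policy_json):
    """Parse a trust policy and check that the three top-level fields are lists."""
    policy = json.loads(policy_json)
    for field in ("allowed_keys", "revoked_keys", "audit_log"):
        if not isinstance(policy.get(field), list):
            raise ValueError(f"trust policy field {field!r} must be a list")
    return policy


def is_key_trusted(policy, key_id):
    """Assumed rule: a key is trusted if it is allowed and not revoked."""
    return key_id in policy["allowed_keys"] and key_id not in policy["revoked_keys"]
```

A malformed policy then fails early with a readable error instead of surfacing later inside the publish flow.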

Python adaptor demo with S3-style storage + public-RPC Solana proof log:

python -m trazaeo_workflows publish-solana --mode sampled --trust-policy-json "$TRUST_POLICY_JSON"

By default the demo uses https://api.devnet.solana.com with an ephemeral devnet signer for local testing. For solana-mainnet, pass a funded Solana keypair file:

python -m trazaeo_workflows publish-solana \
  --cluster solana-mainnet \
  --rpc-url https://api.mainnet-beta.solana.com \
  --proof-log-keypair-path ~/.config/solana/id.json \
  --trust-policy-json "$TRUST_POLICY_JSON"

The memo-backed public-RPC proof-log adaptor verifies the committed transaction and signer, but it does not expose a chain root, so the CLI reports chain_root: null for that adaptor.

Python NC collection to Zarr/Icechunk conversion + verification demo:

python -m trazaeo_workflows icechunk \
  path/to/a.nc path/to/b.nc \
  --zarr-store outputs/sst.zarr \
  --dataset-id sst \
  --dataset-version v1 \
  --trust-policy-json "$TRUST_POLICY_JSON"

Jupyter notebook walkthrough for pre/post conversion visualization and verification:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[notebooks]'
jupyter lab examples/python_netcdf/notebooks/nc_to_zarr_provenance_walkthrough.ipynb

The notebooks extra is self-contained for the walkthrough: it includes the example runtime dependencies, dask[array], and ipykernel.

Documentation site

Install docs dependencies and run local dev mode:

cd website
npm install
npm run dev

Build the static docs site:

cd website
npm run build

Python bindings

Python bindings are provided via PyO3. The easiest way to build them is with maturin:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install maturin
cd trazaeo
maturin develop --release --features python-extension,python-proof-log-rpc

After building, you can import the trazaeo module from Python.

Python dependencies for examples/tests

Install optional Python dependencies for netCDF examples and test tooling:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[python-examples,test]'

Example

Below is a minimal Rust example that hashes a file into a content descriptor:

use trazaeo::hashing::hash_file_content_descriptor;

fn main() {
    let descriptor = hash_file_content_descriptor(
        "data.bin",
        "artifact-1",
        1024,
        4,
        "application/octet-stream",
        "2026-01-01T00:00:00Z",
    )
    .expect("content descriptor");

    println!("content root: {}", descriptor.content_root_hash);
}

In Python, you can call the provided hash helper after installing the editable package in your virtual environment. Both single-threaded and multithreaded variants are exposed:

>>> from trazaeo import blake3_hash
>>> blake3_hash(b"hello world")
>>> from trazaeo import blake3_hash_mt
>>> blake3_hash_mt(b"hello world", 4)

Optional Bao range proofs

trazaeo can generate Bao outboard data and byte-range proof packages internally, so downstream apps do not need to bolt this on themselves.

This is an integrity feature, not a secrecy feature. In the current V1 model, Bao verifies byte ranges against the BLAKE3 file hash recorded in the content descriptor.

Bao support is optional and gated behind the Rust feature bao-range-proofs. Default builds do not expose the Bao helpers, so the Python calls below assume bindings built with that feature enabled.

>>> from trazaeo import bao_outboard_json, bao_range_proof_package_json
>>> outboard_json = bao_outboard_json("example.nc", 4096, 4, None)
>>> proof_json = bao_range_proof_package_json("example.nc", 0, 4096, 4096, 4)

Hashing a netCDF file with zero copy

The crate provides a helper to hash a file directly into a content descriptor using memory mapping. From Python you can compute the content root of a netCDF file as follows:

>>> from trazaeo import blake3_content_root
>>> root = blake3_content_root("example.nc", 4096, 4)
>>> print(root.hex())

blake3_content_root reads the input using a zero-copy memory map to minimize RAM usage.
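
The memory-mapping approach can be illustrated with the Python standard library. This sketch is not how blake3_content_root is implemented. It uses BLAKE2b as a stand-in for BLAKE3, produces a plain file digest rather than a trazaeo content root, and the name hash_file_mmap is hypothetical.

```python
import hashlib
import mmap


def hash_file_mmap(path, chunk_size=4096):
    """Hash a file through a read-only memory map, one slice at a time."""
    # BLAKE2b is a stand-in here: BLAKE3 is not in the Python standard library.
    hasher = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as handle:
        # mmap cannot map an empty file, so hash it as zero bytes.
        if handle.seek(0, 2) == 0:
            return hasher.hexdigest()
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for offset in range(0, len(view), chunk_size):
                    # Slicing a memoryview does not copy the underlying bytes.
                    with view[offset:offset + chunk_size] as piece:
                        hasher.update(piece)
    return hasher.hexdigest()
```

The operating system pages file contents in on demand, so the process does not need a buffer the size of the file.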


Download files

Source Distribution

trazaeo-0.5.3.tar.gz (108.2 kB)

Uploaded Source

Built Distribution

trazaeo-0.5.3-cp314-cp314-macosx_11_0_arm64.whl (840.0 kB)

Uploaded CPython 3.14, macOS 11.0+ ARM64

File details

Details for the file trazaeo-0.5.3.tar.gz.

File metadata

  • Download URL: trazaeo-0.5.3.tar.gz
  • Upload date:
  • Size: 108.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for trazaeo-0.5.3.tar.gz
Algorithm     Hash digest
SHA256        2fb0e1a8a34b59f973bda7ed7d2808bd7cc11a8d887b7e5bec13c5b2d7118a71
MD5           4cb2bb93e8f09ed5ce3f7b7ff06f24f7
BLAKE2b-256   4c89ae93844797174dfc076e3746322f69c0eb438a3be3fab06f8e202158dd5c

File details

Details for the file trazaeo-0.5.3-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for trazaeo-0.5.3-cp314-cp314-macosx_11_0_arm64.whl
Algorithm     Hash digest
SHA256        6a7a205b17ab999a9f511e9c0584c496ca509dd94223d998f6ca7bd400a67dd3
MD5           61a27e12e62a6b1ca02f85f2f6093f11
BLAKE2b-256   d08f1f00337ba6480b1957e825fdb36f28ac142891b25775036b4df006b42687
