
Deterministic Observability Framework — formal governance, privacy benchmarks, and adversarial testing for multi-agent LLM systems



VERIFY. PROVE. ATTEST.


Deterministic Observability Framework (DOF)

Deterministic governance for multi-agent LLM systems. Constitutional rules, formal proofs, and on-chain attestation on Avalanche.

Built with Python 3.11+ · Z3 SMT Solver · web3.py · BLAKE3 · Avalanche C-Chain · PostgreSQL

pip install dof-sdk

from dof import GenericAdapter
result = GenericAdapter().wrap_output("your agent output here")
# → {status: "pass", violations: [], score: 8.5}

30ms. Zero LLM tokens. Works with CrewAI, LangGraph, AutoGen, or anything that produces text.

x402 Trust Gateway (v0.2.5)

First formal verification layer for x402 payments. Zero LLM in the critical path.

from dof import TrustGateway

gateway = TrustGateway()
verdict = gateway.verify(response_body=response)
# verdict.action → ALLOW / WARN / BLOCK
# verdict.governance_score → 0.0–1.0

CLI

python -m dof verify "your text here"   # governance check
python -m dof prove                      # Z3 formal verification
python -m dof health                     # component status
python -m dof benchmark                  # adversarial benchmark
python -m dof privacy                    # privacy benchmark
python -m dof version                    # show version

Key Exports

verify · classify_error · register · run_crew · MerkleBatcher · AdversarialEvaluator · RedTeamAgent · ConstitutionEnforcer · TrustGateway · GatewayVerdict · GatewayAction

Contents

The Problem · Highlights · Architecture · Governance Layers · Z3 Verification · On-Chain · Benchmarks · External Validation · Limitations · Citation


The Problem

LLM agents hallucinate. Nobody catches it deterministically. Using LLMs to verify LLMs is circular — the evaluator shares failure modes with the evaluated. Rate limits, cascading retries, and non-deterministic output quality interact across execution steps, producing unstable system-level behavior that cannot be attributed to specific infrastructure variables.

DOF solves this with 7 deterministic governance layers, formal Z3 proofs, and on-chain attestation — zero LLM tokens in the verification path.


Highlights

  • 7 governance layers — Constitution → AST → Supervisor → Z3 → Red/Blue → Memory → Signer
  • x402 Trust Gateway — formal verification for agent payments (ALLOW/WARN/BLOCK)
  • SS(f) = 1 − f³ — Z3 verified stability formula under bounded retries
  • GCR(f) = 1.0 — governance invariant under any failure rate (Z3 proven)
  • 21 on-chain attestations on Avalanche C-Chain mainnet
  • Merkle batching — 10,000 attestations = 1 tx ≈ $0.01
  • Automated benchmark: FDR of 100% (Governance), 86% (Code Safety), 90% (Hallucination), and 100% (Consistency), all at 0% FPR
  • Privacy benchmark — 71% detection rate across 7 AgentLeak channels
  • Framework agnostic — CrewAI, LangGraph, AutoGen, or raw Python
  • A2A server (8 skills) + MCP server (10 tools) + REST API (14 endpoints)
  • 774 tests, 27K+ LOC, 25 core modules, 40 contributions

Architecture

+----------------------------------------------------+
| L7  Signer       HMAC + Avalanche           ~2s    |
+----------------------------------------------------+
| L6  Memory Gov   Bi-temporal + decay        <1ms   |
+----------------------------------------------------+
| L5  Red/Blue     Red -> Guard -> Arb       ~50ms   |
+----------------------------------------------------+
| L4  Z3 Proofs    4 theorems UNSAT          ~10ms   |
+----------------------------------------------------+
| L3  Supervisor   Q+A+C+F scoring            ~5ms   |
+----------------------------------------------------+
| L2  AST Verifier eval/exec/secrets          <1ms   |
+----------------------------------------------------+
| L1  Constitution 4 HARD + 5 SOFT            <1ms   |
+----------------------------------------------------+
| Engine  DAG + LoopGuard + TokenTracker             |
+----------------------------------------------------+
| Data Oracle  6 verification strategies      <1ms   |
+----------------------------------------------------+

Total governance latency: < 70ms (layers 1-6). On-chain signing adds ~2s when enabled.
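To make the L2 AST Verifier row in the diagram more concrete, here is a minimal sketch of how eval/exec calls and subprocess imports can be flagged with Python's standard ast module. The deny-lists and the find_violations helper are hypothetical names chosen for illustration, not DOF's actual rule set or API, and secret detection is left out.

```python
import ast

# Hypothetical deny-lists for this sketch; DOF's real rules are not shown in this README.
BLOCKED_CALLS = {"eval", "exec"}
BLOCKED_MODULES = {"subprocess"}


def find_violations(source: str) -> list[str]:
    """Return a finding for every blocked call or import in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls such as eval("...") or exec("...")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
        # import subprocess / import subprocess.something
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    findings.append(f"line {node.lineno}: import {alias.name}")
        # from subprocess import run
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import from {node.module}")
    return findings


if __name__ == "__main__":
    sample = "import subprocess\nx = eval('1 + 1')\n"
    for finding in find_violations(sample):
        print(finding)
```

Because the check walks the parsed syntax tree and never runs the code, it is deterministic and stays cheap for short snippets.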


Seven Governance Layers

| Layer | What | Latency |
|---|---|---|
| L1 Constitution | 4 HARD (block) + 5 SOFT (warn). Regex + keywords | <1ms |
| L2 AST Verifier | Blocks eval/exec/subprocess/secrets via ast | <1ms |
| L3 Supervisor | S = Q(0.40)+A(0.25)+C(0.20)+F(0.15). ACCEPT ≥ 7.0 | ~5ms |
| L4 Z3 Proofs | 4 theorems (GCR invariance, SS cubic/mono/bounds) | ~10ms |
| L5 Red/Blue | RedTeam → Guardian → DeterministicArbiter. Zero LLM | ~50ms |
| L6 Memory Gov | Bi-temporal versioning, constitutional decay λ=0.99 | <1ms |
| L7 On-Chain | HMAC-SHA256 + Avalanche. Only GCR=1.0 published | ~2s |
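The L3 Supervisor formula is a weighted sum, which can be sketched as below. The weights and the 7.0 threshold come from the table. The 0-10 scale for each component, the function names, and the "REJECT" label for scores under the threshold are assumptions made for illustration only.

```python
# Weights and threshold are from the L3 row; the 0-10 component scale is assumed.
WEIGHTS = {"Q": 0.40, "A": 0.25, "C": 0.20, "F": 0.15}
ACCEPT_THRESHOLD = 7.0


def supervisor_score(components: dict[str, float]) -> float:
    """Weighted sum S = 0.40*Q + 0.25*A + 0.20*C + 0.15*F."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def decide(components: dict[str, float]) -> str:
    """Return "ACCEPT" at or above the threshold; "REJECT" is a hypothetical label."""
    if supervisor_score(components) >= ACCEPT_THRESHOLD:
        return "ACCEPT"
    return "REJECT"


if __name__ == "__main__":
    sample = {"Q": 9.0, "A": 8.0, "C": 8.0, "F": 9.0}
    print(f"{supervisor_score(sample):.2f}", decide(sample))
```

Since the weights sum to 1.0, the combined score stays on the same scale as its inputs, so the 7.0 threshold can be read directly against per-component scores.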

Formal Verification (Z3)

| Theorem | Math | Z3 Result |
|---|---|---|
| GCR Invariant | ∀f∈[0,1]: GCR(f)=1.0 | UNSAT |
| SS Cubic | ∀f∈[0,1]: SS(f)=1−f³ | UNSAT |
| SS Monotonicity | f₁<f₂ ⟹ SS(f₁)>SS(f₂) | UNSAT |
| SS Boundaries | SS(0)=1.0 ∧ SS(1)=0.0 | UNSAT |

About 10ms total for all four proofs. Proof certificates are written to logs/z3_proofs.json.
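The three SS theorems can also be checked by hand. The sketch below assumes the cubic comes from three independent attempts that each fail with probability f. That reading is an inference from "bounded retries" and the independence assumption listed under Limitations, not something this README states. An UNSAT result in the table is presumably a proof by refutation: the solver is given the negation of each statement and finds no counterexample.

```latex
% Assumption: up to three independent attempts, each failing with probability f.
\begin{align*}
SS(f) &= 1 - \Pr[\text{all three attempts fail}] = 1 - f^{3}, \qquad f \in [0,1] \\
SS(0) &= 1 - 0^{3} = 1, \qquad SS(1) = 1 - 1^{3} = 0 \\
0 \le f_1 < f_2 \le 1 &\;\Longrightarrow\; f_1^{3} < f_2^{3} \;\Longrightarrow\; SS(f_1) > SS(f_2)
\end{align*}
```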


On-Chain Attestation

Contract 0x88f6...C052 on Avalanche C-Chain (chain ID 43114). 21 attestations. Each transaction costs about $0.01, and one transaction can carry a Merkle batch of 10,000 attestations. Attestations pass through three layers: PostgreSQL (200ms) → Enigma Scanner (900ms) → Avalanche (2-3s, immutable).
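The reason a batch of 10,000 attestations can cost the same as a single transaction is that only the Merkle root has to be published. The following is a minimal sketch of that idea, not the MerkleBatcher implementation: SHA-256 from the standard library stands in for BLAKE3, the payloads are made up, and duplicating the last node on odd-sized levels is just one common convention.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then fold pairs of hashes until one 32-byte root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    # 10,000 hypothetical attestation payloads collapse into one 32-byte value.
    attestations = [f"attestation-{i}".encode() for i in range(10_000)]
    root = merkle_root(attestations)
    print(len(root), root.hex())
```

Changing any single payload changes the root, so the one on-chain value commits to the whole batch.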


Benchmark Results

Adversarial Benchmark (400 generated tests, deterministic)

| Category | FDR | FPR | F1 | Tests |
|---|---|---|---|---|
| Governance | 100.0% | 0.0% | 100.0% | 100 |
| Code Safety | 86.0% | 0.0% | 92.5% | 100 |
| Hallucination | 90.0% | 0.0% | 94.7% | 100 |
| Consistency | 100.0% | 0.0% | 100.0% | 100 |
| Overall F1 | | | 96.8% | 400 |
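The F1 column can be reproduced from the FDR column under two assumptions that this README does not spell out: that FDR is the detection rate (recall), and that a 0% FPR means no false positives, so precision is 1.0. The sketch below recomputes each F1 and takes the overall figure as the unweighted mean of the four categories.

```python
def f1_score(recall: float, precision: float = 1.0) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# FDR values from the table, read here as recall (an assumption).
fdr = {
    "Governance": 1.00,
    "Code Safety": 0.86,
    "Hallucination": 0.90,
    "Consistency": 1.00,
}

f1 = {name: f1_score(recall) for name, recall in fdr.items()}
overall = sum(f1.values()) / len(f1)

if __name__ == "__main__":
    for name, value in f1.items():
        print(f"{name}: {value:.1%}")
    print(f"Overall (unweighted mean): {overall:.1%}")
```

Under these assumptions the recomputed values match the table, which supports reading FDR as a detection rate rather than a false discovery rate.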

Production Results (n=30 runs, real infrastructure)

| Metric | Value | Interpretation |
|---|---|---|
| SS | 0.90 ± 0.31 | 90% execution stability |
| GCR | 1.00 ± 0.00 | Perfect governance invariance |
| PFI | 0.61 ± 0.18 | Provider failures recovered via rotation |
| Supervisor | 27/30 ACCEPT | 90% acceptance rate |
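The spread reported for SS is what a per-run pass/fail outcome would produce. As a sketch, assuming each of the 30 runs scores 1.0 on success and 0.0 on failure (the README reports only the aggregate, so this is an interpretation), 27 successes give the same mean and sample standard deviation:

```python
import statistics

# Assumed per-run outcomes: 27 successful runs and 3 failed runs.
runs = [1.0] * 27 + [0.0] * 3

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)  # sample standard deviation (n - 1 denominator)

if __name__ == "__main__":
    print(f"SS = {mean:.2f} ± {stdev:.2f}")
```

If that reading is right, the ± 0.31 reflects the three failed runs, not noise in a continuous score.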

Production Agents

Two DOF-governed agents on Avalanche mainnet, ranked #1 and #2 of 1,772 agents on erc-8004scan.xyz: Apex Arbitrage (#1687, A2A+OASF) and AvaBuilder (#1686, A2A+OASF). Combined trust score: 0.85.


External Validation (Google Colab)

Tested externally via pip install dof-sdk — fresh Colab runtime, zero internal dependencies.

| Version | Test | Result |
|---|---|---|
| v0.2.5 | TrustGateway clean endpoint | ALLOW / score=0.85 |
| v0.2.5 | TrustGateway adversarial payload | BLOCK / detected=True |
| v0.2.5 | LLM-as-Judge (score 1-10) | 9.0 / PASS |
| v0.2.5 | RedTeam prompt injection | detected=True / PASS |
| v0.2.5 | InstructionHierarchy | compliant=True / PASS |
| v0.2.2 | Z3 Formal Proofs (4/4) | VERIFIED / 19.25ms |
| v0.2.2 | MerkleBatcher | PASSED / 0.31ms |

Full reports: tests/external/


Honest Limitations

  • Hallucination detection is regex-based — 6 deterministic strategies achieve 90% FDR. Misses semantic hallucinations without known-facts coverage.
  • No correlated failure modeling — SS(f)=1−f³ assumes independent failures.
  • Supervisor is itself an LLM — mitigated by cross-provider execution and deterministic governance, but circularity is bounded, not eliminated.
  • Free-tier infrastructure — 3/30 runs fail from provider exhaustion cascades.
  • Finite sample sizes — n=20-30 per configuration; rare tail events not statistically guaranteed.

Links

| Resource | URL |
|---|---|
| PyPI | pypi.org/project/dof-sdk |
| GitHub | github.com/Cyberpaisa/deterministic-observability-framework |
| Snowtrace | snowtrace.io/address/0x88f6...C052 |
| Enigma Scanner | erc-8004scan.xyz |
| Paper | paper/PAPER_OBSERVABILITY_LAB.md |

Citation

@article{cyberpaisa2026deterministic,
  title={Deterministic Observability and Resilience Engineering for
         Multi-Agent LLM Systems: An Experimental Framework
         with Formal Verification},
  author={Cyber Paisa and Enigma Group},
  year={2026},
  note={27K+ LOC, 774 tests, 25 modules, 4 Z3 theorems,
        21 Avalanche attestations, BSL 1.1, pip install dof-sdk}
}

License

This project is licensed under the Business Source License 1.1. Free for non-commercial use, research, and personal projects. Commercial use requires a separate agreement. Contact: @Cyber_paisa on Telegram.

On 2028-03-08 this project converts to Apache License 2.0.
