Deterministic Observability Framework — formal governance, privacy benchmarks, and adversarial testing for multi-agent LLM systems

These details have not been verified by PyPI

DOF - Deterministic Observability Framework

VERIFY. PROVE. ATTEST.

Deterministic Observability Framework (DOF)

Deterministic governance for multi-agent LLM systems. Constitutional rules, formal proofs, and on-chain attestation on Avalanche.

Built with Python 3.11+ · Z3 SMT Solver · web3.py · BLAKE3 · Avalanche C-Chain · PostgreSQL

pip install dof-sdk

from dof import GenericAdapter
result = GenericAdapter().wrap_output("your agent output here")
# → {status: "pass", violations: [], score: 8.5}

30ms. Zero LLM tokens. Works with CrewAI, LangGraph, AutoGen, or anything that produces text.

x402 Trust Gateway (v0.2.6)

First formal verification layer for x402 payments. Zero LLM in the critical path.

from dof import TrustGateway

gateway = TrustGateway()
verdict = gateway.verify(response_body=response)
# verdict.action → ALLOW / WARN / BLOCK
# verdict.governance_score → 0.0–1.0
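
One way a caller might act on a verdict is sketched below. It is an illustration only: `Action` and `Verdict` are local stand-ins that mirror the `action` and `governance_score` fields shown above so the snippet runs without the SDK, and the `should_pay` policy and its `min_score` threshold are hypothetical, not part of dof-sdk.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Local stand-in for the three actions listed above (not the SDK type)."""
    ALLOW = "ALLOW"
    WARN = "WARN"
    BLOCK = "BLOCK"


@dataclass
class Verdict:
    """Local stand-in carrying the two verdict fields shown above."""
    action: Action
    governance_score: float  # 0.0 to 1.0


def should_pay(verdict: Verdict, min_score: float = 0.8) -> bool:
    """Hypothetical caller policy: never pay on BLOCK, pay on WARN only
    when the governance score clears min_score, always pay on ALLOW."""
    if verdict.action is Action.BLOCK:
        return False
    if verdict.action is Action.WARN:
        return verdict.governance_score >= min_score
    return True
```

Keeping the payment decision in a small pure function like this makes the caller's policy as deterministic and testable as the verdict it consumes.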

CLI

python -m dof verify "your text here"   # governance check
python -m dof prove                      # Z3 formal verification
python -m dof health                     # component status
python -m dof benchmark                  # adversarial benchmark
python -m dof privacy                    # privacy benchmark
python -m dof version                    # show version

Key Exports

verify · classify_error · register · run_crew · MerkleBatcher · AdversarialEvaluator · RedTeamAgent · ConstitutionEnforcer · TrustGateway · GatewayVerdict · GatewayAction

The Problem · Highlights · Architecture · Governance Layers · Z3 Verification · On-Chain · Benchmarks · External Validation · Limitations · Citation

The Problem

LLM agents hallucinate. Nobody catches it deterministically. Using LLMs to verify LLMs is circular — the evaluator shares failure modes with the evaluated. Rate limits, cascading retries, and non-deterministic output quality interact across execution steps, producing unstable system-level behavior that cannot be attributed to specific infrastructure variables.

DOF solves this with 7 deterministic governance layers, formal Z3 proofs, and on-chain attestation — zero LLM tokens in the verification path.

Highlights

7 governance layers — Constitution → AST → Supervisor → Z3 → Red/Blue → Memory → Signer
x402 Trust Gateway — formal verification for agent payments (ALLOW/WARN/BLOCK)
SS(f) = 1 − f³ — Z3 verified stability formula under bounded retries
GCR(f) = 1.0 — governance invariant under any failure rate (Z3 proven)
21 on-chain attestations on Avalanche C-Chain mainnet
Merkle batching — 10,000 attestations = 1 tx ≈ $0.01
Automated benchmark — Governance 100%, Hallucination 90%, Consistency 100% FDR, 0% FPR
Privacy benchmark — 71% detection rate across 7 AgentLeak channels
Framework agnostic — CrewAI, LangGraph, AutoGen, or raw Python
A2A server (8 skills) + MCP server (10 tools) + REST API (14 endpoints)
DOFThreatPatterns — 12 threat categories, composite detection (env+POST=exfil, exec+network=revshell, b64+eval=encoded payload), decode_and_scan for encoded evasion
779 tests, 27K+ LOC, 25 core modules, 40 contributions
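
The composite detection idea in the DOFThreatPatterns item can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the regex signals are hypothetical, only the three signal pairings come from the list above, and this `decode_and_scan` is a local function that borrows the name to show the decode-then-rescan step.

```python
import base64
import re

# Hypothetical single-signal patterns; the real DOFThreatPatterns rules are not shown here.
SIGNALS = {
    "env": re.compile(r"os\.environ|getenv"),
    "post": re.compile(r"requests\.post|\bPOST\b"),
    "exec": re.compile(r"\bexec\(|subprocess|os\.system"),
    "network": re.compile(r"\bsocket\b|/dev/tcp/"),
    "b64": re.compile(r"b64decode|base64"),
    "eval": re.compile(r"\beval\("),
}

# Pairings taken from the Highlights list: two co-occurring signals name one threat.
COMPOSITES = {
    ("env", "post"): "exfil",
    ("exec", "network"): "revshell",
    ("b64", "eval"): "encoded payload",
}

# Runs of 16+ base64 characters are treated as candidate encoded payloads.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def scan(text: str) -> set[str]:
    """Return the composite threats whose two signals both appear in text."""
    hits = {name for name, pattern in SIGNALS.items() if pattern.search(text)}
    return {threat for pair, threat in COMPOSITES.items() if set(pair) <= hits}


def decode_and_scan(text: str) -> set[str]:
    """Scan text, then scan every base64-looking token after decoding it."""
    threats = scan(text)
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except ValueError:  # covers invalid base64 and non-UTF-8 bytes
            continue
        threats |= scan(decoded)
    return threats
```

Requiring two signals before raising a threat keeps single benign matches (an `import subprocess` alone, for example) from triggering a composite rule.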

Architecture

+----------------------------------------------------+
| L7  Signer       HMAC + Avalanche           ~2s    |
+----------------------------------------------------+
| L6  Memory Gov   Bi-temporal + decay        <1ms   |
+----------------------------------------------------+
| L5  Red/Blue     Red -> Guard -> Arb       ~50ms   |
+----------------------------------------------------+
| L4  Z3 Proofs    4 theorems UNSAT          ~10ms   |
+----------------------------------------------------+
| L3  Supervisor   Q+A+C+F scoring            ~5ms   |
+----------------------------------------------------+
| L2  AST Verifier eval/exec/secrets          <1ms   |
+----------------------------------------------------+
| L1  Constitution 4 HARD + 5 SOFT            <1ms   |
+----------------------------------------------------+
| Engine  DAG + LoopGuard + TokenTracker             |
+----------------------------------------------------+
| Data Oracle  6 verification strategies      <1ms   |
+----------------------------------------------------+

Total governance latency: < 70ms (layers 1-6). On-chain signing adds ~2s when enabled.
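
To illustrate the kind of check the L2 AST Verifier layer performs, here is a minimal sketch built on Python's standard `ast` module. The blocklists and the `find_violations` helper are assumptions for illustration; the SDK's actual rules (including its secrets detection) are not shown in this README.

```python
import ast

# Assumed blocklists for illustration only.
BLOCKED_CALLS = {"eval", "exec"}
BLOCKED_MODULES = {"subprocess"}


def find_violations(source: str) -> list[str]:
    """Return one description per blocked call or import found in source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls such as eval(...) or exec(...)
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.Import):
            # import subprocess / import subprocess.x
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # from subprocess import ...
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: from {node.module} import ...")
    return violations
```

Because the check walks the syntax tree rather than matching text, a string literal such as `print('eval')` is not flagged, while an actual `eval(...)` call is.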

Seven Governance Layers

Layer	What	Latency
L1 Constitution	4 HARD (block) + 5 SOFT (warn). Regex + keywords	<1ms
L2 AST Verifier	Blocks eval/exec/subprocess/secrets via `ast`	<1ms
L3 Supervisor	S = Q(0.40)+A(0.25)+C(0.20)+F(0.15). ACCEPT ≥ 7.0	~5ms
L4 Z3 Proofs	4 theorems (GCR invariance, SS cubic/mono/bounds)	~10ms
L5 Red/Blue	RedTeam → Guardian → DeterministicArbiter. Zero LLM	~50ms
L6 Memory Gov	Bi-temporal versioning, constitutional decay λ=0.99	<1ms
L7 On-Chain	HMAC-SHA256 + Avalanche. Only GCR=1.0 published	~2s
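
The L3 Supervisor formula can be worked through numerically. The sketch below assumes each of Q, A, C, and F is scored on a 0 to 10 scale (inferred from the 7.0 threshold, not stated in the table); the function names are hypothetical.

```python
# Weights and threshold as given in the L3 row of the table.
WEIGHTS = {"Q": 0.40, "A": 0.25, "C": 0.20, "F": 0.15}
ACCEPT_THRESHOLD = 7.0


def supervisor_score(scores: dict[str, float]) -> float:
    """Weighted sum S = 0.40*Q + 0.25*A + 0.20*C + 0.15*F."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def is_accepted(scores: dict[str, float]) -> bool:
    """True when the weighted score meets the ACCEPT threshold of 7.0."""
    return supervisor_score(scores) >= ACCEPT_THRESHOLD


strong = {"Q": 9, "A": 8, "C": 8, "F": 9}  # 3.6 + 2.0 + 1.6 + 1.35 = 8.55
weak = {"Q": 6, "A": 7, "C": 7, "F": 8}    # 2.4 + 1.75 + 1.4 + 1.2 = 6.75
```

Since the weights sum to 1.0, S stays on the same scale as its inputs, and Q alone contributes 40% of the result.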

Formal Verification (Z3)

Theorem	Math	Z3 Result
GCR Invariant	∀f∈[0,1]: GCR(f)=1.0	UNSAT
SS Cubic	∀f∈[0,1]: SS(f)=1−f³	UNSAT
SS Monotonicity	f₁<f₂ ⟹ SS(f₁)>SS(f₂)	UNSAT
SS Boundaries	SS(0)=1.0 ∧ SS(1)=0.0	UNSAT

All four proofs complete in ~10ms total. Proof certificates: logs/z3_proofs.json.
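
The cubic form has a simple reading: if each attempt fails independently with probability f and a step is tried at most three times, the step fails only when all three attempts fail, which happens with probability f³. The three-attempt bound is an assumption made here to match the exponent; the README says only "bounded retries". The sketch below spot-checks the SS theorems on a grid with exact fractions. It is a sanity check, not a proof: a solver result of UNSAT on a negated property covers every f in [0,1], whereas a grid covers only the sampled points.

```python
from fractions import Fraction


def ss(f: Fraction, attempts: int = 3) -> Fraction:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - f ** attempts


# 101 evenly spaced failure rates from 0 to 1, represented exactly.
grid = [Fraction(i, 100) for i in range(101)]
values = [ss(f) for f in grid]

boundaries_ok = ss(Fraction(0)) == 1 and ss(Fraction(1)) == 0
monotone_ok = all(a > b for a, b in zip(values, values[1:]))  # strictly decreasing
bounded_ok = all(0 <= v <= 1 for v in values)
```

Using `Fraction` avoids floating-point rounding, so the strict monotonicity comparison is exact at every grid point.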

On-Chain Attestation

Contract 0x88f6...C052 on Avalanche C-Chain (43114). 21 attestations. ~$0.01/tx (~$0.01 per Merkle batch of 10,000). Three layers: PostgreSQL (200ms) → Enigma Scanner (900ms) → Avalanche (2-3s, immutable).
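
The reason 10,000 attestations can share one transaction is that only a single Merkle root needs to be published. The sketch below shows the general technique, not the SDK's `MerkleBatcher`: it uses SHA-256 from the standard library as a stand-in hash, and the duplicate-last-node rule for odd levels and the leaf encoding are assumptions.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash each leaf, then pairwise-hash levels until one 32-byte root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical attestation payloads standing in for real records.
attestations = [f"attestation-{i}".encode() for i in range(10_000)]
root = merkle_root(attestations)
```

Changing any single leaf changes the root, so one on-chain root commits to the whole batch while the cost stays that of a single transaction.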

Benchmark Results

Adversarial Benchmark (400 generated tests, deterministic)

Category	FDR	FPR	F1	Tests
Governance	100.0%	0.0%	100.0%	100
Code Safety	86.0%	0.0%	92.5%	100
Hallucination	90.0%	0.0%	94.7%	100
Consistency	100.0%	0.0%	100.0%	100
Overall F1			96.8%	400
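
The F1 column follows from the other two columns if FDR is read as the detection rate (recall), which is an interpretation rather than something the table states. With FPR at 0% there are no false positives, so precision is 1 and F1 reduces to 2·FDR/(1+FDR). The sketch below reproduces the arithmetic; the 50/50 split of positive and negative cases per category is hypothetical and does not affect the result when FPR is zero.

```python
def f1_from_rates(fdr: float, fpr: float, positives: int, negatives: int) -> float:
    """F1 computed from a detection rate, a false-positive rate, and class counts."""
    tp = fdr * positives   # correctly detected bad cases
    fp = fpr * negatives   # clean cases wrongly flagged
    fn = positives - tp    # bad cases missed
    return 2 * tp / (2 * tp + fp + fn)


# (FDR, FPR) per category as reported in the table.
rates = {
    "Governance": (1.00, 0.0),
    "Code Safety": (0.86, 0.0),
    "Hallucination": (0.90, 0.0),
    "Consistency": (1.00, 0.0),
}

f1 = {name: f1_from_rates(fdr, fpr, 50, 50) for name, (fdr, fpr) in rates.items()}
overall = sum(f1.values()) / len(f1)  # unweighted mean across the four categories
```

Rounded to one decimal place, these values agree with the table: 92.5% for Code Safety, 94.7% for Hallucination, and 96.8% overall.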

Production Results (n=30 runs, real infrastructure)

Metric	Value	Interpretation
SS	0.90 ± 0.31	90% execution stability
GCR	1.00 ± 0.00	Perfect governance invariance
PFI	0.61 ± 0.18	Provider failures recovered via rotation
Supervisor	27/30 ACCEPT	90% acceptance rate
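
The reported SS of 0.90 ± 0.31 is consistent with treating each run as a binary outcome, 27 stable and 3 failed. That per-run encoding is an assumption; the table does not say how SS was computed for each run. A quick check with the standard library:

```python
import statistics

# Assumed encoding: 1 for a stable run, 0 for a failed run (27 of 30 stable).
outcomes = [1] * 27 + [0] * 3

mean = statistics.mean(outcomes)   # fraction of stable runs
std = statistics.stdev(outcomes)   # sample standard deviation (n - 1 denominator)
```

The wide spread relative to the mean is typical of binary data: with outcomes limited to 0 and 1, a 90% success rate gives a standard deviation near 0.3 regardless of how many runs are made.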

Production Agents

Two DOF-governed agents on Avalanche mainnet, ranked #1 and #2 of 1,772 agents on erc-8004scan.xyz: Apex Arbitrage (#1687, A2A+OASF) and AvaBuilder (#1686, A2A+OASF). Combined trust score: 0.85.

External Validation (Google Colab)

Tested externally via pip install dof-sdk — fresh Colab runtime, zero internal dependencies.

Version	Test	Result
v0.2.6	TrustGateway clean endpoint	ALLOW / score=0.85
v0.2.6	TrustGateway adversarial payload	BLOCK / detected=True
v0.2.6	LLM-as-Judge (score 1-10)	9.0 / PASS
v0.2.6	RedTeam prompt injection	detected=True / PASS
v0.2.6	InstructionHierarchy	compliant=True / PASS
v0.2.2	Z3 Formal Proofs (4/4)	VERIFIED / 19.25ms
v0.2.2	MerkleBatcher	PASSED / 0.31ms

Full reports: tests/external/

Honest Limitations

Hallucination detection is regex-based — 6 deterministic strategies achieve 90% FDR. Misses semantic hallucinations without known-facts coverage.
No correlated failure modeling — SS(f)=1−f³ assumes independent failures.
Supervisor is itself an LLM — mitigated by cross-provider execution and deterministic governance, but circularity is bounded, not eliminated.
Free-tier infrastructure — 3/30 runs fail from provider exhaustion cascades.
Finite sample sizes — n=20-30 per configuration; rare tail events not statistically guaranteed.

Links

Resource	URL
PyPI	pypi.org/project/dof-sdk
GitHub	github.com/Cyberpaisa/deterministic-observability-framework
Snowtrace	snowtrace.io/address/0x88f6...C052
Enigma Scanner	erc-8004scan.xyz
Paper	paper/PAPER_OBSERVABILITY_LAB.md

Citation

@article{cyberpaisa2026deterministic,
  title={Deterministic Observability and Resilience Engineering for
         Multi-Agent LLM Systems: An Experimental Framework
         with Formal Verification},
  author={Cyber Paisa and Enigma Group},
  year={2026},
  note={27K+ LOC, 779 tests, 25 modules, 4 Z3 theorems,
        21 Avalanche attestations, BSL 1.1, pip install dof-sdk}
}

License

This project is licensed under the Business Source License 1.1. Free for non-commercial use, research, and personal projects. Commercial use requires a separate agreement. Contact: @Cyber_paisa on Telegram.

On 2028-03-08 this project converts to Apache License 2.0.

Release history

Version	Released
0.8.0	Apr 13, 2026
0.7.0	Apr 12, 2026
0.6.0	Apr 5, 2026
0.5.1	Apr 4, 2026
0.5.0	Apr 2, 2026
0.4.1	Mar 12, 2026
0.3.3	Mar 9, 2026
0.2.8	Mar 9, 2026
0.2.7 (this version)	Mar 9, 2026
0.2.6	Mar 8, 2026
0.2.5	Mar 8, 2026
0.2.4	Mar 8, 2026
0.2.3	Mar 8, 2026
0.2.2	Mar 8, 2026
0.2.1	Mar 8, 2026
0.2.0	Mar 8, 2026
0.1.0	Mar 6, 2026

Download files

Source Distribution

dof_sdk-0.2.7.tar.gz (219.0 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

dof_sdk-0.2.7-py3-none-any.whl (160.4 kB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file dof_sdk-0.2.7.tar.gz.

File metadata

Download URL: dof_sdk-0.2.7.tar.gz
Upload date: Mar 9, 2026
Size: 219.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dof_sdk-0.2.7.tar.gz
Algorithm	Hash digest
SHA256	`6e9d206f70c1de634602072707b7ecc7583521cfbb31f6bc7e62818933e4077b`
MD5	`dc518447643da29bf45424a8e616feb2`
BLAKE2b-256	`e2364dd7e88c9fd34cb3b040fc8dd8c7d6a7582926764fdd35e9957941116449`

File details

Details for the file dof_sdk-0.2.7-py3-none-any.whl.

File metadata

Download URL: dof_sdk-0.2.7-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 160.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dof_sdk-0.2.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3ffa2c5959de9038828feee31d1eb0f37a8409a927d81cd066ae68e00859123c`
MD5	`be5c8057529e0ddaeef0f3fa40f8f15f`
BLAKE2b-256	`64d97a5cd0f8b0927cea872549dd37c11bef175d62df8de95c542547d8895aa1`
