Deterministic Observability Framework — formal governance, privacy benchmarks, and adversarial testing for multi-agent LLM systems

These details have not been verified by PyPI

Project links

Homepage

Project description

DOF - Deterministic Observability Framework

VERIFY. PROVE. ATTEST.

tests Z3 proofs attestations PyPI license LOC Avalanche

Deterministic Observability Framework (DOF)

Deterministic governance for multi-agent LLM systems. Constitutional rules, formal proofs, and on-chain attestation on Avalanche.

Built with Python 3.11+ · Z3 SMT Solver · web3.py · BLAKE3 · Avalanche C-Chain · PostgreSQL

pip install dof-sdk

from dof import GenericAdapter
result = GenericAdapter().wrap_output("your agent output here")
# → {status: "pass", violations: [], score: 8.5}

30ms. Zero LLM tokens. Works with CrewAI, LangGraph, AutoGen, or anything that produces text.

python -m dof verify "your text here"   # governance check
python -m dof prove                      # Z3 formal verification
python -m dof health                     # component status
python -m dof benchmark                  # adversarial benchmark
python -m dof privacy                    # privacy benchmark
python -m dof version                    # show version

The Problem · Highlights · Architecture · Governance Layers · Z3 Verification · On-Chain · Benchmarks · Comparison · Limitations · Citation

The Problem

LLM agents hallucinate. Nobody catches it deterministically. Using LLMs to verify LLMs is circular — the evaluator shares failure modes with the evaluated. Rate limits, cascading retries, and non-deterministic output quality interact across execution steps, producing unstable system-level behavior that cannot be attributed to specific infrastructure variables.

DOF solves this with 7 deterministic governance layers, formal Z3 proofs, and on-chain attestation — zero LLM tokens in the verification path.

Highlights

7 governance layers — Constitution → AST → Supervisor → Z3 → Red/Blue → Memory → Signer
SS(f) = 1 − f³ — Z3 verified stability formula under bounded retries
GCR(f) = 1.0 — governance invariant under any failure rate (Z3 proven)
21 on-chain attestations on Avalanche C-Chain mainnet
Merkle batching — 10,000 attestations = 1 tx ≈ $0.01
Automated benchmark — Governance 100%, Hallucination 90%, Consistency 100% FDR, 0% FPR
Privacy benchmark — 71% detection rate across 7 AgentLeak channels (PII, API keys, memory, tool inputs)
OpenTelemetry ready — optional OTLP tracing (pip install dof-sdk[otel])
EventBus — in-memory pub/sub with circular buffer, Redis/Kafka ready
Framework agnostic — CrewAI, LangGraph, AutoGen, or raw Python
A2A server (8 skills) + MCP server (10 tools) + REST API (14 endpoints)
719 tests, 27K+ LOC, 25 core modules, 36 contributions

Architecture

+----------------------------------------------------+
| L7  Signer       HMAC + Avalanche           ~2s    |
+----------------------------------------------------+
| L6  Memory Gov   Bi-temporal + decay        <1ms   |
+----------------------------------------------------+
| L5  Red/Blue     Red -> Guard -> Arb       ~50ms   |
+----------------------------------------------------+
| L4  Z3 Proofs    4 theorems UNSAT          ~10ms   |
+----------------------------------------------------+
| L3  Supervisor   Q+A+C+F scoring            ~5ms   |
+----------------------------------------------------+
| L2  AST Verifier eval/exec/secrets          <1ms   |
+----------------------------------------------------+
| L1  Constitution 4 HARD + 5 SOFT            <1ms   |
+----------------------------------------------------+
| Engine  DAG + LoopGuard + TokenTracker             |
+----------------------------------------------------+
| Data Oracle  6 verification strategies      <1ms   |
+----------------------------------------------------+

Total governance latency: < 70ms (layers 1-6). On-chain signing adds ~2s when enabled.

Seven Governance Layers

Layer 1 — Constitution. Hard rules block output (hallucination claims without sources, non-English text, empty output, >50K chars). Soft rules score but don't block (missing sources, no structure, repetition, no actionable steps). Pure regex + keyword matching. <1ms.

Layer 2 — AST Verifier. Static analysis of agent-generated code via Python ast module. Blocks eval(), exec(), subprocess, os.system(), __import__(), and hardcoded secrets (OpenAI, GitHub, AWS patterns). <1ms.

Layer 3 — Meta-Supervisor. Weighted quality score: S = Q(0.40) + A(0.25) + C(0.20) + F(0.15). ACCEPT ≥ 7.0, RETRY ≥ 5.0, ESCALATE < 5.0. Cross-provider execution. ~5ms.

Layer 4 — Z3 Formal Proofs. Four machine-checked theorems via Z3 SMT solver. GCR invariance, SS cubic derivation, SS strict monotonicity, SS boundary conditions. All UNSAT (no counterexample exists). ~10ms total.

Layer 5 — Red/Blue Adversarial. RedTeamAgent finds defects, GuardianAgent defends with evidence, DeterministicArbiter adjudicates using only passing tests / governance compliance / AST results. Zero LLM in final adjudication. ACR metric. ~50ms.

Layer 6 — Memory Governance. GovernedMemoryStore validates every write against Constitution. Bi-temporal versioning (valid_from, valid_to, recorded_at). Constitutional decay (λ=0.99/hour) with protected categories (decisions, errors immune to decay). <1ms.

Layer 7 — On-Chain Signer. HMAC-SHA256 signed attestation certificates. Compliance-gated: only GCR=1.0 attestations are published. BLAKE3 certificate hashing. Avalanche C-Chain mainnet via web3.py. ~2s.

Formal Verification (Z3)

Theorem	Math	English	Z3 Result
GCR Invariant	∀f∈[0,1]: GCR(f)=1.0	Governance is independent of failure rate	UNSAT
SS Cubic	∀f∈[0,1]: SS(f)=1−f³	Stability follows cubic decay (r=2 retries)	UNSAT
SS Monotonicity	f₁<f₂ ⟹ SS(f₁)>SS(f₂)	More failures = less stability	UNSAT
SS Boundaries	SS(0)=1.0 ∧ SS(1)=0.0	Perfect at 0% failure, zero at 100%	UNSAT

10ms total. Proof certificates: logs/z3_proofs.json.

On-Chain Attestation

Field	Value
Contract	`0x88f6043B091055Bbd896Fc8D2c6234A47C02C052`
Network	Avalanche C-Chain (43114)
Attestations	21 (March 2026)
Functions	`registerAttestation()`, `registerBatch()`, `isCompliant()`, `getAttestation()`
Cost	~~$0.01 per attestation (~~$0.01 per Merkle batch of 10,000)
Deployer	`0xB529f4f99ab244cfa7a48596Bf165CAc5B317929`

Three verification layers: PostgreSQL (200ms) → Enigma Scanner (900ms) → Avalanche on-chain (2-3s, immutable).

Benchmark Results

Adversarial Benchmark (400 generated tests, deterministic)

Category	FDR	FPR	F1	Tests
Governance	100.0%	0.0%	100.0%	100
Code Safety	86.0%	0.0%	92.5%	100
Hallucination	90.0%	0.0%	94.7%	100
Consistency	100.0%	0.0%	100.0%	100
Overall F1			96.8%	400

Production Results (n=30 runs, real infrastructure)

Metric	Value	Interpretation
SS	0.90 ± 0.31	90% execution stability
GCR	1.00 ± 0.00	Perfect governance invariance
PFI	0.61 ± 0.18	Provider failures recovered via rotation
Supervisor	27/30 ACCEPT	90% acceptance rate

Comparison

Feature	DOF	LangChain	CrewAI	Langfuse
Constitutional governance	7 layers	—	—	—
Z3 formal proofs	4 theorems	—	—	—
AST code safety	Deterministic	—	—	—
On-chain attestation	Avalanche	—	—	—
Adversarial Red/Blue	DeterministicArbiter	—	—	—
Governed memory	Bi-temporal + decay	—	—	—
FDR/FPR benchmark	Automated	—	—	—
Token tracking	Per-call	—	—	Per-call
Execution DAG	Critical path	—	—	Trace tree
Framework agnostic	Any string output	LangChain only	CrewAI only	Any (tracing)
MCP server	10 tools	—	—	—
REST API	14 endpoints	—	—	API
Open source	Apache 2.0	MIT	MIT	MIT/Commercial

Production Agents

Two DOF-governed agents operating on Avalanche mainnet, ranked #1 and #2 of 1,772 agents on erc-8004scan.xyz:

Agent	Token ID	Wallet	Protocols	Status
Apex Arbitrage	#1687	`0xcd59...a983`	A2A + OASF (7 skills)	ACTIVE
AvaBuilder	#1686	`0x29a4...E71a`	A2A + OASF (5 skills)	ACTIVE

Combined trust score: 0.85 (governance 0.35 + safety 0.15 + infrastructure 0.15 + activity 0.15 + community 0.20).

Honest Limitations

Hallucination detection is regex-based — 6 deterministic strategies (pattern matching, cross-reference, consistency, entity extraction, numerical plausibility, self-consistency) achieve 90% FDR on adversarial tests. Misses semantic hallucinations without known-facts coverage.
No correlated or cascading failure modeling — SS(f)=1−f³ assumes independent failures.
Supervisor is itself an LLM — mitigated by cross-provider execution and deterministic governance layer, but circularity is bounded, not eliminated.
Free-tier infrastructure — 3/30 runs fail from provider exhaustion cascades where all 5 providers hit rate limits simultaneously.
Finite sample sizes — n=20-30 per configuration; rare tail events not statistically guaranteed.
No economic cost modeling — token costs tracked but not optimized.

Links

Resource	URL
PyPI	pypi.org/project/dof-sdk
GitHub	github.com/Cyberpaisa/deterministic-observability-framework
Snowtrace	snowtrace.io/address/0x88f6...C052
Enigma Scanner	erc-8004scan.xyz
Paper	paper/PAPER_OBSERVABILITY_LAB.md
Getting Started	docs/GETTING_STARTED.md
Architecture	docs/ARCHITECTURAL_REDESIGN_v1.md

Citation

@article{cyberpaisa2026deterministic,
  title={Deterministic Observability and Resilience Engineering for
         Multi-Agent LLM Systems: An Experimental Framework
         with Formal Verification},
  author={Cyber Paisa and Enigma Group},
  year={2026},
  note={27K+ LOC, 719 tests, 25 modules, 4 Z3 theorems,
        21 Avalanche attestations, Apache 2.0, pip install dof-sdk}
}

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.8.0

Apr 13, 2026

0.7.0

Apr 12, 2026

0.6.0

Apr 5, 2026

0.5.1

Apr 4, 2026

0.5.0

Apr 2, 2026

0.4.1

Mar 12, 2026

0.3.3

Mar 9, 2026

0.2.8

Mar 9, 2026

0.2.7

Mar 9, 2026

0.2.6

Mar 8, 2026

0.2.5

Mar 8, 2026

0.2.4

Mar 8, 2026

0.2.3

Mar 8, 2026

0.2.2

Mar 8, 2026

0.2.1

Mar 8, 2026

This version

0.2.0

Mar 8, 2026

0.1.0

Mar 6, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dof_sdk-0.2.0.tar.gz (206.0 kB view details)

Uploaded Mar 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dof_sdk-0.2.0-py3-none-any.whl (152.5 kB view details)

Uploaded Mar 8, 2026 Python 3

File details

Details for the file dof_sdk-0.2.0.tar.gz.

File metadata

Download URL: dof_sdk-0.2.0.tar.gz
Upload date: Mar 8, 2026
Size: 206.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dof_sdk-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`ca98387eafe611428e8841ed9b6b0327d2ee4a4a73a944ff27e8349da833b5a0`
MD5	`1161105b325f3d13e5a2d51a7dec749f`
BLAKE2b-256	`28f33de0056dab46602c1ed883b8d3122234946e8f0721d92d1030f093f35a13`

See more details on using hashes here.

File details

Details for the file dof_sdk-0.2.0-py3-none-any.whl.

File metadata

Download URL: dof_sdk-0.2.0-py3-none-any.whl
Upload date: Mar 8, 2026
Size: 152.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dof_sdk-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4eda00b4fd51d478c156ed2b4c4834e0c7c2b96748453558ba296957f9f65be2`
MD5	`dade6c165828474640d0f756ee69df5a`
BLAKE2b-256	`0fc21a6899058a1c0d15f755257587b647c23bee5c9aa0e267ad306c8a7ea4c6`

See more details on using hashes here.

dof-sdk 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

VERIFY. PROVE. ATTEST.

Deterministic Observability Framework (DOF)

Contents

The Problem

Highlights

Architecture

Seven Governance Layers

Formal Verification (Z3)

On-Chain Attestation

Benchmark Results

Adversarial Benchmark (400 generated tests, deterministic)

Production Results (n=30 runs, real infrastructure)

Comparison

Production Agents

Honest Limitations

Links

Citation

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes