Deterministic Observability Framework — formal governance, privacy benchmarks, and adversarial testing for multi-agent LLM systems
DOF-MESH -- Deterministic Observability Framework
Verify. Prove. Attest. · Mathematics, not promises.
DOF-MESH v0.5.1 | 8 Chains | Enigma Group · Medellín, Colombia
PyPI | Getting Started | Documentation | On-Chain Proof | @Cyber_paisa
The Problem
Your AI agent just made a decision. You have logs. Logs the agent wrote about itself.
That is not proof. That is testimony.
The industry's fix is AI watching AI, which works until your watchdog hallucinates that everything is fine while an attacker manipulates your agent. We tested 10 frontier models. None could reliably govern itself. Every one of them improved, by 6% to 50%, once a deterministic layer enforced the rules from outside.
Regulators are already asking: "How do you prove your AI made that decision correctly?" Nobody has a standard answer yet.
Our Solution
A deterministic verification layer for autonomous AI agents.
Every decision is translated into formal constraints, verified mathematically in under 30ms, and anchored on-chain as a cryptographic receipt. Not logs. Not monitoring. A proof.
Identity → Task → LLM → Governance → Z3 Proof → On-Chain → Supervisor
Z3 is the theorem prover used in safety-critical systems where failure is not an option. It doesn't estimate compliance. It proves it.
Verified by formal constraints. Auditable by anyone. Immutable once recorded.
DOF doesn't monitor. DOF proves.
What DOF-MESH Is — and Is Not
DOF-MESH is not an agent framework. You don't replace your agents with ours.
It is a verification layer. Your agent runs. DOF intercepts the output, checks it against deterministic rules and Z3 formal proofs, and writes a cryptographic receipt on-chain before the result is accepted. One layer on top of whatever you already have.
LangChain. CrewAI. AutoGen. A custom agent. It doesn't matter. DOF governs it.
The framework is what you install. Everything else in this repo is validation that it works.
Quick Install
pip install dof-sdk
from dof import DOFVerifier

verifier = DOFVerifier()
result = verifier.verify_action(
    agent_id="apex-1687",
    action="transfer",
    params={"amount": 500, "token": "USDC"},
)
# → verdict: "APPROVED"
# → z3_proof: "4/4 VERIFIED [GCR_INVARIANT:VERIFIED | SS_FORMULA:VERIFIED | ...]"
# → latency_ms: 8.2
# → attestation: "0x44b45cd026916c..."
Or run the full mesh:
git clone https://github.com/Cyberpaisa/DOF-MESH.git && cd DOF-MESH
cp .env.example .env
docker-compose up -d --build
Architecture
+=====================================================================+
| DOF-MESH v0.5.1 |
| |
| +---------------------------------------------------------------+ |
| | INTERFACE LAYER | |
| | CLI | A2A Server | Telegram | Streamlit Dashboard | Voice | |
| +---------------------------------------------------------------+ |
| | |
| +---------------------------------------------------------------+ |
| | EXPERIMENT LAYER | |
| | ExperimentDataset | BatchRunner | Schema | Parametric Sweep | |
| +---------------------------------------------------------------+ |
| | |
| +---------------------------------------------------------------+ |
| | OBSERVABILITY LAYER | |
| | RunTrace | StepTrace | 5 Derived Metrics | JSONL Audit | |
| +---------------------------------------------------------------+ |
| | |
| +-------------------------------+-------------------------------+ |
| | GOVERNANCE CORE | VERIFICATION CORE | |
| | | | |
| | ConstitutionEnforcer | Z3Verifier (4/4 PROVEN) | |
| | HARD rules: block | Formal invariant proofs | |
| | SOFT rules: warn | keccak256 proof hashes | |
| | Zero LLM in governance | ASTVerifier + Z3Gate | |
| +-------------------------------+-------------------------------+ |
| | |
| +---------------------------------------------------------------+ |
| | CORE INFRASTRUCTURE | |
| | | |
| | crew_runner.py Orchestration, retry x3, crew_factory | |
| | providers.py TTL backoff (5/10/20m), provider chains | |
| | supervisor.py MetaSupervisor weighted scoring | |
| | memory_manager.py ChromaDB + HuggingFace embeddings | |
| | autonomous_daemon.py Perceive, Decide, Execute, Evaluate | |
| | node_mesh.py NodeRegistry + MessageBus + MeshDaemon | |
| +---------------------------------------------------------------+ |
| | |
| +-------------------------------+-------------------------------+ |
| | AGENT RUNTIME LAYER | ON-CHAIN LAYER | |
| | (any framework) | | |
| | | DOFProofRegistry | |
| | LangChain · CrewAI | Avalanche · Base · Celo | |
| | AutoGen · Custom | Polygon · SKALE · Conflux | |
| | | 8 chains · 30+ attestations | |
| +-------------------------------+-------------------------------+ |
+=====================================================================+
Core Components
| Component | What It Does |
|---|---|
| ConstitutionEnforcer | Deterministic governance -- HARD rules block, SOFT rules warn. Zero LLM involvement. ~50 token constitution injected per agent. |
| Z3Verifier | 4 mathematical theorems formally PROVEN every cycle. Generates keccak256 proof hashes for on-chain recording. |
| Z3Gate | Neurosymbolic gate -- LLM proposes, Z3 verifies. APPROVED / REJECTED / TIMEOUT / FALLBACK. |
| MetaSupervisor | Weighted quality scoring: Q(0.40) + A(0.25) + C(0.20) + F(0.15). Outputs ACCEPT, RETRY, or ESCALATE. |
| DOFProofRegistry | Multi-chain attestation engine. Writes proof receipts to 8 chains. Verifiable by any third party. |
| MeshDaemon | 29 nodes + threshold consensus. Byzantine guard, CRDT memory, constitution hash beacon. |
| ProviderManager | LiteLLM router across 7+ LLMs. TTL backoff, automatic failover, Thompson Sampling. |
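To make the HARD/SOFT distinction concrete, here is a minimal sketch of a deterministic rule check in plain Python. The `Rule` class, the `enforce` function, and both example rules are illustrative assumptions; they are not the `dof-sdk` API and not rules taken from the DOF constitution.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str                    # "HARD" blocks the output, "SOFT" only warns
    violated: Callable[[str], bool]  # pure predicate: same input, same answer


# Hypothetical example rules, for illustration only.
RULES: Tuple[Rule, ...] = (
    Rule("no_private_key", "HARD", lambda text: "PRIVATE KEY" in text),
    Rule("max_length", "SOFT", lambda text: len(text) > 200),
)


def enforce(output: str, rules: Sequence[Rule] = RULES) -> Tuple[bool, List[str], List[str]]:
    """Return (allowed, hard_violations, soft_warnings) for one agent output."""
    hard = [r.name for r in rules if r.severity == "HARD" and r.violated(output)]
    soft = [r.name for r in rules if r.severity == "SOFT" and r.violated(output)]
    # Only HARD violations block; SOFT violations are reported but allowed.
    return (not hard, hard, soft)
```

Because every rule is an ordinary function with no model call, running `enforce` twice on the same output always gives the same verdict.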
The Numbers
We wrote 4,157 tests before calling it a framework. Most projects write them after something breaks.
The proofs aren't in our database. They're on 8 public blockchains. Go check — nobody has to take our word for it.
We tested 10 of the best AI models in the world. None could govern themselves. All improved 6% to 50% the moment a deterministic layer enforced the rules.
The verification engine is Z3, the SMT solver from Microsoft Research that is widely used for software verification. It comes from a field where "probably correct" was never an option, and it now checks your agents.
No model decides if your agent broke a rule. A deterministic function does. Every time. Same answer.
| Metric | Value |
|---|---|
| Unit tests | 4,191 |
| Autonomous cycles | 238+ |
| On-chain attestations | 30+ |
| Chains | 8 (3 mainnet + 5 testnet) |
| Core modules | 142 |
| Lines of code | 57,000+ |
| Z3 theorems | 4/4 PROVEN |
| Hierarchy patterns (Z3) | 42 PROVEN |
| LLM providers | 7+ (Cerebras, Groq, DeepSeek, Gemini, NVIDIA, SambaNova, Zhipu) |
| Governance mode | 100% deterministic, 0% LLM |
Pre-Execution Governance — The Architectural Leap
Most AI governance frameworks check what happened. DOF checks what's about to happen.
Every tool call passes through a 3-layer pipeline before execution is permitted:
Agent proposes action
│
▼
┌─────────────────────┐
│ Layer 1 │ ConstitutionEnforcer
│ HARD BLOCK │ Blocked tools rejected instantly.
│ │ No LLM. No tokens. No exceptions.
└─────────┬───────────┘
│ PASS
▼
┌─────────────────────┐
│ Layer 2 │ Z3Gate (neurosymbolic)
│ FORMAL VERIFY │ 4 invariants proven mathematically.
│ │ keccak256 proof hash generated.
└─────────┬───────────┘
│ APPROVED
▼
┌─────────────────────┐
│ Layer 3 │ PostToolUse
│ ATTEST │ Attestation written to JSONL + chain.
│ │ Immutable. Auditable. Verifiable.
└─────────────────────┘
from core.tool_hooks import ToolHookPipeline, GovernanceViolation

hook = ToolHookPipeline()

# Before any tool executes:
pre = hook.pre_tool_use("transfer", "amount=500 token=USDC", "apex-1687")
if not pre.allowed:
    raise GovernanceViolation(pre.reason)

# Run the tool itself; execute_transfer is a placeholder for your own tool call.
result = execute_transfer(amount=500, token="USDC")

# After execution:
post = hook.post_tool_use("transfer", result, "apex-1687", pre_result=pre)
print(post.attestation_hash)  # 0x44b45cd026916c...
34 tests. 0 failures. Every tool call governed before it runs.
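The ordering of the three layers is the important part: nothing runs until both checks pass. The sketch below shows that ordering with the standard library only. It is not the `core.tool_hooks` implementation: `governed_call`, `BLOCKED_TOOLS`, and `MAX_AMOUNT` are hypothetical names, the invariant is a plain Python predicate standing in for a Z3 proof, and SHA-256 stands in for keccak256 (which is not in the standard library).

```python
import hashlib
import json
from typing import Any, Callable, Dict

BLOCKED_TOOLS = frozenset({"shell_exec"})  # hypothetical blocklist
MAX_AMOUNT = 1_000                          # hypothetical invariant bound


class Blocked(Exception):
    """Raised when a call is rejected before execution."""


def governed_call(tool: str, params: Dict[str, Any],
                  run: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    # Layer 1: hard block, checked before anything runs.
    if tool in BLOCKED_TOOLS:
        raise Blocked(f"tool {tool!r} is blocked")
    # Layer 2: invariant check (a plain predicate standing in for a Z3 proof).
    amount = params.get("amount", 0)
    if not 0 <= amount <= MAX_AMOUNT:
        raise Blocked(f"amount {amount} violates 0 <= amount <= {MAX_AMOUNT}")
    # The tool executes only after both checks have passed.
    result = run(params)
    # Layer 3: attest by hashing a canonical record of the call and its result.
    record = json.dumps({"tool": tool, "params": params, "result": result},
                        sort_keys=True, default=str)
    return {"result": result,
            "attestation": "0x" + hashlib.sha256(record.encode()).hexdigest()}
```

Raising before `run` is called is what makes the governance pre-execution: a rejected call leaves no side effects to roll back.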
Tech Stack
| Layer | Technology |
|---|---|
| Core Framework | Python 3.11+ |
| Formal Verification | Z3 Theorem Prover -- 4/4 invariants PROVEN |
| Blockchain | web3.py · Avalanche · Base · Celo · Polygon · SKALE · Conflux |
| LLM Routing | LiteLLM Router (7+ providers, TTL backoff, Thompson Sampling) |
| SDK | dof-sdk on PyPI -- pip install dof-sdk |
| Vector Memory | ChromaDB + HuggingFace embeddings (all-MiniLM-L6-v2) |
| Persistence | JSONL audit logs -- zero external telemetry dependencies |
| Protocols | A2A · MCP · ERC-8004 · ERC-8183 · x402 |
| Container | Docker Citadel (OrbStack) -- read-only agent sandbox |
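The TTL backoff mentioned for LLM routing (5/10/20 minutes in the architecture diagram) can be pictured as a bench timer per provider. The sketch below is an assumption about how such a scheme might work, not the code in `providers.py` and not the LiteLLM API: the `ProviderChain` class and the rule that repeated failures escalate through the three durations are both hypothetical.

```python
from typing import Dict, List, Optional

BACKOFF_SECONDS = (5 * 60, 10 * 60, 20 * 60)  # 5/10/20 minutes


class ProviderChain:
    """Ordered fallback chain where failing providers sit out for a TTL."""

    def __init__(self, providers: List[str]) -> None:
        self.providers = list(providers)
        self.failures: Dict[str, int] = {}
        self.benched_until: Dict[str, float] = {}

    def pick(self, now: float) -> Optional[str]:
        """Return the first provider in chain order whose bench time has expired."""
        for name in self.providers:
            if now >= self.benched_until.get(name, 0.0):
                return name
        return None  # every provider is currently benched

    def report_failure(self, name: str, now: float) -> None:
        """Bench a provider; repeated failures escalate 5 -> 10 -> 20 minutes."""
        count = self.failures.get(name, 0)
        step = min(count, len(BACKOFF_SECONDS) - 1)
        self.benched_until[name] = now + BACKOFF_SECONDS[step]
        self.failures[name] = count + 1

    def report_success(self, name: str) -> None:
        """A successful call resets the provider's failure streak."""
        self.failures.pop(name, None)
        self.benched_until.pop(name, None)
```

Passing `now` in explicitly, for example from `time.monotonic()`, keeps the class deterministic and easy to test.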
On-Chain Attestation
DOFProofRegistry deployed on 8 chains (3 mainnet + 5 testnet). Verified on-chain 03 Apr 2026.
| Chain | Contract | Status |
|---|---|---|
| Avalanche C-Chain (43114) | 0x154a3F49a9d28FeCC1f6Db7573303F4D809A26F6 | ✅ mainnet |
| Base Mainnet (8453) | 0x4e54634d0E12f2Fa585B6523fB21C7d8AaFC881D | ✅ mainnet |
| Celo Mainnet (42220) | 0x35B320A06DaBe2D83B8D39D242F10c6455cd809E | ✅ mainnet |
| Avalanche Fuji (43113) | 0x0b65d10FEcE517c3B6c6339CdE30fF4A8363751c | ✅ testnet |
| Base Sepolia (84532) | 0x7e0f0D0bC09D14Fa6C1F79ab7C0EF05b5e4F1f59 | ✅ testnet |
| Conflux Testnet (71) | 0x554cCa8ceBE30dF95CeeFfFBB9ede5bA7C7A9B83 | ✅ testnet |
| Polygon Amoy (80002) | 0x0b65d10FEcE517c3B6c6339CdE30fF4A8363751c | ✅ testnet |
| SKALE Base Sepolia (324705682) | 0x4e54634d0E12f2Fa585B6523fB21C7d8AaFC881D | ✅ testnet · zero gas |
Roadmap: Polygon mainnet · Conflux eSpace · SKALE Base mainnet
ERC-8004 Agent: #1687 (Apex) · #1686 (AvaBuilder)
ERC-8183: DOFEvaluator.sol → complete() / reject()
Proof hash: keccak256(Z3 proof transcript) -- verifiable via verifyProof()
Gas cost: $0.01/tx · SKALE chains: zero gas · Merkle batch: 10K attestations = 1 tx
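The Merkle batch figure works because a single 32-byte root can commit to any number of proof hashes. The sketch below shows the general technique, not the DOFProofRegistry contract logic: `merkle_root` is a hypothetical helper, SHA-256 stands in for keccak256 (which is not in the Python standard library), and duplicating the last node on odd levels is one common convention among several.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    # SHA-256 is a stand-in; an on-chain verifier would use keccak256.
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Fold a list of proof hashes into a single 32-byte root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level.append(level[-1])
        # Hash adjacent pairs to build the next level up.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# 10,000 attestations collapse into one value that fits in one transaction.
batch = [f"proof-{i}".encode() for i in range(10_000)]
root = merkle_root(batch)
```

Changing any single leaf changes the root, so one on-chain write is enough to make the whole batch tamper-evident.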
How The Pipeline Works
1. IDENTITY Agent authenticates via ERC-8004 identity
|
2. TASK Discovery loop finds next task (or via A2A / Telegram)
|
3. LLM INFERENCE LiteLLM routes to best available provider
Fallback chain: Cerebras → Groq → DeepSeek → Gemini → NVIDIA
|
4. GOVERNANCE ConstitutionEnforcer evaluates output
HARD rules: block on violation | SOFT rules: warn and log
ZERO LLM in this step -- purely deterministic
|
5. Z3 PROOF Z3Verifier generates formal mathematical proof
4 invariants checked, proof hash = keccak256(proof)
|
6. ON-CHAIN DOFProofRegistry writes attestation to 8 chains
ERC-8004 receipt with agent ID + proof hash
|
7. SUPERVISOR MetaSupervisor scores: Q(0.40)+A(0.25)+C(0.20)+F(0.15)
Decision: ACCEPT → next cycle | RETRY → re-execute | ESCALATE → human
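Step 7 is a weighted sum followed by a three-way decision. The sketch below uses the weights stated above; everything else is an assumption made for illustration, including the `score` and `decide` names, the premise that each component lies in the range 0 to 1, and the two thresholds, which this README does not specify.

```python
from typing import Dict

WEIGHTS = {"Q": 0.40, "A": 0.25, "C": 0.20, "F": 0.15}  # weights from the pipeline
ACCEPT_AT = 0.70  # hypothetical threshold
RETRY_AT = 0.40   # hypothetical threshold


def score(metrics: Dict[str, float]) -> float:
    """Weighted sum of the four components, each assumed to be in [0, 1]."""
    missing = set(WEIGHTS) - set(metrics)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)


def decide(metrics: Dict[str, float]) -> str:
    """Map a score to ACCEPT, RETRY, or ESCALATE using the assumed thresholds."""
    s = score(metrics)
    if s >= ACCEPT_AT:
        return "ACCEPT"
    if s >= RETRY_AT:
        return "RETRY"
    return "ESCALATE"
```

The weights sum to 1.0, so the score stays in the same 0 to 1 range as its inputs and the thresholds can be read as fractions of a perfect score.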
Winston Experiment — Multi-Model Validation
10 frontier models evaluated with and without the Winston framework. Deterministic scorer, 0 LLMs. Externally validated on Adaline (200M+ API calls/day).
| Model | BLUE (Winston) | RED (baseline) | Delta |
|---|---|---|---|
| DeepSeek-V3 | 88.7 | 38.7 | +50.0 |
| GLM-4.5 | 90.0 | 42.7 | +47.3 |
| Mistral-Large | 78.7 | 41.3 | +37.4 |
| Claude Sonnet | 90.0 | 56.0 | +34.0 |
| ChatGPT-4o | 88.7 | 63.0 | +25.7 |
| Gemini-2.5Pro | 84.7 | 71.3 | +13.4 |
| Average (all 10 models) | | | +26.1 |
Full data: experiments/winston_vs_baseline/
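The Delta column can be reproduced from the two score columns, which is a quick way to sanity-check the table. The snippet below does that for the six models listed here. Their mean delta comes out at 34.6, higher than the published +26.1, presumably because the published average covers all 10 evaluated models and four of them are not listed in this README.

```python
# (BLUE, RED) scores copied from the table above.
ROWS = {
    "DeepSeek-V3": (88.7, 38.7),
    "GLM-4.5": (90.0, 42.7),
    "Mistral-Large": (78.7, 41.3),
    "Claude Sonnet": (90.0, 56.0),
    "ChatGPT-4o": (88.7, 63.0),
    "Gemini-2.5Pro": (84.7, 71.3),
}

# Delta = BLUE - RED, rounded to one decimal like the table.
deltas = {model: round(blue - red, 1) for model, (blue, red) in ROWS.items()}

# Mean over the six listed rows only, not the full 10-model experiment.
mean_listed = round(sum(deltas.values()) / len(deltas), 1)

for model, delta in deltas.items():
    print(f"{model:<14} {delta:+.1f}")
print(f"mean of listed rows: {mean_listed:+.1f}")
```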
Documentation
| Document | Description |
|---|---|
| Getting Started | Installation, first run, environment setup |
| System Architecture | Full governance pipeline with latency per layer |
| Case Study — Apex #1687 | 238 autonomous cycles · 0 incidents · 30+ attestations |
| Winston Experiment | Raw data: +26.1% average across 10 frontier models |
| Multichain Deployment | Deploy DOFProofRegistry to 8 chains |
| SKALE Integration | Zero-gas chains · x402 · BITE · IMA bridge |
| Attestations | On-chain proof records — publicly verifiable |
| Full Index | All public documentation |
Repository Structure
equipo-de-agentes/
  core/                    # 142 modules -- the framework engine
    governance.py          # ConstitutionEnforcer, HARD/SOFT rules
    z3_verifier.py         # 4 theorems formally PROVEN
    z3_gate.py             # Neurosymbolic gate (APPROVED/REJECTED/TIMEOUT)
    supervisor.py          # MetaSupervisor weighted scoring
    providers.py           # LiteLLM router, TTL backoff
    autonomous_daemon.py   # 4 phases: Perceive→Decide→Execute→Evaluate
    node_mesh.py           # NodeRegistry + MessageBus + MeshDaemon
    claude_commander.py    # 5 modes: SDK, Spawn, Team, Debate, Peers
    ...
  agents/                  # agents running on DOF (validation, not the product)
  contracts/               # DOFProofRegistry.sol (deployed on 8 chains: 3 mainnet + 5 testnet)
  integrations/            # CrewAI, AgentKit, Virtuals, SKALE, Tempo
  tests/                   # 4,157 unit tests
  docs/                    # public documentation
  logs/                    # JSONL audit trails (append-only)
  experiments/             # Winston vs baseline raw data
Built By
@Cyber_paisa · Telegram
DOF-MESH -- The production laboratory where deterministic AI governance is built. 4,157 tests. 8 chains (3 mainnet + 5 testnet). 142 modules. Mathematics, not promises.