
Autonomous AI build system. Describe what you want. Belief builds it, tests it, deploys it, and learns — on your laptop, no API key required.


Belief Engine

An autonomous AI system that turns a sentence into working, tested software — and improves itself after every build.

pip install belief-engine
belief --goal "Build a bookmark manager API with FastAPI — CRUD with tags, GET /random. SQLite." \
  --deploy docker_local

Benchmark: 85% Pass Rate

Tested on 20 challenges spanning single-file scripts to workflow DAG engines.

Pass rate:     17/20 (85%)
Avg weighted:  0.86
Cost per build: $0.18
Build time:    ~5 minutes

Tier 1 (scripts):        2/3
Tier 2 (CLIs + APIs):    4/4
Tier 3 (CRUD apps):      4/5
Tier 4 (multi-component): 3/4
Tier 5 (complex systems): 4/4

The engine builds complex systems (workflow engines, inventory managers, quiz platforms) more reliably than simple scripts. Tier 5 has been at 100% for three consecutive benchmark runs.

Validation: Does accumulated knowledge help a local model?

Research question. The engine stores patterns, antipatterns, covenants, and skeletons in ChromaDB soil after every build. Does that accumulated knowledge cause a measurable quality lift when the engine is paired with a local model — or is the lift just noise from running more computation against the same weights?

Protocol. Four paired A/B runs on 2026-04-22. Same model (qwen2.5-coder:14b, Q4_K_M), same hardware (MacBook Air M2 16GB), same challenge set (five tier-1/tier-2 problems rotating between runs). The only variable between the two arms: whether the engine's ChromaDB soil, covenants, and debug memory are connected to the model at inference time.

Results.

Run (timestamp)     Engine + local    Raw local    Δ
02:46               5 / 5             2 / 5        +60%
07:03               5 / 5             2 / 5        +60%
08:03               5 / 5             3 / 5        +40%
08:52               5 / 5             4 / 5        +20%
Cumulative (n=20)   20 / 20           11 / 20      +45%

Fisher's exact test on the paired n=20 gives p < 0.001.
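
As a reference point, this is roughly how such a test could be computed with scipy on the 2×2 pass/fail table implied by the cumulative counts above (a sketch, not the engine's recorded analysis script; the exact p-value depends on the alternative chosen):

from scipy.stats import fisher_exact

# 2x2 contingency table from the cumulative n=20 paired runs:
# rows = (engine + local, raw local), columns = (passed, failed)
table = [[20, 0],   # engine + local: 20 passed, 0 failed
         [11, 9]]   # raw local:      11 passed, 9 failed

odds_ratio, p_value = fisher_exact(table, alternative="greater")  # one-sided: engine arm passes more
print(f"p = {p_value:.4g}")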

A fifth run the next morning on a fresh three-challenge sample reproduced the pattern: engine 3/3 vs raw 1/3, a +66.7% lift. By the end of the experiment window the archive held 424 builds and 37 covenants, and had extracted ~100 new nutrients in the previous 24 hours.

What this means. For this local model, on this paired benchmark, a ChromaDB-backed context layer with FSRS-decayed nutrients and AST-enforced covenants produces a statistically significant quality lift. The local-14B pipeline solved problems it could not solve without the engine's accumulated knowledge.

Honest limitations.

  • n=20 is below publication-grade for a strong claim across all 20 benchmark challenges; the next milestone is n=50 paired with per-domain analysis.
  • Challenges rotate, so the raw-local scores drift between runs (easier challenges rotate in as the engine's coverage grows).
  • Engine wall clock is 10-15× slower per build (~255-900s vs ~30-70s raw). Quality/time tradeoff, not a free lunch.
  • Factorial ablation (soil × covenants × debug × skeleton) is needed to attribute the lift — which subsystem is load-bearing is still an open question.

Reproducibility. Raw data: ~/.belief-engine/experiments.db. Methodology and statistical protocol: docs/validation/v3.1.0-consistency-results.md.

How It Works

You: "Build a todo app with Click"
  |
11 AI agents collaborate in a convergence loop:
  intake -> research -> planner -> architect -> skeleton -> builder
  -> covenant enforce -> import fix -> tester -> executor -> debugger
  -> synthesizer -> validator (real pytest) -> water cycle -> deploy
  |
Working software, tested, Dockerized, deployed.

The engine doesn't just generate code — it builds, tests, debugs, deploys, and learns. Every build deposits knowledge into ChromaDB soil. Patterns, antipatterns, and covenants feed future builds. Build 50 is smarter than build 1.
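
A minimal sketch of what a soil deposit and a later retrieval could look like through the ChromaDB client (the collection name and metadata fields here are illustrative, not the engine's actual schema):

import os
import chromadb

# Persistent soil store; the engine keeps its soil under ~/.belief-engine/soil
client = chromadb.PersistentClient(path=os.path.expanduser("~/.belief-engine/soil"))
patterns = client.get_or_create_collection("patterns")  # illustrative collection name

# Deposit a nutrient after a successful build
patterns.add(
    ids=["build-0042-fastapi-crud"],
    documents=["FastAPI CRUD: one APIRouter per resource, Pydantic models for request/response"],
    metadatas=[{"domain": "fastapi", "outcome": "pass"}],
)

# A later build queries the soil for relevant patterns before planning
hits = patterns.query(query_texts=["bookmark manager API with tags"], n_results=3)
print(hits["documents"][0])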

v3.0: Autocatalytic Self-Improvement

v3.0 adds a full self-improvement loop. The engine builds tools for itself, discovers its own rules, and measures its own progress.

           Jitterbug Cycle
          /               \
    Expansion          Integration
   (diverse builds)    (accept/prune)
        |                   |
    Compression        Validation
   (cluster failures)  (regression check)
        |
   Reconstruction
   (build tools, crystallize covenants)

5 new subsystems:

  • FSRS Memory -- Spaced-repetition decay on all knowledge. Stale patterns fade; reinforced ones strengthen.
  • Evolutionary Archive -- SQLite DAG of every agent version. DGM-style parent selection preserves stepping stones (see the sketch after this list).
  • Crystallizer -- Discovers covenants from build traces. Template sweep (Daikon) + Houdini filter + promotion.
  • Autocatalytic NEW_TOOL -- The engine uses its own pipeline to build tools for itself. Failure clusters drive tool goals.
  • Safety Guardrails -- Async overseer, evaluator integrity hashes, Goodhart canary (held-out benchmark), cost monitors.
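
To make the archive idea concrete, here is a hypothetical sketch of an agent-version DAG in SQLite with fitness-weighted parent selection (schema and column names are assumptions, not the engine's actual tables):

import random
import sqlite3

# Hypothetical shape of the evolutionary archive: a DAG of agent versions in SQLite,
# each row pointing at its parent so stepping-stone ancestors are never discarded.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE agent_versions (
    id         INTEGER PRIMARY KEY,
    agent_name TEXT NOT NULL,
    parent_id  INTEGER REFERENCES agent_versions(id),
    fitness    REAL
)
""")
conn.execute("INSERT INTO agent_versions VALUES (1, 'builder', NULL, 0.70)")
conn.execute("INSERT INTO agent_versions VALUES (2, 'builder', 1, 0.85)")

# DGM-style parent selection: sample by fitness instead of always taking the best,
# so lower-scoring but novel stepping stones stay reachable.
rows = conn.execute("SELECT id, fitness FROM agent_versions").fetchall()
parent_id = random.choices([r[0] for r in rows], weights=[r[1] for r in rows])[0]
print("selected parent:", parent_id)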

Key Numbers

Metric                   Value
Codebase                 131 Python files, ~37,800 lines
Benchmark                17/20 (85%) on 20-challenge suite
Builds completed         53+
Nutrients learned        900+
Self-learned covenants   7 static + dynamic discovery
Cost per build           $0.18 (was $0.87 -- 80% reduction)
Build time               ~5 minutes
ChromaDB collections     5 (tools, episodes, principles, failures, covenants)

Quick Start

pip install belief-engine

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Build something
belief --goal "Build a URL shortener with FastAPI and SQLite"

# Build + deploy
belief --goal "Build a REST API" --deploy docker_local --deploy-name myapi

# Run the benchmark
belief benchmark --tiers 1 2 3 4 5

Local-only quick start (v3.1)

No API key, no cloud calls, no per-build cost. Everything runs on your laptop against Ollama. Requires ~16 GB of RAM for the default model.

# One-command setup (installs Ollama, pulls qwen2.5-coder, runs a smoke build):
curl -fsSL https://raw.githubusercontent.com/metafiopy-tech/belief-engine/main/scripts/belief-setup.sh | bash

# Or, step by step:
curl -fsSL https://ollama.ai/install.sh | sh     # one-off
ollama pull qwen2.5-coder:14b                    # ~8 GB download
pip install "belief-engine[full]"

# Point every agent at the local model:
export BELIEF_MODEL_MODE=local
belief --goal "Build a Python script that prints hello world"

Hybrid mode (mix local + Claude) is one env var away — see Adding Claude for hard tasks below.

From Source

git clone https://github.com/metafiopy-tech/belief-engine.git
cd belief-engine
pip install -e ".[dev]"

How the soil compounds over time

Every build deposits knowledge — patterns, antipatterns, skeletons, covenants — into the ChromaDB soil at ~/.belief-engine/soil. The soil is the engine's working memory. Build N is smarter than build N-1 because build N-1 left behind what worked, what didn't, and why.

Decay is FSRS-4.5 spaced repetition with clade-productivity weighting (v3.1): a nutrient's retention is proportional to how often its descendants succeed in later builds. Nutrients whose downstream uses keep working stay sharp; orphans fade. Contradicted nutrients are soft-deleted with a valid_until timestamp, never purged — belief manifold can show the soil as it was on any historical date.
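
As a rough illustration, the FSRS-4.5 forgetting curve with a hypothetical clade-productivity multiplier on stability looks like this (the constants match FSRS-4.5's published curve; the weighting function is an assumption, not the engine's exact formula):

# FSRS-4.5 power-law forgetting curve: R(t, S) = (1 + FACTOR * t / S) ** DECAY,
# chosen so that retention is 0.9 when elapsed time t equals stability S.
DECAY = -0.5
FACTOR = 19 / 81

def retention(elapsed_days: float, stability_days: float) -> float:
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY

def clade_weighted_stability(base_stability: float, descendant_success_rate: float) -> float:
    # Hypothetical weighting: nutrients whose descendants keep succeeding decay more slowly.
    return base_stability * (1 + descendant_success_rate)

s = clade_weighted_stability(base_stability=10.0, descendant_success_rate=0.8)
print(f"retention after 30 days: {retention(30.0, s):.2f}")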

You can watch this happen:

belief dashboard        # metrics: pass rate, cost, nutrients, covenants
belief manifold         # clusters by domain + coverage gaps (v3.1)

Checking progression per vertical

The generative-chain progression tracker (Session 7) scores each of eight verticals independently — fastapi, cli, mcp, data, async, library, script, general — so you can see which domains the engine has matured in and which it hasn't touched yet.

belief progression

Output lists every domain and its current stage (Seed → Cluster → Tessellation → Basis → Connectivity → Archetypes). Domains stuck at Seed are the ones to target with the next round of builds.

Adding Photosynthesis for autonomous goal generation

The Grinder daemon (Session 8) picks goals out of a queue and builds them continuously. The Photosynthesis daemon (Sessions 3–5) populates that queue by harvesting candidate build goals from GitHub, PyPI, HN, Stack Overflow, RSS feeds, and ArXiv, then filtering them through a four-stage cascade (novelty band → ACCEL heap → LLM judge). Together they turn the engine into a self-running research workshop:

# Background the grinder (drains the goal queue):
belief grinder start --max-builds 100

# Photosynthesis lives in its own package extras:
pip install "belief-engine[photosynthesis]"

Adding Claude for hard tasks (hybrid mode)

Hybrid mode routes mechanical agents (intake, tester, synthesizer, validator) to the local model and keeps reasoning agents (research, planner, architect, builder, debugger) on Claude — the same quality ceiling as cloud mode at roughly 1/4 the cost.

export ANTHROPIC_API_KEY=sk-ant-...
export BELIEF_MODEL_MODE=hybrid
belief --goal "Build a distributed task queue with priority lanes"

v3.1 additionally introduces a confidence-probe-gated escalation path: when the Session-10 probe judges the local model unlikely to succeed on a given call (confidence < 0.4), that single call escalates to Claude automatically. Local-first; Claude is only paid for when needed.
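
A simplified sketch of that per-call decision (the helper functions and threshold constant are hypothetical stand-ins, not the engine's actual routing code):

CONFIDENCE_THRESHOLD = 0.4  # below this, the single call escalates to Claude

def call_local(prompt: str) -> str:
    # Placeholder for the Ollama-backed local model call
    return f"[local] {prompt}"

def call_claude(prompt: str) -> str:
    # Placeholder for the Anthropic-backed call
    return f"[claude] {prompt}"

def route_call(prompt: str, probe_confidence: float) -> str:
    """Hybrid-mode routing for a single agent call: local first, Claude on low confidence."""
    if probe_confidence < CONFIDENCE_THRESHOLD:
        return call_claude(prompt)
    return call_local(prompt)

print(route_call("plan the task queue architecture", probe_confidence=0.3))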

CLI Commands

Command                                 Description
belief --goal "..."                     Build software from a goal
belief benchmark                        Run benchmark challenges
belief sica --iterations N              Run SICA self-improvement
belief jitterbug                        Run compression-reconstruction cycle
belief jitterbug --dry-run              Expansion + compression only
belief progression                      Per-domain generative-chain stage
belief manifold                         Knowledge topology: clusters, cross-links, gaps (v3.1)
belief manifold --json                  Manifold as machine-readable JSON
belief optimize [agent]                 DSPy/GEPA prompt optimization
belief dashboard                        Metrics dashboard
belief dashboard --json                 Metrics as JSON
belief library                          Named library of promoted tools (v3.0)
belief grinder start                    Autonomous build loop
belief models                           Show active model routing table
belief fix --repo PATH --issue "..."    Fix an issue in existing code

Architecture

belief/
  agents/          -- 11+ LangGraph agents (intake -> validator)
  validators/      -- AST covenant enforcers + dynamic covenant registry
  memory/          -- ChromaDB metabolization (5 collections, FSRS decay)
  refinement/      -- Water cycle (analyze -> fix -> revalidate)
  evolution/       -- SICA, archive, crystallizer, jitterbug, progression
  optimization/    -- DSPy/GEPA prompt optimization (optional)
  safety/          -- Overseer, probes, Goodhart canary
  metrics/         -- Dashboard, growth analysis
  deploy/          -- Docker + Railway deployment
  codebase/        -- Brownfield support (localization, patcher)
  languages/       -- Multi-language adapters (Python, TypeScript)
  polarity/        -- Latios/Latias incompleteness engine
  models/          -- Pydantic models (state, artifacts, skeleton, contracts)
  hardening.py     -- Budget limits, rate limiter, security scanner, audit log
  graph.py         -- LangGraph pipeline wiring
  llm.py           -- Anthropic API client with prompt caching + JSON repair
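
For a flavor of what an AST covenant enforcer in validators/ might check, here is a minimal sketch (the covenant shown, rejecting bare except clauses, is a hypothetical example rather than one of the engine's discovered covenants):

import ast

def violates_bare_except(source: str) -> bool:
    """Hypothetical covenant check: reject generated code containing a bare except clause."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.ExceptHandler) and node.type is None
        for node in ast.walk(tree)
    )

print(violates_bare_except("try:\n    pass\nexcept:\n    pass\n"))  # True -> covenant violated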

Model Routing

Agent                                                         Model        Role
Research, Planner, Architect, Builder, Debugger               Sonnet 4.6   Deep reasoning
Intake, Tester, Gap Analyst, Synthesizer, Validator, Latios   Haiku 4.5    Mechanical tasks
Skeleton, Covenant Enforcer, Import Fix, Validator core       None         Deterministic (zero tokens)

Prompt caching provides 90% savings on repeated system prompts. Combined with Haiku routing, builds cost $0.15-0.25.
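
For context, marking a long, repeated system prompt as cacheable with the Anthropic SDK looks roughly like this (the model id and prompt are placeholders; cached reads are billed at a fraction of normal input tokens, which is where the savings come from):

import anthropic

SYSTEM_PROMPT = "...long builder system prompt reused across many calls..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder id; the engine routes to its configured Sonnet/Haiku models
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache this block across repeated calls
        }
    ],
    messages=[{"role": "user", "content": "Build a URL shortener with FastAPI and SQLite"}],
)
print(response.content[0].text)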

Tech Stack

  • Python 3.11+ (tested on 3.14)
  • LangGraph for agent orchestration
  • Anthropic Claude (Sonnet 4.6 + Haiku 4.5)
  • ChromaDB for learning memory (5 collections with FSRS)
  • SQLite for evolutionary archive
  • Docker for deployment
  • DSPy (optional) for prompt optimization

License

MIT

Author

Built by Fio -- solo, from scratch, while making pizzas.

"The remainder after every operation drives the next cycle."
