Graph-only Mini-CPG scanner and query server for codebases

Demiourgos Backend

The Structural Memory Engine for AI Coding Agents

Demiourgos is a deterministic code analysis engine that builds a real-time graph of your entire codebase — every function, class, API route, and database model — and tracks exactly how they connect. When any file changes, it instantly computes the structural blast radius: which functions will break, which API routes are affected, and which execution flows are disrupted.

No LLM is used for code parsing. The analysis is deterministic: the same source always produces the same graph.

Note (Living Document): This README outlines our current architectural logic and intended trajectory. These are not hard-and-fast rules. We are actively building this engine and will iterate, refine, and change structural approaches wherever we see positive results during development.


Table of Contents

  1. The Problem We Are Solving
  2. The Solution: Our Own Agent with Hive
  3. The Universal Graph Architecture
  4. The Processing Pipeline
  5. Impact Scoring System
  6. How Cohesion Scores Contribute to Impact
  7. Context Slicing: When Pruning Works and When It Does Not
  8. Route Dependency Tracking for API Testing
  9. The Worker-Judge Loop
  10. Coding Style Preservation
  11. Install and Run
  12. Configuration

1. The Problem We Are Solving

AI coding agents (Claude, Cursor, Windsurf, Codex) are excellent at writing code. But they are blind to architecture. They do not understand, structurally, how the functions in your codebase depend on each other, which database models feed which API routes, or which API must be called before another one works.

We are building an agent that fixes this. Demiourgos gives our agent (Hive) a structural memory layer that no other AI coding agent has.

This architectural blindness causes five critical failures in existing AI agents, described below:

1.1 The Blast Radius Problem

This goes far beyond "IDE tells you a function has errors." An IDE can catch a syntax error in the same file. But it cannot tell you that renaming the user_id column in your User database model will break:

  • The get_user() function that queries User.user_id
  • The create_order() function that takes a user_id parameter from get_user()'s return value
  • The media_upload() function that stores user_id as a foreign key in the Media table
  • The GET /users/{id} API route that calls get_user() and returns the user_id field
  • The POST /orders API route that calls create_order() which internally joins on user_id

That is 5 breakages across 4 files, 2 database models, and 2 API routes, all from a single column rename. The IDE sees none of this. The AI sees none of this. Only the graph sees it, because it tracks the edges from DataModel(User) → READS_FROM → Function(get_user) → CALLS → Function(create_order) → READS_FROM → DataModel(Media), all the way up to Route(POST /orders) via HANDLED_BY.

1.2 The API Interdependency Problem

AI agents are notoriously bad at calling APIs correctly and writing integration tests. The reason is that APIs have hidden interdependencies — a chain of calls where one must succeed before the next one works.

Consider a real-world flow:

Step 1: POST /auth/login     → returns access_token
Step 2: GET  /users/me        → requires access_token in header → returns user_id
Step 3: POST /media/upload    → requires access_token AND user_id in body → returns media_id
Step 4: POST /orders          → requires access_token AND user_id AND media_id → creates order

An AI agent that tries to test POST /orders directly will get a 401 Unauthorized because it does not know that /auth/login must be called first. Even if it remembers the auth step, it does not know that user_id comes from GET /users/me and media_id comes from POST /media/upload. It will invent fake values and get a 422 Unprocessable Entity.

Demiourgos solves this because the graph stores each Route node and traces the data flow between them. It knows:

  • POST /orders handler calls validate_order(user_id, media_id)
  • user_id is tainted with origin GET /users/me → response.id
  • media_id is tainted with origin POST /media/upload → response.media_id
  • Both require access_token from POST /auth/login → response.token

The AI agent can now query the graph and get the exact API call chain with the correct parameter names and sources.
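A minimal sketch of how such a call chain could be derived once the taint origins are known. The `PARAM_ORIGINS` mapping and the `call_chain` helper are hypothetical stand-ins for a graph query, not part of Demiourgos; the routes and parameter names are the ones from the example above.

```python
from graphlib import TopologicalSorter

# Hypothetical taint origins: for each route, the parameters it needs and
# the route whose response supplies each one (mirrors the example above).
PARAM_ORIGINS = {
    "POST /auth/login": {},
    "GET /users/me": {"access_token": "POST /auth/login"},
    "POST /media/upload": {
        "access_token": "POST /auth/login",
        "user_id": "GET /users/me",
    },
    "POST /orders": {
        "access_token": "POST /auth/login",
        "user_id": "GET /users/me",
        "media_id": "POST /media/upload",
    },
}


def call_chain(target: str, origins: dict[str, dict[str, str]]) -> list[str]:
    """Return the routes that must be called, in order, ending with target."""
    # Collect the target and every route it transitively depends on.
    needed: dict[str, set[str]] = {}
    stack = [target]
    while stack:
        route = stack.pop()
        if route in needed:
            continue
        needed[route] = set(origins[route].values())
        stack.extend(needed[route])
    # TopologicalSorter yields prerequisites before the routes that need them.
    return list(TopologicalSorter(needed).static_order())


chain = call_chain("POST /orders", PARAM_ORIGINS)
```

An integration test for `POST /orders` would then walk `chain` in order, feeding each response field into the next request instead of inventing values.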

1.3 The Context Window Waste Problem

When an AI agent needs context about a function, it pulls in the entire function body and all related functions. But in a real codebase, functions are not short. A single controller function might be 300 lines long. The AI pulls in the full 300 lines even when only 12 lines (the specific branch that handles the user lookup) are relevant to the change.

Worse, related functions add up. If a 300-line function calls 8 other functions, each averaging 150 lines, the AI pulls in about 1,200 lines of callee code on top of the function itself, roughly 1,500 lines of context in total. But not all of those functions actually matter to the change.

Demiourgos serves a context slice that strips out the noise. First, it uses the graph and taint tracking to identify only the specific downstream and upstream functions that are actually affected by the change — ignoring the rest entirely. Then, even within those affected functions, it prunes irrelevant branches and error handlers with [lines X-Y pruned] markers. It delivers 47 highly relevant lines instead of 1500 blind ones.

1.4 The Runtime Logic Break Problem

Structural breaks (argument count mismatch) are easy to detect. But what about logic breaks that only appear at runtime?

Example:

# Before: returns a list
def get_users():
    return db.query(User).all()   # returns [User, User, User]

# After: returns a generator (same signature, same type hint)
def get_users():
    yield from db.query(User).all()   # returns generator object

The function signature did not change. No parameter was added or removed. But every caller doing len(get_users()) will now crash because generators do not support len(). This is a logic break that traditional static analysis misses.

Demiourgos catches this through taint tracking. It traces the return value of get_users() into every downstream variable that uses it. When the body changes (Soft Impact 0.1), the AI is alerted to review the tainted chain, and the context slicer serves exactly the affected lines.
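The break described above can be reproduced in a few lines. This is a self-contained illustration with plain lists standing in for the database query; `count_users` is a hypothetical caller.

```python
def get_users_before():
    return ["alice", "bob", "carol"]      # a list: supports len()


def get_users_after():
    yield from ["alice", "bob", "carol"]  # a generator: does not support len()


def count_users(fetch):
    # A typical caller that silently relies on the return value being sized.
    return len(fetch())


before_count = count_users(get_users_before)

try:
    count_users(get_users_after)
    after_error = None
except TypeError as exc:
    after_error = str(exc)                # raised only when the caller runs
```

The signature of the function is identical in both versions, which is why only a trace of the return value into its callers can flag the problem before runtime.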

1.5 The Missing Layers Problem

Even if you solve blast radius and API testing at the code level, there is still a gap: business intent. Why does this code exist? What requirement does it fulfill? What architectural decision constrains how it can be changed?

Demiourgos addresses this through a 4-layer architecture where each layer is connected to the others via a graph-of-graphs:

| Layer | Name | What It Tracks | Example |
| --- | --- | --- | --- |
| Layer 1 | Business Layer (PRDG) | PRD sections, user stories, requirements | "Users must be able to securely process checkouts" |
| Layer 2 | Capability Layer (FDG) | Features, feature dependencies | "Checkout Feature depends on Stripe API" |
| Layer 3 | Reasoning Layer (ADR/FKG) | Architecture Decision Records, constraints | "ADR-012: Payment processing must run synchronously to prevent split-brain UI states" |
| Layer 4 | Structural Layer (CPG/RDG/MDG) | Code graph, route graph, database models | Functions, API routes, database tables, and their edges |

Top-Down Execution: The Architect Workflow

A critical distinction in Demiourgos is who writes which layer:

  • Layer 4 (Structural) is generated natively by deterministic parsers (like Tree-sitter). The user never writes it directly, and an AI agent can never hallucinate it. The code is the truth.
  • Layers 1, 2, and 3 are managed by a specialized Architect Agent collaborating directly with the User, controlled and continuously improved by the orchestration of Hive.

When a user demands a new feature, they do not just dump a massive prompt into a coding worker. The top-down flow works like this:

  1. User Story: The User tells the Architect Agent: "I need a synchronous checkout feature."
  2. Evaluation: The Architect evaluates the request against current implementations (Layer 4) and existing constraints (Layer 3). It points out the hard breakpoints: "We currently process payments asynchronously. Making this synchronous will require rewriting the Stripe webhook handler without breaking the existing refund flow."
  3. PRD & Plan: The User and Architect discuss the trade-offs. Once agreed, the Architect writes the PRD (Layer 1), updates the Architecture constraints (Layer 3), and finalizes an execution plan.
  4. Execution: The Architect dispatches Worker Agents. The Workers use the finalized constraints and the exact code map to implement the feature safely.
  5. Staging & Approval: The Workers finish, run tests, and generate a final report with a deployed staging site. The User reviews the staging site, approves it, and the code goes to Production.
sequenceDiagram
    actor User
    participant Arch as "Architect Agent"
    participant L3 as "Layer 3 (Constraints)"
    participant L4 as "Layer 4 (Code Truth)"
    participant Worker as "Worker Agents"
    participant Stage as "Staging/Prod"

    User->>Arch: "I need synchronous checkouts" (User Story)
    Arch->>L4: Analyzes current code graph
    Arch->>L3: Checks existing architecture rules
    L4-->>Arch: Finds async Stripe webhook dependencies
    Arch->>User: "This breaks the existing webhook logic. Propose rewriting it?"
    User->>Arch: "Yes, get the PRD right and let's plan it."
    Arch->>L3: Finalizes PRD and ADRs
    Arch->>Worker: Dispatches implementation tasks
    Worker->>L4: Safely writes code using L4 context
    Worker->>Stage: Deploys to Staging Site
    Stage-->>User: Provides Staging URL & Impact Report
    User->>Stage: Approves for Production

Bottom-Up Impact & Stale Nodes

The graph works bottom-up as well: Code (L4) → Features (L2) → Business Requirements (L1).

However, this bottom-up tracing does not run blindly on every keystroke. If an agent or a developer makes a large number of tiny edits, tracing the impact all the way up to "Business Intent" on every save would overwhelm the user and burn through the compute budget.

But leaving those upper layers un-updated makes the graph stale. Demiourgos solves this through Impact-Based Model Routing:

  • Real-time (L4 only): As code is edited, only local structural impact (L4) is computed by the deterministic parsers. This is instant and costs zero tokens.
  • Micro-Syncs (Small Models): If the structural impact is small (e.g., adding a simple created_at field), Hive dispatches a fast, cheap model (like Mistral or Claude Haiku) to quickly read the L4 diff and quietly update the associated Feature nodes (L2) to prevent them from going stale.
  • Deep Checkpoints (Large Models): When a Worker finishes a major task, or a Pull Request introduces large cross-module changes, Hive dispatches a heavy reasoning model (GPT-4o or Claude 3.5 Sonnet) to do a deep bottom-up trace. It maps the cumulative changes upward: "Your changes to auth.py successfully implemented the Sync Checkout feature, but accidentally impacted the Guest Refund user story."

Finding the Sweet Spot: We are actively building the mathematics to define the exact thresholds for these checkpoints. Do we trigger a Micro-Sync on every "File Save"? Do we run a Deep Checkpoint on every "Git Push", or only when a PR is "Approved"? Finding the right balance between graph freshness, user interruption, and compute cost is the central goal of Hive's orchestration layer.

Demiourgos solves all of these problems by building a deterministic, multi-layer graph that tracks every connection from business requirement to database column, scoring every change, and powering our own autonomous agent system — Hive.


2. The Solution: Our Own Agent with Hive

Demiourgos is not just an analysis engine. It is the structural backbone of Hive — our own autonomous AI coding agent.

Hive is an orchestration layer that coordinates multiple AI models, tools, and memory systems to perform complex coding tasks safely. Demiourgos is the structural memory that gives Hive architectural awareness.

2.1 The Four Components

graph TB
    subgraph Hive ["Hive — The Orchestrator"]
        direction TB
        ORCH["Control Loop<br/>Task planning, routing, budget"]
    end

    subgraph Brain ["The Brain — Reasoning"]
        LLM["LLM (Claude / GPT-4o)"]
        PLAN["Task decomposition"]
        CODE["Code generation"]
        REASON["Impact reasoning"]
    end

    subgraph Hands ["The Hands — Action"]
        FS["File system I/O"]
        AST["Tree-sitter parsing"]
        GRAPH["FalkorDB writes"]
        SHELL["Shell commands"]
        API["External API calls"]
    end

    subgraph Memory ["The Memory — Alloy Net"]
        L1["Layer 1: PRDG (Business)"]
        L2["Layer 2: FDG (Capability)"]
        L3["Layer 3: ADR/FKG (Reasoning)"]
        L4["Layer 4: CPG/RDG/MDG (Structural Code)"]
    end

    ORCH --> Brain
    ORCH --> Hands
    ORCH --> Memory
    Brain --> Hands
    Memory --> Brain

    style Hive fill:#1a1a1a,stroke:#fff,color:#fff
    style Brain fill:#1a1a1a,stroke:#e040fb,color:#e040fb
    style Hands fill:#1a1a1a,stroke:#00bcd4,color:#00bcd4
    style Memory fill:#1a1a1a,stroke:#ffc107,color:#ffc107

Hive (The Orchestrator): The nervous system. It receives a task from the user, decomposes it into subtasks, decides which AI model to use, manages the token budget, and routes work between the Brain, Hands, and Memory.

The Brain (Reasoning): The LLM (Claude, GPT-4o, Mistral). It plans tasks, writes code, reasons about impact, and generates human-readable summaries. The Brain never touches files directly — it always works through the Hands.

The Hands (Action): The tool layer. File system operations, Tree-sitter AST parsing, FalkorDB graph writes, shell commands, and external API calls. Every action is logged and reversible.

The Memory (Alloy Net): The fundamental structural graph of the system, acting as long-term context storage across four interconnected layers:

| Layer | Name | Purpose | Example |
| --- | --- | --- | --- |
| Layer 1 | PRDG (Business) | Requirements, PRDs, and User Stories | "User can upload photo" |
| Layer 2 | FDG (Capability) | Features and Product rollouts | "Photo Upload Feature" |
| Layer 3 | ADR (Reasoning) | Architectural Decisions & Constraints | "Must process uploads asynchronously" |
| Layer 4 | Structural (Code) | Deterministic code graph (CPG/RDG/MDG) | routes.py calls upload() |

(In addition to Alloy Net's structural truth, Hive also maintains a secondary Semantic Memory for team preferences like "always use camelCase", and an Ephemeral Memory for the active task's scratchpad).

2.2 How Hive Handles Multiple Agents

Hive can run multiple Brain instances in parallel for large tasks:

  1. Task Decomposition: Hive breaks a large task (e.g., "Refactor the authentication system") into independent subtasks ("Update password hashing," "Migrate session tokens," "Update API routes").

  2. Conflict Detection via Graph: Before dispatching subtasks to parallel agents, Hive queries the Demiourgos graph to check if any subtasks share dependencies. If Agent A is editing validate_user() and Agent B is editing create_session() which calls validate_user(), Hive detects the conflict and serializes those two tasks.

  3. Graph Lock Regions: Each agent "locks" the subgraph it is working on. Other agents can read it, but cannot write to overlapping nodes. This prevents merge conflicts at the architectural level.

  4. Result Merging: After parallel agents finish, Hive runs a graph-level merge check. It re-scans all modified files, recomputes impact scores, and verifies that no cross-agent Hard Impacts (1.0) were introduced.
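The conflict check in step 2 can be sketched as a reachability test over CALLS edges. The `CALLS` mapping, the `login` and `hash_password` functions, and the helper names below are hypothetical; the rule used here (two subtasks conflict when one edits a function that the other's edited functions depend on) is one possible reading of "share dependencies", not a confirmed implementation.

```python
# Hypothetical CALLS edges: caller -> set of callees.
CALLS = {
    "login": {"create_session"},
    "create_session": {"validate_user"},
    "validate_user": set(),
    "hash_password": set(),
}


def reachable(start: str, calls: dict[str, set[str]]) -> set[str]:
    """All functions reachable from start through CALLS edges, including start."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(calls.get(fn, ()))
    return seen


def footprint(edited: set[str], calls: dict[str, set[str]]) -> set[str]:
    """The edited functions plus everything they depend on."""
    out: set[str] = set()
    for fn in edited:
        out |= reachable(fn, calls)
    return out


def conflicts(task_a: set[str], task_b: set[str], calls: dict[str, set[str]]) -> bool:
    """True if either task edits a function inside the other's footprint."""
    return bool(task_a & footprint(task_b, calls) or task_b & footprint(task_a, calls))


# Agent A edits validate_user(), Agent B edits create_session(), which calls it.
must_serialize = conflicts({"validate_user"}, {"create_session"}, CALLS)
# Independent subtasks can run in parallel.
can_parallelize = not conflicts({"hash_password"}, {"create_session"}, CALLS)
```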

2.3 How Hive Keeps Improving

Every task that Hive completes generates a feedback signal:

  • Judge Pass Rate: How often does the Worker's output pass the Judge on the first try? A low pass rate means the Brain needs better context or the task decomposition is too coarse.
  • Impact Prediction Accuracy: After a change is deployed, did the predicted impact (1.0 / 0.5 / 0.1) match reality? False positives mean our diffing is too aggressive. False negatives mean we missed a dependency.
  • Context Slice Effectiveness: Did the AI agent use all the lines in the context slice, or did it ignore half of them? Ignored lines mean our pruning is too generous.

These signals are stored in Semantic Memory and used to tune future task planning, context depth, and Judge strictness.


3. The Universal Graph Architecture

Instead of maintaining separate disconnected graphs that drift out of sync, Demiourgos merges three structural dimensions into one unified Property Graph stored in FalkorDB (a Redis-native graph database).

(Note: We conceptually group these nodes into three "Dimensions" — CPG, RDG, MDG — for easier understanding, but physically they all live together in the exact same database. There are no hard boundaries between them.)

graph TB
    subgraph DimA ["Dimension A: Code Property Graph (CPG)"]
        M["Module: routes.py"]
        F1["Function: get_user"]
        F2["Function: create_user"]
        C1["Class: UserService"]
        C2["Class: BaseService"]
        M -->|CONTAINS| F1
        M -->|CONTAINS| F2
        M -->|CONTAINS| C1
        F1 -->|CALLS| F2
        C1 -->|EXTENDS| C2
    end

    subgraph DimB ["Dimension B: Route Dependency Graph (RDG)"]
        R1["Route: GET /users/id"]
        R2["Route: POST /users"]
    end

    subgraph DimC ["Dimension C: Model Dependency Graph (MDG)"]
        DM1["DataModel: User Table"]
    end

    R1 -->|HANDLED_BY| F1
    R2 -->|HANDLED_BY| F2
    F1 -->|READS_FROM| DM1
    F2 -->|WRITES_TO| DM1

    style R1 fill:#4caf50,color:#000
    style R2 fill:#4caf50,color:#000
    style DM1 fill:#ffc107,color:#000
    style M fill:#6366f1,color:#fff
    style F1 fill:#e040fb,color:#000
    style F2 fill:#e040fb,color:#000
    style C1 fill:#00bcd4,color:#000
    style C2 fill:#00bcd4,color:#000

Dimension A — CPG (Code Property Graph)

The foundation. Tree-sitter parses the raw source code and extracts:

  • Module nodes — one per source file
  • Function nodes — every function and method, including overloads
  • Class nodes — every class with inheritance chains
  • Symbol nodes — variables and constants
  • CALLS edges — which function calls which function
  • CONTAINS edges — which module owns which function
  • IMPORTS edges — which module depends on which module
  • EXTENDS edges — which class inherits from which class

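This layer is 100% deterministic. No LLM is involved. Tree-sitter is a parser generator with a C runtime; for a given grammar and source file it always produces the same syntax tree (referred to throughout as the Abstract Syntax Tree, or AST), from which the nodes and edges above are extracted.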

Dimension B — RDG (Route Dependency Graph)

Framework plugins (like the FastAPI extractor) scan the AST for HTTP route decorators:

@app.get("/users/{user_id}")
def get_user(user_id: str):
    return db.query(User).filter(User.id == user_id).first()

The plugin creates a Route node (GET /users/{user_id}) and connects it to get_user() via a HANDLED_BY edge.

Why this matters: The AI agent can now trace from any internal function up to the API route that exposes it. If a function breaks, the AI knows exactly which HTTP endpoint is affected.
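A rough sketch of the kind of extraction a route plugin performs. Python's built-in `ast` module is used here as a stand-in for Tree-sitter so the example runs with no dependencies; `extract_routes` and `HTTP_METHODS` are hypothetical names, and the real FastAPI extractor may work differently.

```python
import ast

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def extract_routes(source: str) -> list[tuple[str, str]]:
    """Return (route, handler) pairs for decorators such as @app.get("/path")."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr in HTTP_METHODS
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
                and isinstance(dec.args[0].value, str)
            ):
                method = dec.func.attr.upper()
                path = dec.args[0].value
                # Each pair corresponds to one Route node and one HANDLED_BY edge.
                routes.append((f"{method} {path}", node.name))
    return routes


SOURCE = '''
@app.get("/users/{user_id}")
def get_user(user_id: str):
    return db.query(User).filter(User.id == user_id).first()

def helper():
    pass
'''

edges = extract_routes(SOURCE)
```

Only the decorated function yields an edge; `helper()` has no route decorator and is ignored.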

Dimension C — MDG (Model Dependency Graph)

ORM plugins (like the SQLAlchemy extractor) scan for database model definitions and data access patterns:

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)

def get_user(user_id):
    return db.query(User).filter(User.id == user_id).first()  # READS_FROM User

The plugin creates a DataModel node (User table) and connects get_user to it via a READS_FROM edge.

Why this matters: If a database column is renamed or deleted, the AI can trace the graph from the DataModel through all functions that read/write it, and up to the API routes that depend on those functions.

The Power of One Unified Graph

Because all three dimensions live in a single FalkorDB instance, a single Cypher query can cross all dimensions:

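// "If I change the User model (for example, rename the 'email' column), which API routes are affected?"
MATCH (dm:DataModel {name: "User"})<-[:READS_FROM|WRITES_TO]-(f:Function)
MATCH (f)<-[:CALLS*0..5]-(upstream:Function)
MATCH (r:Route)-[:HANDLED_BY]->(upstream)
RETURN DISTINCT r.name AS affected_route, f.name AS via_function

This returns every API route that reaches the User model within five call hops, together with the function that touches the model. Because the match is on the DataModel node rather than a single column, the result is the set of candidate routes for any change to User. No separate tool, no manual tracing, no guesswork.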
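For readers less familiar with Cypher, the same traversal can be sketched in plain Python over in-memory dictionaries. The edge data and function names below are hypothetical sample data, not output from Demiourgos.

```python
from collections import deque

# Hypothetical sample graph.
READS_OR_WRITES = {"get_user": {"User"}, "create_user": {"User"}, "render_footer": set()}
CALLS = {"get_profile": {"get_user"}, "signup": {"create_user"}}  # caller -> callees
HANDLED_BY = {
    "GET /users/{id}": "get_user",
    "GET /profile": "get_profile",
    "POST /users": "signup",
    "GET /health": "render_footer",
}


def affected_routes(model: str, max_hops: int = 5) -> list[tuple[str, str]]:
    """Routes whose handler reaches a function touching model within max_hops calls."""
    # Invert CALLS so we can walk from a callee up to its callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    results = set()
    for fn, models in READS_OR_WRITES.items():
        if model not in models:
            continue
        # Breadth-first walk upward; depth 0 is the function itself (CALLS*0..5).
        depth = {fn: 0}
        queue = deque([fn])
        while queue:
            current = queue.popleft()
            if depth[current] == max_hops:
                continue
            for up in callers.get(current, ()):
                if up not in depth:
                    depth[up] = depth[current] + 1
                    queue.append(up)
        for route, handler in HANDLED_BY.items():
            if handler in depth:
                results.add((route, fn))
    return sorted(results)


routes = affected_routes("User")
```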


4. The Processing Pipeline

When a developer (or an AI agent) saves a file, Demiourgos runs the following pipeline:

flowchart LR
    A["File Save<br/>Detected"] --> B["AST Parse<br/>(Tree-sitter)"]
    B --> C["Plugin Pass<br/>(FastAPI, SQLAlchemy, Taint)"]
    C --> D["Graph Diff<br/>(FalkorDB Update)"]
    D --> E["Impact Score<br/>(1.0 / 0.5 / 0.1)"]
    E --> F["Context Slice<br/>(Taint-Pruned Output)"]

    style A fill:#6366f1,color:#fff
    style B fill:#4caf50,color:#000
    style C fill:#00bcd4,color:#000
    style D fill:#e040fb,color:#000
    style E fill:#ffc107,color:#000
    style F fill:#ef5350,color:#fff

Step 1: File Save Detected

A file watcher built on the watchdog library, which uses the OS-native notification APIs (FSEvents on macOS, inotify on Linux), detects the file change as soon as it is written. There is no polling loop, so CPU cost is negligible when nothing changes.

File: watcher.py

Step 2: AST Parse (Tree-sitter)

The changed file is parsed by Tree-sitter. The result is a ParsedModule containing all definitions (functions, classes), calls, imports, and identifiers found in that file.

Each file is hashed with SHA-256. If the hash matches the previous scan, the file is skipped entirely. This gives the pipeline Change Data Capture (CDC) behavior: only changed files are reprocessed.
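A minimal sketch of the hash-and-skip check, assuming file contents are already in memory. `changed_files` and the `known` cache are hypothetical names; where the real engine stores the previous hashes is not specified here.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def changed_files(files: dict[str, bytes], known: dict[str, str]) -> list[str]:
    """Return the paths whose digest differs from the last scan.

    `known` maps path -> digest from the previous scan and is updated in place.
    """
    changed = []
    for path, content in files.items():
        digest = file_digest(content)
        if known.get(path) != digest:
            known[path] = digest
            changed.append(path)
    return changed


cache: dict[str, str] = {}
first_scan = changed_files({"a.py": b"x = 1\n", "b.py": b"y = 2\n"}, cache)
second_scan = changed_files({"a.py": b"x = 42\n", "b.py": b"y = 2\n"}, cache)
```

On the first scan every file is new; on the second, only `a.py` is handed to the parser.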

File: parser_adapters.py

Supported Languages: Python, TypeScript, JavaScript

Step 3: Plugin Pass (The Translation Layer)

After the universal AST parse, a set of framework-specific extractors is run. This is the translation layer.

Codebases are messy because every framework (FastAPI, Django, Express, Spring) uses different syntax for the exact same architectural concepts. The job of the plugins is to translate framework-specific syntax into a universal, framework-agnostic graph structure (e.g., Route → HANDLED_BY → Function → READS_FROM → DataModel).

Because the core Demiourgos engine (Diffing, Taint Tracking, Impact Scoring) only consumes this universal structure, the engine itself never needs to know what framework your code is written in.

| Plugin | What It Detects (Framework Specific) | Translated To (Universal Structure) |
| --- | --- | --- |
| FastAPI Extractor | @app.get(), @router.post() decorators | Creates Route nodes and HANDLED_BY edges |
| SQLAlchemy Extractor | ORM model classes, db.query() calls | Creates DataModel nodes and READS_FROM / WRITES_TO edges |
| Taint Tracker | Variable assignments and data flow | Attaches origins data to arguments on CALLS edges |

This guarantees extreme extensibility. Without touching the internals of how Demiourgos scores impact or traces data flow, you can add support for a completely new framework (like Django, Express, or Prisma). You simply drop a new extractor.py file into the plugins/ directory that translates the framework's AST syntax into the universal nodes and edges. Zero internal wiring required.

Directory: plugins/

Step 4: Graph Diff (FalkorDB Update)

The old subgraph for the changed file is deleted from FalkorDB. The new subgraph is written atomically. Only the changed file is touched. Hundreds of unchanged files are never reprocessed.

File: graph_store.py

Step 5: Impact Score

The diffing engine compares the old and new versions of each function and computes a structural impact score.

File: diffing.py

(See Section 5 for the full scoring system.)

Step 6: Context Slice

The context slicer packages up only the affected downstream code and feeds it to the AI agent. It uses taint analysis to include only the lines that actually matter.

File: context_slicer.py

(See Section 7 for the full slicing system.)


5. Impact Scoring System

Every time a function changes, Demiourgos scores the impact on every caller. The score tells the AI (or developer) exactly how dangerous the change is.

Score Tiers

| Score | Level | Trigger | What It Means |
| --- | --- | --- | --- |
| 1.0 | Hard Impact | Argument count changed, parameter deleted, return type changed | Callers will crash at runtime. The function signature contract is broken. |
| 0.5 | Medium Impact | New optional parameter added, type annotation changed | Callers might need updating. The contract shifted but is not broken. |
| 0.1 | Soft Impact | Logic body changed, internal variable renamed | Callers are structurally unaffected. The output behavior may have changed. |

How Scoring Works

The diffing engine (diffing.py) compares the old and new function structurally:

  1. Argument count comparison: If the old function had 3 parameters and the new one has 4, and the new parameter has no default value, that is a Hard Impact (1.0). Every caller passing 3 arguments will now fail.
  2. Type annotation diff: If a parameter type changed from str to int, that is a Medium Impact (0.5). The function still accepts the same number of arguments, but the type contract changed.
  3. Body hash comparison: If only the internal logic changed (the function still has the same signature), that is a Soft Impact (0.1).
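The three comparisons can be sketched with Python's built-in `ast` module. This is an illustrative approximation, not the contents of diffing.py: it handles only plain positional parameters, compares AST dumps in place of a body hash, treats any change in the number of required parameters as Hard, and returns 0.0 for "no change" (a value the tiers above do not define). `impact_score` and `_signature` are hypothetical names.

```python
import ast


def _signature(fn: ast.FunctionDef):
    """Split a function's plain positional parameters into required/optional."""
    args = fn.args.args
    n_defaults = len(fn.args.defaults)
    required = [a.arg for a in args[: len(args) - n_defaults]]
    optional = [a.arg for a in args[len(args) - n_defaults:]]
    annotations = {a.arg: (ast.dump(a.annotation) if a.annotation else None) for a in args}
    returns = ast.dump(fn.returns) if fn.returns else None
    return required, optional, annotations, returns


def impact_score(old_src: str, new_src: str) -> float:
    """Score the change between two versions of a single function definition."""
    old = ast.parse(old_src).body[0]
    new = ast.parse(new_src).body[0]
    o_req, o_opt, o_ann, o_ret = _signature(old)
    n_req, n_opt, n_ann, n_ret = _signature(new)

    # Hard: required-argument count changed, a parameter vanished,
    # or the declared return type changed.
    removed = set(o_req + o_opt) - set(n_req + n_opt)
    if len(o_req) != len(n_req) or removed or o_ret != n_ret:
        return 1.0

    # Medium: a new optional parameter, or a changed parameter annotation.
    shared = set(o_ann) & set(n_ann)
    if set(n_opt) - set(o_opt) or any(o_ann[p] != n_ann[p] for p in shared):
        return 0.5

    # Soft: same signature, different body (AST dump stands in for a body hash).
    if [ast.dump(s) for s in old.body] != [ast.dump(s) for s in new.body]:
        return 0.1
    return 0.0  # assumption: identical functions carry no impact


BASE = "def f(a, b):\n    return a + b\n"
hard = impact_score(BASE, "def f(a, b, c):\n    return a + b + c\n")
medium = impact_score(BASE, "def f(a, b, c=0):\n    return a + b + c\n")
soft = impact_score(BASE, "def f(a, b):\n    return b + a\n")
```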

Taint-Based Data Flow Tracking

Beyond structural diffing, the Taint Tracker traces the actual data flowing between functions.

Example:

def caller():
    x = db.get_user()        # x is tainted with origin "db.get_user()"
    y = x                    # y inherits the taint
    process(y)               # CALLS edge records: argument[0].origins = ["db.get_user()"]

This means if db.get_user() changes its return type, Demiourgos knows that process() is affected — not just because of a generic CALLS edge, but because it traces the exact variable (y) that carries the tainted data.

The taint origins are stored directly on the CALLS edge as JSON:

{
  "arguments": [
    { "value": "y", "position": 0, "origins": ["db.get_user()"] }
  ]
}
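A small sketch of how such origins could be computed for straight-line code, again using Python's `ast` module as a stand-in for Tree-sitter. It only follows simple `name = call()` and `name = other_name` assignments inside one function; `call_taints` is a hypothetical helper, and the real Taint Tracker presumably covers far more cases.

```python
import ast
import json


def call_taints(source: str) -> dict[str, list[dict]]:
    """Map each bare call statement to its arguments and their taint origins."""
    fn = ast.parse(source).body[0]
    origins: dict[str, list[str]] = {}   # variable name -> where its value came from
    edges: dict[str, list[dict]] = {}    # callee name -> argument records
    for stmt in fn.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            target = stmt.targets[0].id
            if isinstance(stmt.value, ast.Call):
                # x = db.get_user()  ->  x originates from "db.get_user()"
                origins[target] = [ast.unparse(stmt.value.func) + "()"]
            elif isinstance(stmt.value, ast.Name):
                # y = x  ->  y inherits x's origins
                origins[target] = list(origins.get(stmt.value.id, []))
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            edges[ast.unparse(call.func)] = [
                {
                    "value": ast.unparse(arg),
                    "position": i,
                    "origins": origins.get(arg.id, []) if isinstance(arg, ast.Name) else [],
                }
                for i, arg in enumerate(call.args)
            ]
    return edges


SOURCE = '''
def caller():
    x = db.get_user()
    y = x
    process(y)
'''

taints = call_taints(SOURCE)
edge_json = json.dumps({"arguments": taints["process"]})
```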

6. How Cohesion Scores Contribute to Impact

When Leiden community clustering (Phase 4) is active, the cohesion score of each cluster adds a dimension to impact analysis that raw scoring alone cannot provide.

What Cohesion Measures

Cohesion is the ratio of actual internal edges to the maximum possible internal edges within a community:

cohesion = (actual internal edges) / (max possible internal edges)
  • 1.0 — Every function in the cluster calls every other function. Very tightly coupled.
  • 0.7+ — Strong functional grouping. A change inside this cluster is likely to stay inside it.
  • 0.3-0.7 — Moderate grouping. Some functions are loosely connected.
  • Below 0.3 — Weak grouping. The cluster may be an artifact of the algorithm rather than a true functional unit.
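As a worked example, the ratio can be computed as follows. This sketch assumes directed CALLS edges, so a cluster of n functions has at most n(n-1) internal edges, which matches the description of 1.0 as "every function calls every other function". The `cohesion` helper and the sample data are hypothetical.

```python
def cohesion(members: set[str], calls: set[tuple[str, str]]) -> float:
    """Ratio of actual to maximum possible internal directed edges."""
    n = len(members)
    if n < 2:
        return 0.0  # assumption: a singleton cluster has no meaningful cohesion
    # Keep only edges whose caller and callee are both inside the cluster.
    internal = {(a, b) for a, b in calls if a in members and b in members and a != b}
    return len(internal) / (n * (n - 1))


auth_cluster = {"login", "validate_user", "create_session"}
auth_calls = {
    ("login", "validate_user"),
    ("login", "create_session"),
    ("create_session", "validate_user"),
    ("create_session", "write_log"),  # leaves the cluster, so it is not counted
}

score = cohesion(auth_cluster, auth_calls)  # 3 internal edges out of 6 possible
```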

How Cohesion Changes the Impact Story

| Scenario | Without Cohesion | With Cohesion |
| --- | --- | --- |
| Change in a high-cohesion (0.87) cluster | "14 functions impacted" | "14 functions impacted, BUT all are in the Auth cluster (cohesion 0.87). The blast radius is self-contained. Low cross-system risk." |
| Change in a low-cohesion (0.3) cluster | "14 functions impacted" | "14 functions impacted across a LOOSELY GROUPED cluster (cohesion 0.3). These functions may not actually be related. Higher investigation risk." |
| Change that crosses 3 clusters | "47 functions impacted" | "47 functions across 3 clusters: Auth (0.87), Payment (0.72), Logging (0.41). The change breaks cluster boundaries. HIGH cross-system risk." |

Cohesion as a Priority Signal

The AI agent uses cohesion to prioritize its repair work:

  1. High-cohesion cluster with impact: Fix it first. The cluster is tightly coupled, so fixing one function likely fixes the whole cluster.
  2. Low-cohesion cluster with impact: Investigate carefully. The functions may need individual fixes.
  3. Cross-cluster impact: This is the highest risk. A change rippling across cluster boundaries often means a fundamental architectural decision has changed.

7. Context Slicing: When Pruning Works and When It Does Not

Context slicing is the process of extracting only the relevant lines of code for the AI agent. It works by tracing the graph from a target function, collecting all related functions, and pruning non-relevant lines from each source file.

How Pruning Works

  1. Graph traversal: Starting from the target function, the slicer walks the CALLS graph outward up to a configurable hop depth.
  2. AST Extraction: For each related function discovered, it pulls the complete Abstract Syntax Tree (AST) for that function.

The Density & Pruning Algorithm

Once the slicer isolates a function, it doesn't just dump the whole body into the context. It runs a density-based pruning algorithm:

  1. Mark Anchor Nodes: The slicer traverses the function's internal AST and flags "anchor nodes". An anchor is any line that:
    • Contains a call to another function in the traversal graph.
    • Modifies or reads a variable tainted by the diff.
    • Is part of the function signature, return statement, or yield.
  2. Expand Context Windows (The k Radius): For every anchor node, the algorithm flags k lines above and k lines below it to preserve immediate local context (typically k=2).
  3. Merge Overlapping Windows: If two anchor windows overlap, they are merged into one continuous block.
  4. Calculate Function Density: Once all blocks are merged, the slicer computes the overall Pruning Density Score for the function (kept_lines / total_lines). It then applies strict mathematical rules to decide if pruning is actually worth it:
    • Density > 0.7 AND Function < 50 lines: The slicer aborts pruning and serves the full function. Why? Because hiding 8 lines in a 40-line function saves almost no tokens, but replaces them with ugly [lines X-Y pruned] markers that confuse the LLM (fragmentation penalty > token savings).
    • Density > 0.7 AND Function >= 50 lines: The slicer prunes. Even high-density sections in massive 500-line functions are worth pruning to save isolated chunks of 30-40 lines of noise.
    • Density <= 0.7: The slicer always prunes. The signal-to-noise ratio is poor, and pruning will yield massive token savings.
  5. Ghost Declarations: If a variable is needed in a kept section, but its original assignment falls into a pruned section, the slicer resurrects it as a "ghost declaration" (e.g., user_id = ... # [pruned logic]) so the AI understands the binding without needing the bloated code.
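Steps 2 to 4 can be sketched as follows, taking the anchor line numbers from step 1 as given. The function names are hypothetical, line numbers are 1-based, windows that touch are merged along with overlapping ones (no line lies between them to prune), and ghost declarations (step 5) are left out.

```python
def merged_windows(anchors: set[int], total_lines: int, k: int = 2) -> list[tuple[int, int]]:
    """Expand each anchor by k lines on both sides and merge the results."""
    windows: list[tuple[int, int]] = []
    for line in sorted(anchors):
        start, end = max(1, line - k), min(total_lines, line + k)
        if windows and start <= windows[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def density(windows: list[tuple[int, int]], total_lines: int) -> float:
    """kept_lines / total_lines for one function."""
    kept = sum(end - start + 1 for start, end in windows)
    return kept / total_lines


def should_prune(d: float, total_lines: int) -> bool:
    """Serve the full body only for dense (> 0.7), short (< 50 lines) functions."""
    return not (d > 0.7 and total_lines < 50)


def slice_function(lines: list[str], anchors: set[int], k: int = 2) -> list[str]:
    """Return the kept lines with [lines X-Y pruned] markers in the gaps."""
    total = len(lines)
    windows = merged_windows(anchors, total, k)
    if not should_prune(density(windows, total), total):
        return list(lines)
    out: list[str] = []
    cursor = 1
    for start, end in windows:
        if start > cursor:
            out.append(f"[lines {cursor}-{start - 1} pruned]")
        out.extend(lines[start - 1:end])
        cursor = end + 1
    if cursor <= total:
        out.append(f"[lines {cursor}-{total} pruned]")
    return out


body = [f"line {i}" for i in range(1, 101)]
sliced = slice_function(body, anchors={5, 7, 20})
```

With anchors on lines 5, 7, and 20 of a 100-line function, the slice keeps lines 3-9 and 18-22 and replaces the rest with three markers.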

When Pruning Works Well

| Situation | Why Pruning Succeeds |
| --- | --- |
| Well-structured code with small functions | Each function is self-contained. The graph cleanly identifies which functions matter. Pruning removes everything else. |
| Clear module boundaries | Functions in separate files with explicit imports. The graph has clean IMPORTS edges to follow. |
| Typed function signatures | Type annotations give the slicer confidence about data flow. It knows exactly which variables carry tainted data. |
| Framework-detected routes and models | The Route and DataModel nodes give extra entry/exit points for graph traversal, so the slicer captures the full chain from API to database. |
| Shallow call chains (2-3 hops) | The slicer captures the full context without pulling in hundreds of irrelevant functions. |

When Pruning Does Not Work Well

| Situation | Why Pruning Struggles | Mitigation |
| --- | --- | --- |
| Giant monolithic functions (500+ lines) | The entire function is one range. Pruning cannot remove anything inside it because the AST treats it as a single unit. | Break large functions into smaller ones. The graph naturally benefits. |
| Heavy use of dynamic dispatch (getattr, eval, exec) | Tree-sitter cannot see the call target. The graph has no edge to follow. The slicer misses the dependency. | The slicer falls back to including the entire file when it detects dynamic dispatch patterns. |
| String-based queries (raw SQL, raw HTTP calls) | db.execute("SELECT * FROM users") bypasses the ORM. The SQLAlchemy plugin cannot detect it. No READS_FROM edge is created. | Use ORM methods. Future plugins may parse SQL strings. |
| Global mutable state | A function modifies a global dictionary. Another function reads it. There is no CALLS edge between them; they communicate through side effects. | Taint tracking partially helps by tracing variable assignments, but cross-function globals are a known limitation. |
| Deeply nested call chains (10+ hops) | The slicer pulls in too many functions. The context slice becomes larger than the original file. | Configure max_hops to limit traversal depth. Phase 4 clustering helps by summarizing distant impacts at the cluster level. |
| Decorators and metaclasses | Heavy decorator wrapping can hide the real function from Tree-sitter. The AST sees the wrapper, not the inner function. | The parser has specific handling for common decorators (@app.get, @staticmethod). Custom decorators may need plugin support. |

Slice Modes

When the AI agent requests context via the MCP demiourgos_context tool, it can specify a slice mode:

| Mode | What Is Included | Use Case |
|---|---|---|
| full | The complete function body and all related functions | Deep investigation of complex logic |
| skeleton | Only function signatures and docstrings | Quick architectural overview |
| auto | Adapts based on function complexity and graph confidence | Default for AI workflow |
| custom | User-defined line ranges | Specific debugging |

Token Savings and LLM Attention Theory

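Why go through the trouble of AST-level pruning and taint-tracking? Because LLMs suffer from "Lost in the Middle" syndrome: they use information buried in the middle of a long context less reliably than information near the start or end, so every irrelevant line in the prompt makes the relevant ones easier to miss.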

A Real-World Pruning Example

Imagine a 2,000-line payment_service.py file containing 14 functions. A developer changes a database column used by the charge_stripe() function. The AI agent needs to fix the downstream function process_checkout().

The process_checkout() function is 300 lines long, but the charge_stripe() call only happens inside one specific if branch.

Without Slicing (Current AI Agents): The agent consumes the entire 2,000-line payment_service.py file.

  1. Token Cost: ~15,000 tokens per prompt.
  2. Attention Dilution: Extreme. The LLM must read 13 unrelated functions and 290 unrelated lines of process_checkout() just to find the 10 lines that matter.
  3. Hallucination Risk: High. The LLM might accidentally rewrite an unrelated branch simply because it was present in the context.

With Demiourgos Density Slicing:

  1. The Slicer isolates the process_checkout() function (drops the other 13 functions immediately).
  2. It finds the charge_stripe() anchor node inside the AST.
  3. It keeps the function signature, the anchor line, and a 2-line radius around the anchor.
  4. It prunes the remaining 285 lines of process_checkout(), replacing them with [lines X-Y pruned].
  • Tokens consumed: ~250 tokens (98% savings).
  • Amplifies signal-to-noise: With the unrelated lines removed, nearly all of the context the LLM attends to is the tainted data flow itself.
  • Prevents collateral damage: The AI cannot break unrelated logic branches because it cannot even see them.
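Steps 2 to 4 above can be sketched as a small function. This is an illustrative approximation only: `slice_function` is a hypothetical name, the anchor is located by substring match instead of an AST lookup, and the first line is assumed to be the function signature.

```python
def slice_function(source_lines: list, anchor: str, radius: int = 2) -> list:
    """Keep the signature, each line containing `anchor`, and `radius`
    lines around it; collapse every other run into a pruned marker."""
    keep = {0}  # index 0 is assumed to be the function signature
    for i, line in enumerate(source_lines):
        if anchor in line:
            lo = max(0, i - radius)
            hi = min(len(source_lines) - 1, i + radius)
            keep.update(range(lo, hi + 1))

    out = []
    i = 0
    while i < len(source_lines):
        if i in keep:
            out.append(source_lines[i])
            i += 1
            continue
        start = i
        while i < len(source_lines) and i not in keep:
            i += 1
        # Markers use 1-based line numbers, in the [lines X-Y pruned] format.
        out.append(f"[lines {start + 1}-{i} pruned]")
    return out


# A synthetic 300-line function with one charge_stripe() call in the middle.
body = ["def process_checkout(cart):"] + [f"    step_{n}()" for n in range(1, 300)]
body[150] = "        charge_stripe(cart)"
sliced = slice_function(body, "charge_stripe(")
```

In this synthetic case the slice contains the signature, five lines around the anchor, and two pruned markers.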

Expected Token Savings by Function Profile

How much context do we actually save? It depends heavily on the code profile:

| Code Profile | Slice Behavior | Token Savings | AI Performance Impact |
|---|---|---|---|
| Giant Monolithic Functions (500+ line procedural scripts) | Heavy pruning. Only specific branches containing anchors are kept. | ~85% to 95% | Massive. Completely eliminates the "Lost in the Middle" syndrome. |
| Object-Oriented Classes (Classes with many small methods) | Medium pruning. Keeps the class definition and only the specific methods touched. | ~70% to 80% | Very High. AI sees the class interface without the implementation noise of other methods. |
| Utility Modules (Many tiny, 10-line pure functions) | Light pruning. High density aborts pruning (fragmentation > savings). Entire functions sent. | ~40% to 60% | Moderate. Savings come entirely from dropping the other utility functions in the file. |
| God Files (10,000+ line legacy files) | Extreme pruning. | > 98% | Critical. Makes it actually possible to use AI on legacy codebases without maxing out context limits. |

8. Route Dependency Tracking for API Testing

This section explains the specific problem of AI agents failing at API testing, and how Demiourgos solves it.

Why AI Fails at API Testing Today

When an AI agent needs to test an API endpoint, it typically struggles in two ways:

  1. Payload Guessing (422 Errors):

    • It reads the route handler, but the expected JSON body structure is defined in a Pydantic UserCreate model imported from a completely different file.
    • The AI guesses the payload and sends {"email": "x"} instead of the required nested structure {"user": {"email": "x"}}. It gets a 422 Unprocessable Entity and burns tokens looping to fix it.
  2. Blind Dependency Breaks (500 Errors):

    • It successfully calls the route, but gets a 500 error because a downstream dependency changed.

    • The AI cannot see that 4 hops down, serialize_email() crashed because a database column was silently renamed.

How Demiourgos Fixes This

Because the Framework Extractors (like the FastAPI plugin) also map out the exact request schemas and their Pydantic/Zod dependencies, the AI has perfectly structured knowledge of both what goes into the route, and what happens after it.

The AI agent can query exactly what object needs to be sent:

"What is the required payload and full dependency chain for POST /users?"

Demiourgos returns:

Route: POST /users
  ├─ Schema (Expects):
  │    └─ UserCreate { user: dict(email: str, is_active: bool) }
  └─ Handler: create_user(payload: UserCreate)
       ├─ WRITES_TO: User (columns: id, email_address, created_at)
       └─ CALLS: send_welcome_email(payload.user.email)  ← HARD IMPACT 1.0
            Reason: User.email column no longer exists (renamed to email_address)

The AI now knows:

  1. Exactly how to build the JSON request payload without guessing.
  2. The complete execution chain from route to database.
  3. Precisely which function to fix and what the new column name is.
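To illustrate the payload-guessing failure, the sketch below checks a candidate payload against the nested shape shown in the UserCreate example. It is a hand-rolled stand-in, not Pydantic or the Demiourgos API: `missing_fields` and the `USER_CREATE` dictionary are hypothetical names introduced here.

```python
def missing_fields(schema: dict, payload: dict, prefix: str = "") -> list:
    """Return dotted paths that the schema requires but the payload
    lacks or supplies with the wrong type."""
    problems = []
    for key, expected in schema.items():
        path = f"{prefix}{key}"
        if key not in payload:
            problems.append(path)
        elif isinstance(expected, dict):
            if isinstance(payload[key], dict):
                problems.extend(missing_fields(expected, payload[key], path + "."))
            else:
                problems.append(path)
        elif not isinstance(payload[key], expected):
            problems.append(path)
    return problems


# Shape taken from the UserCreate example above.
USER_CREATE = {"user": {"email": str, "is_active": bool}}

guessed = {"email": "x"}  # the flat payload an agent might guess
correct = {"user": {"email": "x", "is_active": True}}
```

The guessed payload is reported as missing the top-level `user` key, which is the kind of mismatch that surfaces as a 422; the correct payload produces no problems.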

How a User Can Steer This

The graph works in both directions:

Top-Down (Route → Database):

MATCH (r:Route {name: "GET /users/{id}"})-[:HANDLED_BY]->(f:Function)
MATCH path = (f)-[:CALLS*0..]->(downstream:Function)
MATCH (downstream)-[:READS_FROM|WRITES_TO]->(dm:DataModel)
RETURN path, dm.name

"Starting from this API route, show me every database table it touches."

Bottom-Up (Database → Route):

MATCH (dm:DataModel {name: "User"})<-[:READS_FROM|WRITES_TO]-(f:Function)
MATCH path = (f)<-[:CALLS*0..]-(upstream:Function)
MATCH (r:Route)-[:HANDLED_BY]->(upstream)
RETURN r.name, path

"Starting from this database table, show me every API route that depends on it."

Lateral (Function → Function):

MATCH (f:Function {name: "validate_user"})<-[:CALLS]-(caller:Function)
MATCH (caller)-[:CALLS]->(sibling:Function)
WHERE sibling.name <> "validate_user"
RETURN DISTINCT sibling.name

"What other functions does the caller of validate_user also call?" — useful for understanding the broader context of a change.

Layer 4 Deep Code Structure (AST Pattern Matching): Because Layer 4 breaks code down into its literal Abstract Syntax Tree components (TryCatch blocks, Variable Declarations, Return statements), a user can query structural patterns that regex could never find:

MATCH (f:Function)-[:AST_CHILD*]->(t:TryCatchBlock)
MATCH (t)-[:AST_CHILD*]->(c:CatchClause)
WHERE NOT (c)-[:AST_CHILD*]->(:CallExpression {name: "logger.error"})
RETURN f.name

"Show me all functions that have a try/catch block that silently swallows errors without logging them."

This completely changes how a user steers large-scale refactors. Instead of guessing where technical debt lives, they query the exact code structure directly from the database to give the Worker Agents a precise hit-list of functions to fix.
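The same structural question can be approximated outside the graph. The sketch below uses Python's standard `ast` module, not Tree-sitter or the Layer 4 graph, and `silent_handlers` is a hypothetical helper; it only shows why an AST-level check finds what a regex would miss.

```python
import ast


def silent_handlers(source: str, log_call: str = "logger.error") -> list:
    """Names of functions containing an except clause that never calls `log_call`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        handlers = [n for n in ast.walk(node) if isinstance(n, ast.ExceptHandler)]
        for handler in handlers:
            # Collect the dotted name of every call made inside this except clause.
            calls = {
                ast.unparse(c.func)
                for c in ast.walk(handler)
                if isinstance(c, ast.Call)
            }
            if log_call not in calls:
                flagged.append(node.name)
                break  # one silent handler is enough to flag the function
    return flagged


SAMPLE = '''
def quiet():
    try:
        risky()
    except Exception:
        pass

def loud():
    try:
        risky()
    except Exception as exc:
        logger.error("failed: %s", exc)
'''
```

Running `silent_handlers(SAMPLE)` flags `quiet` and leaves `loud` alone. Like the graph query, a handler inside a nested function also flags the enclosing function.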

Both routing dimensions (CALLS) and structural dimensions (AST_CHILD) work seamlessly together because all edges are stored in the same unified graph. The user does not need to "switch between views." They simply change the query.
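For readers who want to see the bottom-up direction without a database, here is an in-memory analogue of the "Database → Route" query. The edge triples and the `routes_depending_on` helper are hypothetical toy data, not the FalkorDB schema; handlers that touch the model directly (zero CALLS hops) are included.

```python
from collections import deque

# Hypothetical toy graph: (source, edge_type, target) triples.
EDGES = [
    ("GET /users/{id}", "HANDLED_BY", "get_user"),
    ("get_user", "CALLS", "load_user"),
    ("load_user", "READS_FROM", "User"),
    ("POST /users", "HANDLED_BY", "create_user"),
    ("create_user", "WRITES_TO", "User"),
    ("GET /health", "HANDLED_BY", "health"),
]


def routes_depending_on(model: str, edges=EDGES) -> set:
    """Walk READS_FROM/WRITES_TO, then CALLS edges backwards from a data
    model and collect every route whose handler is reached."""
    # Functions that read from or write to the model directly.
    frontier = deque(
        src for src, kind, dst in edges
        if dst == model and kind in ("READS_FROM", "WRITES_TO")
    )
    seen = set(frontier)
    while frontier:  # breadth-first walk up the CALLS edges
        fn = frontier.popleft()
        for src, kind, dst in edges:
            if kind == "CALLS" and dst == fn and src not in seen:
                seen.add(src)
                frontier.append(src)
    return {src for src, kind, dst in edges if kind == "HANDLED_BY" and dst in seen}
```

With the toy edges above, `routes_depending_on("User")` returns both user routes and omits the health check.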


9. The Worker-Judge Loop

The Worker-Judge Loop is the core quality-assurance mechanism of Hive. No code is committed without passing the Judge.

flowchart LR
    H["Hive<br/>Initiate"] -->|TASK| W["Worker<br/>Execute"]
    W -->|DRAFT| J["Judge<br/>Validate"]
    J -->|PASS| SHIP["Ship Code"]
    J -->|FAIL| W
    J -->|BUDGET HIT| ESC["Escalate to Human"]

    style H fill:#1a1a1a,stroke:#fff,color:#fff
    style W fill:#1a1a1a,stroke:#00bcd4,color:#00bcd4
    style J fill:#1a1a1a,stroke:#e040fb,color:#e040fb
    style SHIP fill:#1a1a1a,stroke:#4caf50,color:#4caf50
    style ESC fill:#1a1a1a,stroke:#ffc107,color:#ffc107

Step 1: Hive Initiates

Hive receives a task ("Fix the broken login endpoint"). It queries the Demiourgos graph to gather:

  • Which functions are involved in the login endpoint (via Route → HANDLED_BY → Function → CALLS chain)
  • The current impact scores on those functions
  • A context slice of only the relevant code

Hive packages this into a structured prompt and dispatches it to the Worker.

Step 2: Worker Executes

The Worker is the Brain (LLM) combined with the Hands (tools). It:

  1. Reads the context slice from Demiourgos
  2. Generates the code fix
  3. Writes the file via the Hands
  4. Triggers a Demiourgos re-scan (automatic on file save)
  5. Collects the new impact scores

The Worker produces a draft — the code change plus the updated graph state.

Step 3: Judge Validates

The Judge is a separate LLM call (can be the same model or a different one) that evaluates the Worker's output against a checklist:

| Check | What the Judge Verifies |
|---|---|
| Structural integrity | Did the change introduce any new Hard Impact (1.0) edges? If yes, are they resolved? |
| Test passage | Did existing tests pass? Were new tests added for new functionality? |
| Graph constraints | Are there orphaned nodes? Did any Process break at an early step? |
| Policy compliance | Does the code follow the team's coding style? (See Section 10) |
| Scope containment | Did the change stay within the requested scope, or did the Worker edit unrelated files? |

Humans Drive Scope, Agents Drive Code (Layer 4)

In Demiourgos, human developers no longer write code manually. The user exclusively manages Layer 1 (Business Stories) and Layer 2 (Features). The user talks to the Architect Agent to define the plan and constraints (Layer 3). Working from that plan, the Worker Agents are the only entities that modify Layer 4 (The Code Structure). The human developer acts as an executive reviewer, approving the final Pull Requests or Staging deployments.

The Three Outcomes

Pass → Ship: All checks pass. The code is committed and the PR is ready for human review.

Fail → Iterative Vectoring: The Judge does not just say "try again." It provides exact structural feedback designed to vector the Worker agent closer to the goal. The Worker receives:

  • The specific failing tests or graph constraints.
  • The precise AST lines that caused the new failure.
  • A dynamically updated context slice that now includes the downstream functions the Worker accidentally broke in its draft.

By feeding the Worker the exact blast radius of its own mistakes, Hive steers the LLM toward a working solution on each pass. The loop repeats until the graph is stable, up to a maximum of 3 retries.

Budget Hit → Escalate: If the Worker still fails once the retry limit is reached, or the token budget is exhausted, Hive escalates to a human developer. It provides:

  • The original task
  • Everything the Worker tried
  • The specific checks that keep failing
  • The relevant context slice

The human can fix the issue manually or adjust the task decomposition.
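The control flow of the three outcomes can be sketched as follows. This is a simplified model, not Hive's implementation: `run_loop` and `Verdict` are hypothetical names, the Worker and Judge are plain callables standing in for LLM calls, "max 3 retries" is read as one initial attempt plus three retries, and the token budget is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3  # retry limit mentioned in the text above


@dataclass
class Verdict:
    passed: bool
    feedback: list = field(default_factory=list)  # failing checks, if any


def run_loop(
    task: str,
    worker: Callable[[str, list], str],
    judge: Callable[[str], Verdict],
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, Optional[str]]:
    """Return ("ship", draft) on PASS, or ("escalate", last_draft)
    once the retries are used up."""
    feedback: list = []
    draft = None
    for _attempt in range(1 + max_retries):  # first attempt plus retries
        draft = worker(task, feedback)
        verdict = judge(draft)
        if verdict.passed:
            return "ship", draft
        feedback = verdict.feedback  # handed to the Worker on the next pass
    return "escalate", draft
```

A Worker that succeeds on its second draft ships after one round of feedback; a Judge that never passes leads to escalation after four attempts.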


10. Coding Style Preservation

One of the hardest problems with AI-generated code is that it does not match the team's existing coding style. The AI writes valid code, but it looks "foreign" in the codebase.

Demiourgos solves this through stored style preferences in Semantic Memory.

How Style Is Captured

  1. Automatic style detection: During the first full scan, Demiourgos analyzes the codebase for patterns:

    • Naming conventions (snake_case, camelCase, PascalCase)
    • Import ordering (stdlib first, then third-party, then local)
    • Quote style (single vs double)
    • Docstring format (Google-style, NumPy-style, reStructuredText)
    • Error handling patterns (try/except vs early return)
    • Indentation (spaces vs tabs, 2 vs 4)
  2. Manual style overrides: Teams can create style files that the agent reads:

    • Claude CLAUDE.md / skills files — Stored instructions that Claude Code reads on every session. Teams can define coding rules, naming conventions, and architecture preferences here.
    • Cursor .cursorrules — Similar style rules for Cursor agents.
    • .editorconfig — Standard editor configuration for indentation, line endings, etc.
  3. Semantic Memory storage: All detected and manual style preferences are stored in the Semantic Memory layer of the Alloy Net. When the Brain generates code, it receives these preferences as part of its system prompt.

How Style Is Enforced

| Stage | What Happens |
|---|---|
| Brain prompt | Hive injects style rules into the Brain's system prompt: "Use snake_case for Python functions, type hints on all parameters" |
| Worker output | The Worker generates code following the injected style rules |
| Judge validation | The Judge checks the output against the style rules. If naming convention is violated, it fails the check. |
| Post-commit hook | If a linter is configured (ruff, eslint), the Hands run it automatically. Lint failures are fed back to the Worker. |

How Style Keeps Improving

Every time a human overrides the AI's code style (renames a variable, reformats an import), on the next scan Demiourgos detects the change and updates the style profile in Semantic Memory. Over time, the style profile converges to the team's actual preferences.

Example: If the AI uses getUserById but the human always renames it to get_user_by_id, after three occurrences, the Semantic Memory records: "This team uses snake_case for function names." Future generations use snake_case.
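A naming-convention detector of the kind described above might look like the sketch below. The regexes, the `classify` and `dominant_style` helpers, and the three-occurrence threshold applied to observed names are assumptions for illustration, not the stored style profile format.

```python
import re
from collections import Counter

# Each pattern requires at least two words, so single-word names carry no signal.
PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
}


def classify(name: str):
    """Return the convention a name follows, or None if it is ambiguous."""
    for style, pattern in PATTERNS.items():
        if pattern.match(name):
            return style
    return None


def dominant_style(names, min_occurrences: int = 3):
    """Return the most common convention once it has been seen often enough."""
    counts = Counter(style for style in map(classify, names) if style)
    if not counts:
        return None
    style, seen = counts.most_common(1)[0]
    return style if seen >= min_occurrences else None
```

Given the names `get_user_by_id`, `send_email`, `load_user`, and `getUser`, the sketch settles on snake_case; with fewer than three matching names it returns no preference.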


11. Install and Run

Requirements

  • Python 3.11+ (for pipx / editable dev install)
  • A running FalkorDB instance (Redis with the FalkorDB module)

Install Channels

# Homebrew (recommended global install)
brew install demiourgos/tap/demiourgos
# Curl installer (binary install fallback)
curl -fsSL https://raw.githubusercontent.com/sarveshdakhore/demiourgos/main/demiourgos-backend/scripts/install.sh | sh
# pipx (Python-based global install)
pipx install demiourgos
# Local editable development install
cd demiourgos-backend
pip install -e .

Update Checks

Every command run performs a cached update check (once per 24 hours by default) and prints a non-blocking banner when a newer version exists. The banner includes the correct upgrade command based on install channel (brew, pipx, pip, or curl).

# Inspect current/latest version and channel-specific upgrade command
demiourgos self update-status

# Force live check (ignore cache)
demiourgos self update-status --check-now
# Disable automatic update checks
export DEMIOURGOS_NO_UPDATE_CHECK=1

# Optional: point update checks to a custom release endpoint
export DEMIOURGOS_UPDATE_MANIFEST_URL=https://api.github.com/repos/sarveshdakhore/demiourgos/releases/latest

Quickstart

# Start hosted control-plane backend (auth/projects/keys/trials)
demiourgos control-plane --host 0.0.0.0 --port 8000 --workspace-root .

# Initialize a new project
demiourgos init --path /path/to/your/codebase

# Scan the codebase and build the graph
demiourgos scan --config .demiourgos.json

# Start local user graph serve (data-plane, not hosted backend)
demiourgos serve --config .demiourgos.json --port 7788

# Start MCP server for AI agent tool calls (stdio)
demiourgos mcp --config .demiourgos.json

# Optional: enable MCP error reporting to hosted backend
export DEMIOURGOS_CONTROL_PLANE_URL=http://127.0.0.1:8000
export DEMIOURGOS_PROJECT_ID=<project_id>
export DEMIOURGOS_PROJECT_API_KEY=<dpk_key>

# Report circular dependency cycles
demiourgos report cycles --config .demiourgos.json

# Start the file watcher (auto-rescan on save)
demiourgos watch --config .demiourgos.json

Docker Runtime Sanity

Run the backend stack with:

docker compose up -d --build

Avoid docker compose run app ... for long-running backend services. It creates an extra one-off app container and can make it look like two backend apps are running.

Quick duplicate check:

docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | grep demiourgos

Docker (FalkorDB)

docker run -d -p 6379:6379 falkordb/falkordb:latest

12. Configuration

All configuration is in demiourgos_config.yaml:

# Which directories to scan
include_directories:
  - "**/*"           # Default: scan everything

exclude_directories:
  - "node_modules"
  - ".venv"
  - "__pycache__"
  - ".git"

# FalkorDB connection
redis_host: "localhost"
redis_port: 6379

# Graph identity
graph_name: "my_project"

Monorepo Support

For monorepos, use include_directories to scope the scan to specific services:

include_directories:
  - "services/auth/**/*"
  - "services/payment/**/*"
  - "shared/models/**/*"

exclude_directories:
  - "services/legacy/**/*"

License

This project is under active development.


Built with Tree-sitter, FalkorDB, Python, and a deep conviction that AI agents deserve structural truth, not token-wasting guesswork.
