Context Engine

AI-powered project mapping for debugging, tracing, and execution analysis

AI-powered project mapping that provides full details and intent summaries for your local LLMs. It reads your codebase, understands what every file actually does, and writes the truth back into your source, so both humans and machines stop guessing.


Overview

Context Engine is a local, offline-first code intelligence layer that walks a Python project, parses every file with Python's built-in ast module, and uses a local GGUF model (Gemma by default) to generate a one-sentence "intent" for every file, function, class, and method. The result is a single project_summary.json that becomes the nervous system DebugFlow's surgeon and logger pull from when something breaks.

Built for two audiences:

  • Developers who want their codebase to self-document and want a machine-readable map of intent + dependencies for every module.
  • DebugFlow / ML pipelines that need a global "what is this project" context to make crash diagnosis and auto-repair surgical instead of guesswork.

Everything runs locally. No code ever leaves your machine.


How it works — process model

Context Engine runs as a completely separate process from the project it analyzes. It never imports, executes, or links against any of your project's code. It only:

  1. Walks your directory tree with os.walk.
  2. Reads each .py file as plain text.
  3. Parses it with Python's built-in ast module (static analysis only).
  4. Passes code snippets to a local GGUF model for summarization.
  5. Writes JSON output to <project>/context/.

This means you can safely run it against any project — broken, partially installed, or with conflicting dependencies — without any interference in either direction.
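
To make the isolation concrete, here is a minimal sketch of the same static approach (not the engine's actual implementation; the skip list mirrors the CLI section below): walk the tree, read each source file as text, and pull symbol names out with ast, importing and executing nothing.

import ast
import os

SKIP_DIRS = {"__pycache__", ".git", "venv", "models", "context"}

def list_symbols(project_root: str) -> dict:
    """Statically collect top-level functions and classes for every .py file."""
    symbols = {}
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Prune the same kinds of directories the engine skips.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                source = f.read()              # read as plain text, never imported
            try:
                tree = ast.parse(source)       # static parse only, nothing executes
            except SyntaxError:
                continue                       # a broken file is simply skipped here
            symbols[os.path.relpath(path, project_root)] = [
                node.name for node in tree.body
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            ]
    return symbols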


Quick Demo

What it looks like in flight

$ context-engine .

═══════════════════════════════════════════════════════
📂 TARGET:      /home/you/projects/my_app
🧠 AI LOGS:     ENABLED
═══════════════════════════════════════════════════════

🔍 Scanning files in: /home/you/projects/my_app
📂 Found 24 Python nodes.
🧠 Cache Loaded: 18 file hashes recognized.
🧬 Synthesizing context: core.py
🧠 LLM (file): Orchestrates the request lifecycle and dispatches to handlers.
✍️ Injected AI Intent: core.py
🧠 Analyzing Project: 100%|████████████| 24/24 [00:42<00:00,  1.75s/file]
💾 State Physically Synchronized: 24 keys.
🏁 Neural Mapping Complete (42.18s).

───────────────────────────────────────────────────────
🏁 SCAN COMPLETE
⏱️  Duration:     42.18s
📄 Total Files:   24 Cached:        18
🧠 AI Analyzed:   6
───────────────────────────────────────────────────────

After the run, you get a context/project_summary.json at the project root with the full neural map of your code — files, intents, dependencies, classes, methods, the works.


Installation

pip install context_engine

Dependencies (auto-installed):

  • tqdm — progress bar for the scan loop
  • llama-cpp-python — runs the local GGUF model
  • debugflow — sibling package; provides the logger and SpineLink telemetry

Requires Python 3.10+.

You also need a GGUF model on disk. The engine is tuned around google_gemma-3-4b-it-Q5_K_M.gguf, but any chat-tuned GGUF that llama-cpp-python can load will work.
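
If you want to sanity-check that llama-cpp-python can actually load your chosen model before linking it, a quick standalone test (independent of Context Engine; the path below is a placeholder) looks like this:

from llama_cpp import Llama

# Load the GGUF once and run a tiny chat completion to confirm it works.
llm = Llama(model_path="/home/you/models/gemma-3-4b-it-Q5_K_M.gguf", n_ctx=2048, verbose=False)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize in one sentence: def add(a, b): return a + b"}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])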

After install, link your model once:

context-engine model-path
# 🎯 Enter absolute path to your GGUF model: /home/you/models/gemma-3-4b-it-Q5_K_M.gguf
# ✨ Configuration saved successfully.

The path is persisted to ~/.context_engine/config.json and reused across every project.


Usage

Option 1 — CLI (recommended)

From the root of any Python project:

context-engine .

The engine will:

  1. Walk every .py file (skipping __pycache__, .git, venv, models, context).
  2. Hash each file and skip anything already cached (see the sketch after this list).
  3. Send only the changed files to the local LLM for re-summarization.
  4. Inject a """File summary: ...""" docstring at the top of any file that doesn't already have one.
  5. Write the full project map to ./context/project_summary.json.
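
Step 2 is what keeps re-runs cheap. A minimal sketch of a SHA-256 file cache in the same spirit (the cache layout shown here is an assumption, not the engine's exact format):

import hashlib
import json
from pathlib import Path

CACHE = Path("context/cache.json")

def file_hash(path: Path) -> str:
    """Fingerprint a file's current content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(py_files: list[Path]) -> list[Path]:
    """Return only the files whose content hash differs from the cached one."""
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    return [p for p in py_files if cache.get(str(p)) != file_hash(p)]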

Option 2 — Python module (embed in your own tooling)

from context_engine import ContextEngine

engine = ContextEngine(project_path=".", logs_on=True, context_logs_on=True)
project_map, stats = engine.run(auto_inject=True)

print(f"Mapped {stats['total_files']} files in {stats['time_taken']:.2f}s")
print(f"Cache hits: {stats['cache_hits']}, AI analyses: {stats['new_analyses']}")

project_map is the same dict written to disk — use it directly without round-tripping through JSON.

The engine instance is just a regular Python object. Creating it inside your own script does not affect your process's imports or environment in any way — it only touches the filesystem paths you give it.

Stopping it

The scan is cooperative. Ctrl+C at any time; the cache is fsync'd after each file so the next run picks up exactly where you left off.
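
Here "fsync'd" means the cache is pushed all the way to disk before the next file starts, so an interrupt can never lose more than the file in flight. A minimal sketch of that kind of crash-safe write (not the engine's exact code):

import json
import os

def persist(state: dict, path: str = "context/cache.json") -> None:
    """Write state and force it onto disk so a Ctrl+C cannot lose it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # push the OS buffer to the physical disk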


Problem + Motivation

Every non-trivial codebase suffers from the same rot:

  • Docstrings drift, lie, or never get written.
  • File names imply one thing while the code does another.
  • When something crashes deep inside an ML pipeline, the only "context" your debugger has is the traceback — no idea what the surrounding files were supposed to do.

Context Engine fixes this at the root by treating the project itself as the source of truth. Instead of trusting names or stale docstrings, it reads the actual logic of every function and asks a local LLM to summarize what it executes, not what it claims to do. That summary then becomes:

  1. A real, injected docstring at the top of the file.
  2. A node in the global project_summary.json map.
  3. The "neighborhood context" that DebugFlow's surgeon uses when proposing a fix for a crashing file.

The whole pipeline is local, cached, and incremental — so re-running it across a 500-file repo is cheap.


Key Features

  • Local-first. Runs entirely offline through llama-cpp-python. No API keys, no telemetry, no code leaves the machine.
  • Process-isolated. Only reads files; never imports or executes your project's code.
  • Skeptic prompting. The LLM is instructed to ignore misleading names and summarize what the code actually executes.
  • Hash-based incremental cache. SHA-256 per file; only changed files get re-analyzed.
  • AST-level extraction. Functions, classes, methods, signatures, and a 50-line logic preview per symbol — not just names.
  • Auto-injection. Files without a module docstring get a real one written in, derived from the model's intent.
  • Dependency graph. Every file's imports are mapped into a global graph, exposing the project's nervous system (a minimal sketch follows this list).
  • DebugFlow integration. Logs route through debugflow.logger_system; the resulting map is consumed by the DebugFlow surgeon for crash repair.
  • Toggleable AI chatter. Mute the LLM's status logs without touching the rest of DebugFlow's logging.
  • Crash-safe persistence. State is fsync'd to disk after each scan; partial runs survive interruption.
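
The dependency graph comes out of the same static pass as everything else. A minimal sketch of collecting a file's imports with ast (the engine's own extraction may differ in detail):

import ast
from pathlib import Path

def imports_of(path: str) -> list[str]:
    """List the modules a single file imports, without importing anything."""
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    deps = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.append(node.module)
    return deps

print(imports_of("main.py"))  # e.g. ['os', 'utils', 'models.data']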

API Usage / Examples

Mapping a single project

from context_engine import ContextEngine

engine = ContextEngine("/path/to/project")
project_map, stats = engine.run()

Ignoring framework / boilerplate directories

engine = ContextEngine(
    project_path=".",
    ignore_list=["migrations", "tests", "conftest.py"]
)
project_map, stats = engine.run()

Entries in ignore_list are matched against both directory names and file names.

Reading a previously generated map

import json
from pathlib import Path

summary = json.loads(Path("context/project_summary.json").read_text())

print("Project:", summary["project_name"])
for entry in summary["map"]:
    print(f"  {entry['file']}: {entry['intent']}")

Inspecting the dependency graph

import json
from pathlib import Path

summary = json.loads(Path("context/project_summary.json").read_text())

for file, deps in summary["dependencies"].items():
    print(f"{file}")
    for d in deps:
        print(f"   └─ {d}")

Running silently (no AI logs)

engine = ContextEngine(
    project_path=".",
    logs_on=True,           # keep DebugFlow's master pipe alive
    context_logs_on=False,  # silence the engine's own chatter
)
engine.run()

Mapping without auto-injecting docstrings

If you want a read-only pass (no file mutations), disable injection:

engine = ContextEngine(".")
project_map, stats = engine.run(auto_inject=False)

Switching the model at runtime

from context_engine import set_model_path, get_model_path

set_model_path("/new/path/to/another-model.gguf")
print("Active model:", get_model_path())

Configuration via environment variables

  • MODEL_PATH. Overrides the GGUF model path; takes precedence over the saved config. If unset, the engine falls back to ~/.context_engine/config.json, then to ./models/google_gemma-3-4b-it-Q5_K_M.gguf.
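
Because MODEL_PATH takes precedence over the saved config, you can point a single run at a different model straight from the shell; this is ordinary environment-variable prefixing, nothing engine-specific:

MODEL_PATH=/home/you/models/another-model.gguf context-engine .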

Persistent config is stored in:

  • ~/.context_engine/config.json — global model path.
  • <project>/context/cache.json — per-project hash cache.
  • <project>/context/project_summary.json — per-project neural map.
  • <project>/.context/engine_flow.log — runtime log piped through DebugFlow.

The engine's chatter toggle is persisted at:

  • <install>/context_engine/.context_log_state, which stores ON or OFF.

Console scripts

Command                      What it does
context-engine <path>        Run a full scan over the given project path (use . for cwd).
context-engine model-path    Interactive prompt to link / re-link your GGUF model.
context-logs                 Toggle the engine's AI chatter ON ↔ OFF (state persists).
context-logs-on              Force AI chatter ON.
context-logs-off             Force AI chatter OFF (silenced).

You can also override the chatter state inline for a single run:

context-engine . context-logs off

Project map schema

The context/project_summary.json written after each scan has this shape:

{
  "project_name": "my_app",
  "tree": {
    "root": "my_app",
    "structure": [
      { "folder": "", "files": ["main.py", "utils.py"] },
      { "folder": "models", "files": ["data.py"] }
    ]
  },
  "dependencies": {
    "main.py": ["os", "utils", "models.data"],
    "utils.py": ["re", "pathlib"]
  },
  "map": [
    {
      "file": "main.py",
      "intent": "Orchestrates the request lifecycle and dispatches to handlers.",
      "index": {
        "run(args: list) -> None": "Parses CLI args and delegates to the appropriate handler."
      },
      "classes": {
        "App": {
          "intent": "Holds application state and routes incoming requests.",
          "methods": {
            "start(self) -> None": "Initialises the event loop and binds the socket."
          }
        }
      },
      "dependencies": ["os", "utils"],
      "docstring": "-------- main --------\nOrchestrates the request lifecycle.\n-------- main --------"
    }
  ]
}
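
Because the map is plain JSON, pulling the "neighborhood" around one file (its own intent plus the intents of the local files it depends on) takes only a few lines. The module-to-path conversion below is a simplifying assumption for illustration:

import json
from pathlib import Path

summary = json.loads(Path("context/project_summary.json").read_text())
intents = {entry["file"]: entry["intent"] for entry in summary["map"]}

target = "main.py"
print(target, "->", intents[target])
for dep in summary["dependencies"].get(target, []):
    # Local modules appear in the map; stdlib and third-party imports will not.
    dep_file = dep.replace(".", "/") + ".py"
    print("  needs", dep, "->", intents.get(dep_file, "(external)"))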

Project Status

Stable:

  • AST parsing of functions, classes, methods (signature + docstring + 50-line logic preview).
  • Local LLM summarization via llama-cpp-python with the skeptic prompt.
  • Hash-based incremental cache and crash-safe fsync persistence.
  • Auto-injection of file-level docstrings.
  • Dependency graph extraction.
  • CLI (context-engine, model-path) and persistent log-state toggles.
  • DebugFlow logger integration (debugflow.logger_system, child-logger naming).
  • ignore_list support for filtering framework noise.

In progress / experimental:

  • surgeon.operate() — pulls the latest crash from DebugFlow's SpineLink, locates the offending file in the project map, and asks the LLM to propose a patch. Functional end-to-end but treated as experimental until the patching step is hardened.
  • Richer class-level intent (currently uses the class docstring as the prompt; logic-preview-based class intent is on the bench).

License

MIT © 2026 ProfessionalMario. See LICENSE for the full text.
