Proof-aware codebase intelligence - smarter context, fewer tokens

Provenant

The codebase intelligence layer for your AI coding agent.

Wiki indexing  ·  BM25 + HyDE retrieval  ·  Self-healing index  ·  Dead code  ·  Risk  ·  Git archaeology


🏆 Performance

Evaluated on SWE-bench Verified — 500 real GitHub issues across 12 Python repositories.

| Metric | Baseline | + Provenant | Δ |
|---|---|---|---|
| File Coverage@5 | ~40% | 63.8% | +24 pp |
| File Coverage@10 with HyDE | | 75.2% | |
| Tokens vs naive file reading | baseline | −60–65× | |
| Answer quality (LLM judge) | baseline | parity | −0.15 (noise) |
| Low-confidence queries healed | | 75% after one repair cycle | |
| Cost per repair cycle | | ~$0.02 | |

🔧 MCP Tools

Eight tools, usable from Claude Code, Cursor, Windsurf, Cline, and Copilot:

| Tool | What it does |
|---|---|
| provenant_ask | Hybrid BM25 + HyDE retrieval → cited answer with confidence score |
| provenant_context | Triage cards for files, modules, and symbols: purpose, API, freshness |
| provenant_search | Semantic search over wiki content |
| provenant_overview | Architecture summary, entry points, dependency structure |
| provenant_symbol | Source bytes for a specific function or class |
| provenant_dead_code | Unreachable code with confidence tiers and safe-to-delete flags |
| provenant_risk | Hotspot scores, change frequency, test coverage gaps, blast radius |
| provenant_why | Architectural decisions and git archaeology: why does this code exist? |

Start the server:

provenant serve ./myrepo

Then register it in your MCP client configuration:
{
  "mcpServers": {
    "provenant": {
      "command": "provenant",
      "args": ["serve", "/path/to/repo"]
    }
  }
}
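
If your MCP client already has other servers configured, you may prefer to merge the entry rather than paste over the file. The sketch below is one way to do that in Python. The helper name `add_provenant_server` and the `other` server entry are hypothetical, for illustration only; they are not part of Provenant.

```python
import json


def add_provenant_server(config: dict, repo_path: str) -> dict:
    """Return a copy of an MCP client config with a 'provenant' entry added.

    Hypothetical helper for illustration; existing servers are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same command and args as the JSON snippet above.
    servers["provenant"] = {"command": "provenant", "args": ["serve", repo_path]}
    merged["mcpServers"] = servers
    return merged


# A made-up existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-tool", "args": []}}}
updated = add_provenant_server(existing, "/path/to/repo")
print(json.dumps(updated, indent=2))
```

The function copies the input instead of mutating it, so the original config stays available if you want to diff before writing anything to disk.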

⚡ Quickstart

pip install provenant

provenant init ./myrepo        # index repo, generate wiki
provenant serve ./myrepo       # MCP server + web dashboard

provenant ask "how does auth work?" --repo ./myrepo
provenant costs ./myrepo

🧠 What Provenant Builds

Provenant runs once, builds everything, then keeps it in sync.

◆ Documentation Intelligence

provenant init parses your repo with tree-sitter across 15+ languages, builds a symbol + import graph, and generates plain-English wiki pages for every file — purpose, public API, key functions, relationships. Stored locally in .provenant/. Nothing leaves your machine except the LLM calls used to generate summaries.

When an agent asks a question, Provenant retrieves wiki pages instead of raw source. Prose matches natural-language queries the way code cannot.
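
To see why prose pages match natural-language questions well, here is a minimal sketch of textbook BM25 scoring over a few made-up wiki page summaries. It is not Provenant's implementation: the tokenizer, the `k1` and `b` defaults, and the sample pages are all assumptions, and the HyDE step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real index would do more normalization.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical wiki page summaries, one per file.
wiki_pages = [
    "handles user login and session token validation",
    "renders the dashboard charts",
    "parses command line arguments",
]
scores = bm25_scores("how does login validation work", wiki_pages)
best = max(range(len(wiki_pages)), key=scores.__getitem__)
print(best, wiki_pages[best])
```

Only the first page shares terms with the question, so it is the only one with a non-zero score. The same question would match raw source far less reliably, since identifiers rarely spell out the words "login" and "validation" side by side.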

◆ Attribution Confidence & Self-Healing

Every response computes confidence = cited pages / retrieved pages. Low-confidence answers automatically trigger background wiki repair — non-blocking, no command needed. On a 1,393-page Django index, rewriting just 10 pages fixed 75% of low-confidence queries at ~$0.02 total.
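
As a rough illustration of the ratio above, the sketch below computes confidence from two lists of page names and checks it against a cutoff. The 0.5 threshold, the function names, and the page paths are assumptions made for this example; the source does not state what threshold triggers repair.

```python
def attribution_confidence(cited_pages, retrieved_pages):
    """confidence = cited pages / retrieved pages, as described above."""
    retrieved = set(retrieved_pages)
    if not retrieved:
        return 0.0
    # Assumption: only citations that point at retrieved pages are counted.
    cited = set(cited_pages) & retrieved
    return len(cited) / len(retrieved)


# Hypothetical cutoff; the real trigger value is not documented here.
REPAIR_THRESHOLD = 0.5


def needs_repair(confidence, threshold=REPAIR_THRESHOLD):
    return confidence < threshold


retrieved = ["auth/login.md", "auth/session.md", "core/settings.md", "utils/http.md"]
cited = ["auth/login.md"]
conf = attribution_confidence(cited, retrieved)  # 1 cited / 4 retrieved
print(conf, needs_repair(conf))
```

A low ratio means most retrieved pages did not contribute to the answer, which suggests those pages describe their files poorly and are candidates for a rewrite.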

◆ Graph Intelligence

tree-sitter parses every file into a two-tier dependency graph — file nodes and symbol nodes (functions, classes, methods). Heritage extraction covers extends, implements, mixins, and trait impls across 15 languages. PageRank + betweenness centrality identify your most central and most coupled code.
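
For intuition on the centrality step, here is a small power-iteration PageRank over a made-up file-level import graph. It is a sketch under stated assumptions, not Provenant's code: the damping factor, iteration count, and file names are illustrative, edges point from a file to the files it imports, and betweenness centrality is not shown.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each node to the list of nodes it imports."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Nodes with no outgoing edges spread their rank evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical import graph: app.py imports auth.py and db.py, and so on.
imports = {
    "app.py": ["auth.py", "db.py"],
    "auth.py": ["db.py", "utils.py"],
    "db.py": ["utils.py"],
    "utils.py": [],
}
ranks = pagerank(imports)
most_central = max(ranks, key=ranks.get)
print(most_central)
```

With edges pointing at dependencies, rank accumulates on the files that everything else relies on, which is why the shared utility module comes out on top here.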

◆ Dead Code Analysis

Identifies unreachable functions, classes, and modules. Groups by confidence tier (definite / likely / possible). Flags safe-to-delete vs. dynamically-called code. Works across Python, TypeScript, Go, Rust, and more.
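
The core idea can be sketched as a reachability search: start from known entry points, walk the call graph, and report whatever was never visited. The call graph and symbol names below are hypothetical, and this sketch does not attempt the confidence tiers or the handling of dynamically-called code.

```python
from collections import deque


def unreachable_symbols(call_graph, entry_points):
    """Return symbols that cannot be reached from any entry point."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)

    # Every symbol that appears as a caller or a callee.
    all_symbols = set(call_graph)
    for callees in call_graph.values():
        all_symbols.update(callees)
    return sorted(all_symbols - seen)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_args", "run"],
    "run": ["load_config"],
    "legacy_export": ["format_csv"],
    "format_csv": [],
}
dead = unreachable_symbols(call_graph, ["main"])
print(dead)
```

Static reachability alone cannot see calls made through reflection, registries, or string lookups, so a result like this is best treated as a list of candidates rather than a delete list.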

◆ Risk Scoring

Change frequency × dependency centrality × test coverage gaps → per-file risk score. Know what breaks before you touch it.
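
A minimal sketch of the product above, assuming each factor has already been normalized to the range 0 to 1 and that the coverage gap is one minus test coverage. The normalization, the function name, and the sample files are assumptions; the source only gives the three factors.

```python
def risk_score(change_frequency, centrality, test_coverage):
    """change frequency x centrality x coverage gap, with inputs in [0, 1]."""
    inputs = {
        "change_frequency": change_frequency,
        "centrality": centrality,
        "test_coverage": test_coverage,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    coverage_gap = 1.0 - test_coverage
    return change_frequency * centrality * coverage_gap


# Hypothetical files: (change frequency, centrality, test coverage).
files = {
    "billing/invoice.py": (0.9, 0.8, 0.25),
    "utils/strings.py": (0.2, 0.9, 1.0),
    "scripts/one_off.py": (0.9, 0.1, 0.0),
}
ranked = sorted(files, key=lambda path: risk_score(*files[path]), reverse=True)
print(ranked)
```

Because the factors are multiplied, a fully tested file scores zero however central it is, and an untested script that nothing depends on stays near the bottom. Only files that are churning, central, and under-tested at the same time rise to the top.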

◆ Git Archaeology

provenant_why traces why code exists: git blame, commit history, and architectural decisions linked to the files your agent is editing.


🗂️ Monorepo / Workspace Support

provenant init ./my-project
# Detected 3 repositories:
#   backend/     (Django)
#   frontend/    (React/TypeScript)
#   mobile/      (React Native)

Each sub-repo gets its own wiki. Cross-repo context is linked automatically — questions about the frontend surface relevant backend files and vice versa.


🖥️ Web Dashboard

provenant serve ./myrepo   # MCP server + local web UI

Visualize the knowledge graph, wiki pages, dead code report, risk scores, and repair candidates in a local browser dashboard. No external services required.


⚙️ Configuration

# LLM provider (pick one)
ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...
DEEPSEEK_API_KEY=...

# Embedder — optional, enables vector search + HyDE
OPENAI_EMBEDDING_API_KEY=...
OPENAI_EMBEDDING_MODEL=nomic-embed-text-v1.5
OPENAI_EMBEDDING_BASE_URL=https://api.fireworks.ai/inference/v1

# Embedding tiers
provenant init ./myrepo --embedder local     # free, ~40 MB, no API key
provenant init ./myrepo --embedder openai    # 768-dim, best retrieval

Self-hostable. Zero telemetry. Bring your own keys — works with Anthropic, OpenAI, DeepSeek, Gemini, OpenRouter, or local Ollama.
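
Before running provenant init, it can help to check which keys are actually set in your shell. The snippet below is a hypothetical preflight check that uses only the variable names listed above. It says nothing about how Provenant itself reads its configuration, and it covers only the three providers whose variables are shown.

```python
import os

# Variable names taken from the configuration block above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def configured_providers(env=None):
    """List providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def remote_embedder_configured(env=None):
    """True if the optional embedding API key is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_EMBEDDING_API_KEY"))


# Made-up environment for demonstration; pass nothing to inspect the real one.
sample_env = {"DEEPSEEK_API_KEY": "sk-example", "OPENAI_API_KEY": ""}
print(configured_providers(sample_env))
print(remote_embedder_configured(sample_env))
```

An empty string is treated as unset, which catches the common case of an exported but blank variable.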


📄 License

MIT
