Proof-aware codebase intelligence - smarter context, fewer tokens

Provenant

The codebase intelligence layer for your AI coding agent.

Wiki indexing  ·  BM25 + HyDE retrieval  ·  Self-healing index  ·  Dead code  ·  Risk  ·  Git archaeology

Read the whitepaper →


🏆 Performance

Evaluated on SWE-bench Verified — 500 real GitHub issues across 12 Python repositories.

| Metric | Baseline | + Provenant | Δ |
| --- | --- | --- | --- |
| File Coverage@5 | ~40% | 63.8% | +24 pp |
| File Coverage@10 with HyDE | | 75.2% | |
| Tokens vs naive file reading | baseline | −60–65× | |
| Answer quality (LLM judge) | baseline | parity | −0.15 (noise) |
| Low-confidence queries healed | | 75% after one repair cycle | |
| Cost per repair cycle | | ~$0.02 | |

🔧 MCP Tools

Eight tools, usable from Claude Code, Cursor, Windsurf, Cline, and Copilot:

| Tool | What it does |
| --- | --- |
| provenant_ask | Hybrid BM25 + HyDE retrieval → cited answer with confidence score |
| provenant_context | Triage cards for files, modules, and symbols: purpose, API, freshness |
| provenant_search | Semantic search over wiki content |
| provenant_overview | Architecture summary, entry points, dependency structure |
| provenant_symbol | Source bytes for a specific function or class |
| provenant_dead_code | Unreachable code with confidence tiers and safe-to-delete flags |
| provenant_risk | Hotspot scores, change frequency, test coverage gaps, blast radius |
| provenant_why | Architectural decisions and git archaeology: why does this code exist? |

Start the server from the command line:

provenant serve ./myrepo

Example MCP client configuration:

{
  "mcpServers": {
    "provenant": {
      "command": "provenant",
      "args": ["serve", "/path/to/repo"]
    }
  }
}

⚡ Quickstart

pip install provenant

provenant init ./myrepo        # index repo, generate wiki
provenant serve ./myrepo       # MCP server + web dashboard

provenant ask "how does auth work?" --repo ./myrepo
provenant costs ./myrepo

🧠 What Provenant Builds

Provenant runs once, builds everything, then keeps it in sync.

◆ Documentation Intelligence

provenant init parses your repo with tree-sitter across 15+ languages, builds a symbol + import graph, and generates plain-English wiki pages for every file — purpose, public API, key functions, relationships. Stored locally in .provenant/. Nothing leaves your machine except the LLM calls used to generate summaries.

When an agent asks a question, Provenant retrieves wiki pages instead of raw source. Prose matches natural-language queries the way code cannot.
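
To illustrate why prose pages suit keyword retrieval, here is a minimal BM25 scorer over a toy wiki. It is a sketch of the standard Okapi BM25 formula, not Provenant's actual retrieval code. The page names, page text, and the `k1` and `b` values are assumptions chosen for illustration, and the HyDE step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, pages, k1=1.5, b=0.75):
    """Return page names sorted by descending Okapi BM25 score."""
    docs = {name: tokenize(text) for name, text in pages.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of pages containing each term.
    df = Counter(term for toks in docs.values() for term in set(toks))

    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical wiki pages, written as prose summaries of source files.
pages = {
    "auth/session.py": "Handles user login and session tokens for authentication.",
    "db/models.py": "Defines database models and table relationships.",
    "utils/dates.py": "Helpers for parsing and formatting dates.",
}
ranking = bm25_rank("how does user login work", pages)
print(ranking[0])  # auth/session.py
```

The query shares the words "user" and "login" with the prose summary, so that page ranks first even though the underlying source file might never contain those words.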

◆ Attribution Confidence & Self-Healing

Every response computes confidence = cited pages / retrieved pages. Low-confidence answers automatically trigger background wiki repair — non-blocking, no command needed. On a 1,393-page Django index, rewriting just 10 pages fixed 75% of low-confidence queries at ~$0.02 total.
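
The confidence ratio above can be sketched in a few lines. This is an illustration only: the function names, the page names, and the `0.5` repair threshold are assumptions, not Provenant's actual API or defaults.

```python
def attribution_confidence(cited_pages, retrieved_pages):
    """Fraction of retrieved pages that the answer actually cited."""
    retrieved = set(retrieved_pages)
    if not retrieved:
        return 0.0
    cited = set(cited_pages) & retrieved
    return len(cited) / len(retrieved)


def needs_repair(confidence, threshold=0.5):
    """Hypothetical trigger; the threshold value is an assumption."""
    return confidence < threshold


# Four pages were retrieved, but the answer only cited one of them.
retrieved = ["auth.md", "session.md", "urls.md", "settings.md"]
cited = ["auth.md"]
conf = attribution_confidence(cited, retrieved)
print(conf, needs_repair(conf))  # 0.25 True
```

A low ratio means most retrieved pages did not help the answer, which is the signal used to queue those pages for repair.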

◆ Graph Intelligence

tree-sitter parses every file into a two-tier dependency graph — file nodes and symbol nodes (functions, classes, methods). Heritage extraction covers extends, implements, mixins, and trait impls across 15 languages. PageRank + betweenness centrality identify your most central and most coupled code.
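
As a rough picture of the centrality step, here is a power-iteration PageRank over a small import graph. It is a textbook sketch, not Provenant's implementation: the file names, the edge direction (importer to imported), and the damping factor are assumptions, and betweenness centrality is omitted.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of outgoing edges."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical import graph: an edge A -> B means "A imports B".
imports = {
    "views.py": ["models.py", "utils.py"],
    "forms.py": ["models.py"],
    "models.py": ["utils.py"],
    "utils.py": [],
}
ranks = pagerank(imports)
most_central = max(ranks, key=ranks.get)
print(most_central)  # utils.py
```

With edges pointing from importer to imported file, rank flows toward the modules that much of the codebase depends on, directly or transitively.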

◆ Dead Code Analysis

Identifies unreachable functions, classes, and modules. Groups by confidence tier (definite / likely / possible). Flags safe-to-delete vs. dynamically-called code. Works across Python, TypeScript, Go, Rust, and more.
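
At its simplest, unreachable-code detection is a graph traversal from known entry points. The sketch below assumes a precomputed call graph and a single `main` entry point. Both are hypothetical, and the confidence tiers described above are not modeled.

```python
from collections import deque


def unreachable_symbols(call_graph, entry_points):
    """Return symbols that cannot be reached from any entry point."""
    seen = set(entry_points)
    queue = deque(entry_points)
    # Breadth-first traversal over caller -> callee edges.
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    all_symbols = set(call_graph) | {c for cs in call_graph.values() for c in cs}
    return sorted(all_symbols - seen)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["load_config", "run"],
    "run": ["handle_request"],
    "handle_request": [],
    "load_config": [],
    "legacy_export": ["format_csv"],
    "format_csv": [],
}
dead = unreachable_symbols(call_graph, ["main"])
print(dead)  # ['format_csv', 'legacy_export']
```

Static reachability alone cannot see dynamic dispatch such as `getattr` or plugin registries, which is why a real tool needs to separate safe-to-delete results from code that may be called dynamically.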

◆ Risk Scoring

Change frequency × dependency centrality × test coverage gaps → per-file risk score. Know what breaks before you touch it.
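
The product above can be sketched as follows. The normalization of each factor to the range 0 to 1, the file names, and the sample values are assumptions for illustration; the source does not say how Provenant scales or weights these inputs.

```python
def risk_score(change_frequency, centrality, coverage):
    """Multiply the three factors; all inputs are assumed normalized to [0, 1]."""
    coverage_gap = 1.0 - coverage
    return change_frequency * centrality * coverage_gap


# Hypothetical per-file metrics.
files = {
    "billing/invoice.py": {"change_frequency": 0.9, "centrality": 0.8, "coverage": 0.25},
    "utils/strings.py": {"change_frequency": 0.2, "centrality": 0.9, "coverage": 0.95},
    "scripts/migrate.py": {"change_frequency": 0.7, "centrality": 0.1, "coverage": 0.0},
}
ranked = sorted(files, key=lambda f: risk_score(**files[f]), reverse=True)
print(ranked)
```

Because the factors are multiplied, a file scores high only when it is frequently changed, widely depended on, and poorly tested all at once. A heavily used but well-tested utility stays near the bottom of the list.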

◆ Git Archaeology

provenant_why traces why code exists: git blame, commit history, and architectural decisions linked to the files your agent is editing.


🗂️ Monorepo / Workspace Support

provenant init ./my-project
# Detected 3 repositories:
#   backend/     (Django)
#   frontend/    (React/TypeScript)
#   mobile/      (React Native)

Each sub-repo gets its own wiki. Cross-repo context is linked automatically — questions about the frontend surface relevant backend files and vice versa.


🖥️ Web Dashboard

provenant serve ./myrepo   # MCP server + local web UI

Visualize the knowledge graph, wiki pages, dead code report, risk scores, and repair candidates in a local browser dashboard. No external services required.


⚙️ Configuration

# LLM provider (pick one)
ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...
DEEPSEEK_API_KEY=...

# Embedder — optional, enables vector search + HyDE
OPENAI_EMBEDDING_API_KEY=...
OPENAI_EMBEDDING_MODEL=nomic-embed-text-v1.5
OPENAI_EMBEDDING_BASE_URL=https://api.fireworks.ai/inference/v1

# Embedding tiers
provenant init ./myrepo --embedder local     # free, ~40 MB, no API key
provenant init ./myrepo --embedder openai    # 768-dim, best retrieval

Self-hostable. Zero telemetry. Bring your own keys — works with Anthropic, OpenAI, DeepSeek, Gemini, OpenRouter, or local Ollama.


📄 License

MIT
