Proof-aware codebase intelligence - smarter context, fewer tokens

Provenant

The codebase intelligence layer for your AI coding agent.

Wiki indexing  ·  BM25 + HyDE retrieval  ·  Self-healing index  ·  Dead code  ·  Risk  ·  Git archaeology

Read the whitepaper →


🏆 Performance

Evaluated on SWE-bench Verified — 500 real GitHub issues across 12 Python repositories.

| Metric | Baseline | + Provenant | Δ |
| --- | --- | --- | --- |
| File Coverage@5 | ~40% | 63.8% | +24 pp |
| File Coverage@10 with HyDE | | 75.2% | |
| Tokens vs naive file reading | baseline | | −60–65× |
| Answer quality (LLM judge) | baseline | parity | −0.15 (noise) |
| Low-confidence queries healed | | 75% after one repair cycle | |
| Cost per repair cycle | | ~$0.02 | |

🔧 MCP Tools

Eight tools, usable from Claude Code, Cursor, Windsurf, Cline, and Copilot:

| Tool | What it does |
| --- | --- |
| provenant_ask | Hybrid BM25 + HyDE retrieval → cited answer with confidence score |
| provenant_context | Triage cards for files, modules, symbols: purpose, API, freshness |
| provenant_search | Semantic search over wiki content |
| provenant_overview | Architecture summary, entry points, dependency structure |
| provenant_symbol | Source bytes for a specific function or class |
| provenant_dead_code | Unreachable code with confidence tiers and safe-to-delete flags |
| provenant_risk | Hotspot scores, change frequency, test coverage gaps, blast radius |
| provenant_why | Architectural decisions and git archaeology: why does this code exist? |

Start the MCP server:

provenant serve ./myrepo

Then register it in your MCP client's configuration:

{
  "mcpServers": {
    "provenant": {
      "command": "provenant",
      "args": ["serve", "/path/to/repo"]
    }
  }
}

⚡ Quickstart

pip install provenant

provenant init ./myrepo        # index repo, generate wiki
provenant serve ./myrepo       # MCP server + web dashboard

provenant ask "how does auth work?" --repo ./myrepo
provenant costs ./myrepo

🧠 What Provenant Builds

Provenant runs once, builds everything, then keeps it in sync.

◆ Documentation Intelligence

provenant init parses your repo with tree-sitter across 15+ languages, builds a symbol + import graph, and generates plain-English wiki pages for every file — purpose, public API, key functions, relationships. Stored locally in .provenant/. Nothing leaves your machine except the LLM calls used to generate summaries.

When an agent asks a question, Provenant retrieves wiki pages instead of raw source. Prose matches natural-language queries the way code cannot.
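
To illustrate why prose retrieves well, here is a minimal sketch of Okapi BM25 ranking over wiki-style page summaries. It is not Provenant's implementation: the `bm25_rank` function, the tokenizer, the parameter defaults, and the sample pages are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query, pages, k1=1.5, b=0.75):
    """Rank pages (name -> prose summary) against a query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are common defaults.
    """
    if not pages:
        return []
    docs = {name: tokenize(text) for name, text in pages.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: the number of pages each term appears in.
    df = Counter(term for toks in docs.values() for term in set(toks))
    query_terms = tokenize(query)
    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Made-up wiki summaries standing in for generated pages.
pages = {
    "auth/session.py": "Handles user login, session tokens, and how auth cookies are refreshed.",
    "billing/invoice.py": "Builds invoices and computes tax for each customer order.",
    "utils/strings.py": "Small helpers for string formatting.",
}
ranking = bm25_rank("how does auth login work", pages)
print(ranking[0])
```

The query shares the words "auth" and "login" with the prose summary of the session module, which is the kind of lexical overlap that raw source code often lacks.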

◆ Attribution Confidence & Self-Healing

Every response computes confidence = cited pages / retrieved pages. Low-confidence answers automatically trigger background wiki repair — non-blocking, no command needed. On a 1,393-page Django index, rewriting just 10 pages fixed 75% of low-confidence queries at ~$0.02 total.
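
The confidence ratio above can be sketched in a few lines. This is an illustration only: the function names and the `REPAIR_THRESHOLD` value are assumptions, since the text does not state what cutoff counts as low confidence.

```python
def attribution_confidence(cited_pages, retrieved_pages):
    """Return cited pages / retrieved pages, counting only cited pages
    that were actually retrieved. Returns 0.0 when nothing was retrieved."""
    retrieved = set(retrieved_pages)
    if not retrieved:
        return 0.0
    cited = set(cited_pages) & retrieved
    return len(cited) / len(retrieved)


# Hypothetical cutoff; the real threshold is not given in this document.
REPAIR_THRESHOLD = 0.5


def needs_repair(cited_pages, retrieved_pages, threshold=REPAIR_THRESHOLD):
    """Flag an answer whose confidence falls below the threshold."""
    return attribution_confidence(cited_pages, retrieved_pages) < threshold


retrieved = ["auth.md", "session.md", "middleware.md", "urls.md"]
cited = ["auth.md"]
confidence = attribution_confidence(cited, retrieved)
print(confidence, needs_repair(cited, retrieved))
```

Here one of four retrieved pages is cited, so the confidence is 0.25 and the answer would be queued for repair under the assumed threshold.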

◆ Graph Intelligence

tree-sitter parses every file into a two-tier dependency graph — file nodes and symbol nodes (functions, classes, methods). Heritage extraction covers extends, implements, mixins, and trait impls across 15 languages. PageRank + betweenness centrality identify your most central and most coupled code.
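
As a rough sketch of the centrality step, the following computes PageRank by power iteration over a small file-level import graph. It covers PageRank only (not betweenness), and the `pagerank` function, damping factor, iteration count, and sample graph are assumptions for illustration rather than Provenant's code.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    graph maps each file to the list of files it imports (outgoing edges).
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly among the files it imports.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A file with no imports spreads its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up import graph: each key imports the files in its list.
imports = {
    "views.py": ["models.py", "utils.py"],
    "api.py": ["models.py"],
    "models.py": ["utils.py"],
    "utils.py": [],
}
ranks = pagerank(imports)
most_central = max(ranks, key=ranks.get)
print(most_central)
```

In this toy graph `utils.py` ranks highest because every other file depends on it directly or indirectly.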

◆ Dead Code Analysis

Identifies unreachable functions, classes, and modules. Groups by confidence tier (definite / likely / possible). Flags safe-to-delete vs. dynamically-called code. Works across Python, TypeScript, Go, Rust, and more.
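
One common way to find candidates for unreachable code is a breadth-first walk of the call graph from known entry points. The sketch below shows that idea under simplifying assumptions; `find_unreachable` and the sample graph are hypothetical, and a purely static walk like this cannot see dynamic calls, which is presumably why results are grouped into confidence tiers.

```python
from collections import deque


def find_unreachable(call_graph, entry_points):
    """Return symbols that no entry point can reach through the call graph.

    call_graph maps each symbol to the symbols it calls.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    all_symbols = set(call_graph) | {
        callee for callees in call_graph.values() for callee in callees
    }
    return sorted(all_symbols - seen)


# Made-up call graph for illustration.
call_graph = {
    "main": ["load_config", "run"],
    "run": ["handle_request"],
    "legacy_export": ["format_csv"],
    "format_csv": [],
}
dead = find_unreachable(call_graph, ["main"])
print(dead)
```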

◆ Risk Scoring

Change frequency × dependency centrality × test coverage gaps → per-file risk score. Know what breaks before you touch it.
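
The product above can be sketched as follows, assuming each input is already normalized to the range 0 to 1. The `risk_score` function, the normalization, and the sample numbers are assumptions for illustration; the document does not specify how Provenant scales or weights the factors.

```python
def risk_score(change_frequency, centrality, test_coverage):
    """Multiply change frequency, centrality, and the test coverage gap.

    All inputs are assumed to be normalized to [0, 1].
    """
    coverage_gap = 1.0 - test_coverage
    return change_frequency * centrality * coverage_gap


# Made-up per-file metrics: (change frequency, centrality, test coverage).
files = {
    "payments.py": (0.9, 0.8, 0.2),
    "utils.py": (0.3, 0.9, 0.9),
    "cli.py": (0.5, 0.1, 0.0),
}
scores = {name: risk_score(*metrics) for name, metrics in files.items()}
riskiest = max(scores, key=scores.get)
print(riskiest)
```

A frequently changed, highly central file with little test coverage (`payments.py` here) scores far above a central but well-tested one.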

◆ Git Archaeology

provenant_why traces why code exists: git blame, commit history, and architectural decisions linked to the files your agent is editing.
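
For a sense of the raw material behind this, the sketch below collects a file's recent commit history with plain `git log` and parses it into records. It is an assumption about one possible approach, not how provenant_why works internally; `file_history`, `parse_log`, and the sample log lines are hypothetical.

```python
import subprocess

# Short hash, author date, and subject, separated by tabs.
LOG_FORMAT = "%h%x09%ad%x09%s"


def parse_log(output):
    """Turn tab-separated git log output into a list of commit records."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        short_hash, date, subject = line.split("\t", 2)
        commits.append({"hash": short_hash, "date": date, "subject": subject})
    return commits


def file_history(repo, path, limit=5):
    """Return the most recent commits touching path, following renames."""
    result = subprocess.run(
        [
            "git", "-C", repo, "log", "--follow",
            f"--max-count={limit}", "--date=short",
            f"--format={LOG_FORMAT}", "--", path,
        ],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


# Made-up log output, used here so the parser runs without a repository.
sample = (
    "a1b2c3d\t2024-03-01\tFix token refresh race\n"
    "9f8e7d6\t2023-11-12\tAdd session middleware\n"
)
history = parse_log(sample)
print(history[0]["subject"])
```

Calling `file_history("./myrepo", "auth/session.py")` inside a real repository would return the same record shape from live history.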


🗂️ Monorepo / Workspace Support

provenant init ./my-project
# Detected 3 repositories:
#   backend/     (Django)
#   frontend/    (React/TypeScript)
#   mobile/      (React Native)

Each sub-repo gets its own wiki. Cross-repo context is linked automatically — questions about the frontend surface relevant backend files and vice versa.


🖥️ Web Dashboard

provenant serve ./myrepo   # MCP server + local web UI

Visualize the knowledge graph, wiki pages, dead code report, risk scores, and repair candidates in a local browser dashboard. No external services required.


⚙️ Configuration

# LLM provider (pick one)
ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...
DEEPSEEK_API_KEY=...

# Embedder — optional, enables vector search + HyDE
OPENAI_EMBEDDING_API_KEY=...
OPENAI_EMBEDDING_MODEL=nomic-embed-text-v1.5
OPENAI_EMBEDDING_BASE_URL=https://api.fireworks.ai/inference/v1

# Embedding tiers
provenant init ./myrepo --embedder local     # free, ~40 MB, no API key
provenant init ./myrepo --embedder openai    # 768-dim, best retrieval

Self-hostable. Zero telemetry. Bring your own keys — works with Anthropic, OpenAI, DeepSeek, Gemini, OpenRouter, or local Ollama.


📄 License

MIT

