Code archaeologist - reconstruct function decision history via AST-aware lineage tracking

Archeologist - Semantic Lineage Graph Generator

Archeologist is a post-incident archaeology tool. It reconstructs the history of decisions behind a function using deterministic, AST-aware lineage tracking.

CLI Usage

# Install
pip install -e .

# Analyze file (auto-detects git repo)
arc analyze path/to/file.py

# Analyze specific function  
arc analyze-function path/to/file.py function_name

# With GitHub PR integration
arc analyze-function path/to/file.py function_name --repo owner/repo

# With LLM narrative synthesis
export CLAUDE_API_KEY=sk-ant-xxx
arc analyze-function path/to/file.py function_name

Configuration

Set environment variables:

export GITHUB_TOKEN=ghp_xxx
export CLAUDE_API_KEY=sk-ant-xxx
export GIT_REPO_PATH=/path/to/local/repo
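
For scripts that wrap the CLI, it can help to check which of these variables are set before invoking `arc`. The following is a minimal sketch, not the package's own configuration code: `ArcSettings` and `load_settings` are hypothetical names, and treating an empty value as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ArcSettings:
    """Hypothetical container for the three variables listed above."""

    github_token: Optional[str]
    claude_api_key: Optional[str]
    git_repo_path: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ArcSettings:
    """Read the documented variables from a mapping (os.environ by default)."""

    def read(name: str) -> Optional[str]:
        # Assumption: an empty string is treated the same as an unset variable.
        return env.get(name) or None

    return ArcSettings(
        github_token=read("GITHUB_TOKEN"),
        claude_api_key=read("CLAUDE_API_KEY"),
        git_repo_path=read("GIT_REPO_PATH"),
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.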

Architecture

Three-phase pipeline:

  1. Semantic Lineage Tracking

    • GitWalker traverses git history with rename detection turned off (--no-renames flag)
    • ASTParser extracts function boundaries (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
    • LineageTracker links nodes via a four-tier hierarchy
  2. Contextual Slicing

    • PRFetcher pulls the associated PRs
    • GeographicFilter maps review comments to AST node line ranges
  3. Narrative Synthesis

    • LiteLLM abstracts LLM calls (Claude, local models)
    • Outputs a 5-sentence brief explaining the decisions
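
To make the first two phases concrete, here is a rough sketch of the idea for Python sources only, using the standard library `ast` module. It is not the package's ASTParser or GeographicFilter: `ReviewComment`, `function_ranges`, and `comments_for` are hypothetical names, and the real tool may match comments differently.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReviewComment:
    """Hypothetical shape of a PR review comment: a file line and its text."""

    line: int
    body: str


def function_ranges(source: str) -> Dict[str, Tuple[int, int]]:
    """Map each function name to its (first_line, last_line), 1-indexed, inclusive.

    On Python 3.8+, `lineno` is the `def` line, so decorator lines are not
    included. Functions that share a name overwrite each other in this sketch.
    """
    ranges: Dict[str, Tuple[int, int]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges


def comments_for(
    name: str,
    ranges: Dict[str, Tuple[int, int]],
    comments: List[ReviewComment],
) -> List[ReviewComment]:
    """Keep only the comments whose line falls inside the named function."""
    start, end = ranges[name]
    return [c for c in comments if start <= c.line <= end]
```

Filtering by line range is what keeps the context narrow: comments left elsewhere in the same file never reach the synthesis step.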

MCP Server

The tool exposes an MCP-compatible JSON-RPC 2.0 server over stdio:

# Run MCP server
arc-mcp

# Or run directly
python -m src.mcp.server

Available Methods

// List functions in a file
{"jsonrpc": "2.0", "id": 1, "method": "list_functions", "params": {"file_path": "/path/to/file.py"}}

// Analyze a specific function
{"jsonrpc": "2.0", "id": 2, "method": "analyze_function", "params": {"file_path": "/path/to/file.py", "function_name": "foo"}}

// Analyze a file's overall lineage
{"jsonrpc": "2.0", "id": 3, "method": "analyze_file", "params": {"file_path": "/path/to/file.py"}}
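
A client can drive the server from Python by writing a request to its stdin and reading the reply from stdout. The sketch below assumes newline-delimited JSON framing (one request per line, one response per line), which this README does not specify. `build_request` and `call_arc_mcp` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Dict


def build_request(request_id: int, method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


def call_arc_mcp(method: str, params: Dict[str, Any], request_id: int = 1) -> Any:
    """Start `arc-mcp`, send one request, and return the `result` field.

    Assumption: the server reads and writes one JSON document per line.
    """
    proc = subprocess.Popen(
        ["arc-mcp"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        proc.stdin.write(build_request(request_id, method, params) + "\n")
        proc.stdin.flush()
        line = proc.stdout.readline()
    finally:
        proc.stdin.close()
        proc.terminate()
        proc.wait()

    if not line:
        raise RuntimeError("arc-mcp exited without sending a response")
    response = json.loads(line)
    if "error" in response:
        # JSON-RPC 2.0 reports failures in an `error` member instead of `result`.
        raise RuntimeError(response["error"])
    return response["result"]
```

For example, `call_arc_mcp("list_functions", {"file_path": "/path/to/file.py"})` would send the first request shown above. If the server uses a different framing, only the write and read lines need to change.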

Example

# Analyze the `authenticate` function in a FastAPI project
GIT_REPO_PATH=/Users/fuads/fastapi arc analyze-function app/auth.py authenticate --repo fastapi/fastapi

Output:

Analyzing function authenticate in app/auth.py...
Found 12 lineage edges for authenticate
Summary: Found 12 historical versions of this code. Change types: physical: 8, identity: 4

With LLM synthesis:

The authenticate function evolved through 12 commits over 18 months. 
Initial implementation used simple token validation, replaced in PR #2341 
with OAuth2 Bearer token parsing after security audit. Several performance 
optimizations were attempted (PRs #1892, #2103) but reverted due to race 
conditions. The current implementation handles both JWT and opaque tokens 
with a unified interface, consolidating three previous approaches.

Testing

pytest tests/

Roadmap

  • Graph construction (GitWalker, ASTParser, LineageTracker)
  • Contextual slicing (PRFetcher, GeographicFilter)
  • CLI commands with local git repo auto-detection
  • Support for 10 languages (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
  • Real-world testing on the Flask repo
  • MCP server (JSON-RPC 2.0 over stdio, works with Python 3.9)

License

MIT
