Code archaeologist - reconstruct function decision history via AST-aware lineage tracking

Archeologist - Semantic Lineage Graph Generator

Archeologist is a post-incident archaeology tool: it reconstructs the decision history behind a function using deterministic, AST-aware lineage tracking.

CLI Usage

# Install
pip install -e .

# Analyze file (auto-detects git repo)
arc analyze path/to/file.py

# Analyze a specific function
arc analyze-function path/to/file.py function_name

# With GitHub PR integration
arc analyze-function path/to/file.py function_name --repo owner/repo

# With LLM narrative synthesis
export CLAUDE_API_KEY=sk-ant-xxx
arc analyze-function path/to/file.py function_name

Configuration

Set environment variables:

export GITHUB_TOKEN=ghp_xxx
export CLAUDE_API_KEY=sk-ant-xxx
export GIT_REPO_PATH=/path/to/local/repo
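
As an illustration of how a script wrapping the tool might read these variables, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names chosen for this example, not part of Archeologist, and treating an empty value as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    # Hypothetical container; field names mirror the variables above.
    github_token: Optional[str]
    claude_api_key: Optional[str]
    git_repo_path: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables; unset or empty values become None."""
    return Settings(
        github_token=env.get("GITHUB_TOKEN") or None,
        claude_api_key=env.get("CLAUDE_API_KEY") or None,
        git_repo_path=env.get("GIT_REPO_PATH") or None,
    )


settings = load_settings()
# Report only the names of unset variables, never their values.
missing = [name for name, value in vars(settings).items() if value is None]
print("unset:", missing)
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.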

Architecture

Three-phase pipeline:

  1. Semantic Lineage Tracking

    • GitWalker traverses commit history with Git rename detection disabled (--no-renames flag)
    • ASTParser extracts function boundaries (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
    • LineageTracker links nodes via four-tier hierarchy
  2. Contextual Slicing

    • PRFetcher pulls associated PRs
    • GeographicFilter maps review comments to the line ranges of AST nodes
  3. Narrative Synthesis

    • LiteLLM abstracts LLM calls (Claude, local models)
    • Outputs a 5-sentence brief explaining the decisions
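
To make the first two phases more concrete, here is a minimal, Python-only sketch. It assumes the standard library `ast` module as a stand-in for ASTParser and a plain range check as a stand-in for the geographic filter. The names `function_ranges` and `comments_for_function`, the sample source, and the sample comments are all hypothetical; the real components may work differently.

```python
import ast
from typing import Dict, List, Tuple


def function_ranges(source: str) -> Dict[str, Tuple[int, int]]:
    """Map each function name to its (first_line, last_line), 1-indexed."""
    ranges = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges


def comments_for_function(
    ranges: Dict[str, Tuple[int, int]],
    comments: List[Tuple[int, str]],
    function_name: str,
) -> List[str]:
    """Keep only review comments whose line falls inside the function."""
    start, end = ranges[function_name]
    return [text for line, text in comments if start <= line <= end]


SOURCE = '''def authenticate(token):
    if not token:
        return None
    return token.strip()


def logout(session):
    session.clear()
'''

# Hypothetical review comments as (line_number, text) pairs.
COMMENTS = [
    (2, "Should an empty token raise instead?"),
    (8, "Clear the cookie too."),
]

ranges = function_ranges(SOURCE)
print(ranges)
print(comments_for_function(ranges, COMMENTS, "authenticate"))
```

Keying comments by line range rather than by file is what lets a review thread be attached to one function even when the file contains many.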

MCP Server

The tool exposes an MCP-compatible JSON-RPC 2.0 server over stdio:

# Run MCP server
arc-mcp

# Or run directly
python -m src.mcp.server

Available Methods

// List functions in a file
{"jsonrpc": "2.0", "id": 1, "method": "list_functions", "params": {"file_path": "/path/to/file.py"}}

// Analyze a specific function
{"jsonrpc": "2.0", "id": 2, "method": "analyze_function", "params": {"file_path": "/path/to/file.py", "function_name": "foo"}}

// Analyze a file's overall lineage
{"jsonrpc": "2.0", "id": 3, "method": "analyze_file", "params": {"file_path": "/path/to/file.py"}}
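
The request bodies above can be built and the replies decoded with a few lines of Python. This is a sketch under stated assumptions: `build_request` and `parse_response` are hypothetical helper names, and because the framing the server uses on stdio (newline-delimited versus length-prefixed) is not stated here, the sketch stops at constructing and parsing message bodies.

```python
import json
from typing import Any, Dict


def build_request(request_id: int, method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request object as a single line of JSON."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }
    return json.dumps(payload)


def parse_response(line: str) -> Any:
    """Return the `result` of a JSON-RPC 2.0 response, or raise on `error`."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(f"JSON-RPC error: {message['error']}")
    return message["result"]


request = build_request(1, "list_functions", {"file_path": "/path/to/file.py"})
print(request)
```

Note that the `//` lines in the listing are annotations only; JSON itself has no comment syntax, so they must not be sent to the server.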

Example

# Analyze the `authenticate` function in a FastAPI project
GIT_REPO_PATH=/Users/fuads/fastapi arc analyze-function app/auth.py authenticate --repo fastapi/fastapi

Output:

Analyzing function authenticate in app/auth.py...
Found 12 lineage edges for authenticate
Summary: Found 12 historical versions of this code. Change types: physical: 8, identity: 4

With LLM synthesis:

The authenticate function evolved through 12 commits over 18 months. 
Initial implementation used simple token validation, replaced in PR #2341 
with OAuth2 Bearer token parsing after security audit. Several performance 
optimizations were attempted (PRs #1892, #2103) but reverted due to race 
conditions. The current implementation handles both JWT and opaque tokens 
with a unified interface, consolidating three previous approaches.

Testing

pytest tests/

Roadmap

  • Graph construction (GitWalker, ASTParser, LineageTracker)
  • Contextual slicing (PRFetcher, GeographicFilter)
  • CLI commands with local git repo auto-detection
  • Support for 10 languages (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
  • Real-world testing on Flask repo
  • MCP server (JSON-RPC 2.0 over stdio, works with Python 3.9)

License

MIT
