Code archaeologist - reconstruct function decision history via AST-aware lineage tracking

Archeologist - Semantic Lineage Graph Generator

A post-incident archaeology tool that reconstructs the decision history of a function through deterministic, AST-aware lineage tracking.

CLI Usage

# Install
pip install -e .

# Analyze file (auto-detects git repo)
arc analyze path/to/file.py

# Analyze specific function  
arc analyze-function path/to/file.py function_name

# With GitHub PR integration
arc analyze-function path/to/file.py function_name --repo owner/repo

# With LLM narrative synthesis (requires CLAUDE_API_KEY)
export CLAUDE_API_KEY=sk-ant-xxx
arc analyze-function path/to/file.py function_name

Configuration

Set environment variables:

export GITHUB_TOKEN=ghp_xxx
export CLAUDE_API_KEY=sk-ant-xxx
export GIT_REPO_PATH=/path/to/local/repo
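
As a rough illustration of how these three variables might be consumed, here is a minimal sketch. `ArcSettings` and `load_settings` are hypothetical names, not part of the package, and treating every variable as optional is an assumption rather than documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ArcSettings:
    """Hypothetical container for the three variables listed above."""

    github_token: Optional[str]
    claude_api_key: Optional[str]
    git_repo_path: Optional[str]

    @property
    def pr_integration_enabled(self) -> bool:
        # Assumption: PR lookups are only attempted when a token is present.
        return self.github_token is not None

    @property
    def llm_synthesis_enabled(self) -> bool:
        # Assumption: narrative synthesis is only attempted when a key is present.
        return self.claude_api_key is not None


def load_settings(env: Mapping[str, str] = os.environ) -> ArcSettings:
    """Read the settings from a mapping; empty strings count as unset."""
    return ArcSettings(
        github_token=env.get("GITHUB_TOKEN") or None,
        claude_api_key=env.get("CLAUDE_API_KEY") or None,
        git_repo_path=env.get("GIT_REPO_PATH") or None,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.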

Architecture

Three-phase pipeline:

  1. Semantic Lineage Tracking

    • GitWalker traverses git history with rename detection disabled (--no-renames flag)
    • ASTParser extracts function boundaries (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
    • LineageTracker links nodes via a four-tier hierarchy

  2. Contextual Slicing

    • PRFetcher pulls the PRs associated with each commit
    • GeographicFilter maps review comments to AST node line ranges

  3. Narrative Synthesis

    • LiteLLM abstracts LLM calls (Claude, local models)
    • Outputs a 5-sentence brief explaining the decisions
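
To make the first two phases more concrete, the sketch below extracts function boundaries from Python source and then keeps only the review comments that fall inside one function's line range. It uses the standard-library `ast` module for Python only and is not the project's actual ASTParser or GeographicFilter; the helper names and the `{"line": ..., "body": ...}` comment shape are assumptions made for illustration.

```python
import ast
from typing import Dict, List, Tuple


def function_ranges(source: str) -> Dict[str, Tuple[int, int]]:
    """Map each function name to its (first_line, last_line), 1-indexed inclusive."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges


def comments_for_function(
    comments: List[dict], line_range: Tuple[int, int]
) -> List[dict]:
    """Keep only the comments whose line lies inside the given range."""
    first, last = line_range
    return [c for c in comments if first <= c["line"] <= last]


SOURCE = '''\
def authenticate(token):
    if not token:
        return None
    return token.strip()


def logout(session):
    session.clear()
'''

# Hypothetical review comments, keyed by the line they were left on.
COMMENTS = [
    {"line": 2, "body": "Should an empty token raise instead?"},
    {"line": 8, "body": "Does clear() also drop the CSRF token?"},
]

ranges = function_ranges(SOURCE)
relevant = comments_for_function(COMMENTS, ranges["authenticate"])
```

Here `relevant` contains only the comment on line 2, because line 8 belongs to `logout`. A real implementation would also need to map comment positions from the PR diff onto the file version being analysed, which this sketch ignores.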

MCP Server

The tool exposes an MCP-compatible JSON-RPC 2.0 server over stdio:

# Run MCP server
arc-mcp

# Or run directly
python -m src.mcp.server

Available Methods

// List functions in a file
{"jsonrpc": "2.0", "id": 1, "method": "list_functions", "params": {"file_path": "/path/to/file.py"}}

// Analyze a specific function
{"jsonrpc": "2.0", "id": 2, "method": "analyze_function", "params": {"file_path": "/path/to/file.py", "function_name": "foo"}}

// Analyze a file's overall lineage
{"jsonrpc": "2.0", "id": 3, "method": "analyze_file", "params": {"file_path": "/path/to/file.py"}}
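
A client for these methods could be sketched as follows. The request shape is taken from the examples above; the framing (one JSON object per line on stdin and stdout) is an assumption, as is the idea that the server answers without a prior MCP initialization handshake. `build_request` and `call_arc_mcp` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Dict


def build_request(request_id: int, method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


def call_arc_mcp(method: str, params: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Send one request to `arc-mcp` over stdio and return the decoded reply.

    Assumes newline-delimited JSON framing; adjust if the server frames
    messages differently or expects an initialization exchange first.
    """
    with subprocess.Popen(
        ["arc-mcp"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdin.write(build_request(request_id, method, params) + "\n")
        proc.stdin.flush()
        line = proc.stdout.readline()
        proc.terminate()
    return json.loads(line)
```

For example, `call_arc_mcp("list_functions", {"file_path": "/path/to/file.py"})` would send the first request shown above, provided `arc-mcp` is on the PATH.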

Example

# Analyze the `authenticate` function in a FastAPI project
GIT_REPO_PATH=/Users/fuads/fastapi arc analyze-function app/auth.py authenticate --repo fastapi/fastapi

Output:

Analyzing function authenticate in app/auth.py...
Found 12 lineage edges for authenticate
Summary: Found 12 historical versions of this code. Change types: physical: 8, identity: 4

With LLM synthesis:

The authenticate function evolved through 12 commits over 18 months. 
Initial implementation used simple token validation, replaced in PR #2341 
with OAuth2 Bearer token parsing after security audit. Several performance 
optimizations were attempted (PRs #1892, #2103) but reverted due to race 
conditions. The current implementation handles both JWT and opaque tokens 
with a unified interface, consolidating three previous approaches.

Testing

pytest tests/

Roadmap

  • Graph construction (GitWalker, ASTParser, LineageTracker)
  • Contextual slicing (PRFetcher, GeographicFilter)
  • CLI commands with local git repo auto-detection
  • Support for 10 languages (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
  • Real-world testing on Flask repo
  • MCP server (JSON-RPC 2.0 over stdio, works with Python 3.9)

License

MIT
