Code archaeologist - reconstruct function decision history via AST-aware lineage tracking

Archeologist - Semantic Lineage Graph Generator

Post-incident archaeology tool that reconstructs function decision history via deterministic AST-aware lineage tracking.

CLI Usage

# Install
pip install -e .

# Analyze file (auto-detects git repo)
arc analyze path/to/file.py

# Analyze a specific function
arc analyze-function path/to/file.py function_name

# With GitHub PR integration
arc analyze-function path/to/file.py function_name --repo owner/repo

# With LLM narrative synthesis
export CLAUDE_API_KEY=sk-ant-xxx
arc analyze-function path/to/file.py function_name

Configuration

Set environment variables:

export GITHUB_TOKEN=ghp_xxx
export CLAUDE_API_KEY=sk-ant-xxx
export GIT_REPO_PATH=/path/to/local/repo
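Before running `arc`, it can help to check which of these variables are actually set. The snippet below is a minimal sketch, not part of the tool: `ArcEnv`, `load_env`, and `unset_variables` are hypothetical names, and which variables are required for which command is not stated here, so the check only reports what is missing.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ArcEnv:
    """Hypothetical container for the three variables listed above."""
    github_token: Optional[str]
    claude_api_key: Optional[str]
    git_repo_path: Optional[str]


def load_env(environ: Mapping[str, str] = os.environ) -> ArcEnv:
    """Read the variables, treating empty or blank values as unset."""
    def get(name: str) -> Optional[str]:
        value = environ.get(name, "").strip()
        return value or None

    return ArcEnv(
        github_token=get("GITHUB_TOKEN"),
        claude_api_key=get("CLAUDE_API_KEY"),
        git_repo_path=get("GIT_REPO_PATH"),
    )


def unset_variables(env: ArcEnv) -> List[str]:
    """Return the names of the variables that are not set."""
    pairs = [
        ("GITHUB_TOKEN", env.github_token),
        ("CLAUDE_API_KEY", env.claude_api_key),
        ("GIT_REPO_PATH", env.git_repo_path),
    ]
    return [name for name, value in pairs if value is None]


if __name__ == "__main__":
    print("Unset:", unset_variables(load_env()) or "none")
```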

Architecture

Three-phase pipeline:

  1. Semantic Lineage Tracking

    • GitWalker traverses git history (with the --no-renames flag)
    • ASTParser extracts function boundaries (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
    • LineageTracker links nodes via four-tier hierarchy
  2. Contextual Slicing

    • PRFetcher pulls associated PRs
    • GeographicFilter maps review comments to AST node line ranges
  3. Narrative Synthesis

    • LiteLLM abstracts LLM calls (Claude, local models)
    • Outputs a 5-sentence brief explaining the decisions
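To illustrate how function boundaries and the line-range filter fit together, here is a minimal sketch for Python source only. It uses the standard library `ast` module, which is an assumption made for illustration and not necessarily the parser ASTParser uses. `function_ranges`, `comments_in_range`, and the comment dictionaries are hypothetical and do not reflect the tool's internal API.

```python
import ast
from typing import Dict, List, Tuple

SOURCE = '''\
def authenticate(token):
    if not token:
        return None
    return token.upper()


def logout(session):
    session.clear()
'''


def function_ranges(source: str) -> Dict[str, Tuple[int, int]]:
    """Map each function name to its (first_line, last_line), 1-indexed."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes from Python 3.8 onward.
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges


def comments_in_range(comments: List[dict], line_range: Tuple[int, int]) -> List[dict]:
    """Keep only the comments whose line falls inside the function's range."""
    first, last = line_range
    return [c for c in comments if first <= c["line"] <= last]


if __name__ == "__main__":
    ranges = function_ranges(SOURCE)
    # Hypothetical review comments, each anchored to a line of the file.
    comments = [
        {"line": 3, "body": "Should this raise instead of returning None?"},
        {"line": 8, "body": "Also invalidate the refresh token here."},
    ]
    print(ranges)
    print(comments_in_range(comments, ranges["authenticate"]))
```

Only the comment on line 3 survives the filter for `authenticate`, because line 8 belongs to `logout`.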

MCP Server

The tool exposes an MCP-compatible JSON-RPC 2.0 server over stdio:

# Run MCP server
arc-mcp

# Or run directly
python -m src.mcp.server

Available Methods

// List functions in a file
{"jsonrpc": "2.0", "id": 1, "method": "list_functions", "params": {"file_path": "/path/to/file.py"}}

// Analyze a specific function
{"jsonrpc": "2.0", "id": 2, "method": "analyze_function", "params": {"file_path": "/path/to/file.py", "function_name": "foo"}}

// Analyze a file's overall lineage
{"jsonrpc": "2.0", "id": 3, "method": "analyze_file", "params": {"file_path": "/path/to/file.py"}}
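A client can build these messages programmatically. The sketch below constructs JSON-RPC 2.0 requests and unpacks responses. `make_request` and `parse_response` are hypothetical helper names, and the shape of each `result` is an assumption, since the response format is not documented here. The `//` lines above are annotations only and must not be sent, because JSON does not allow comments.

```python
import json
from typing import Any


def make_request(request_id: int, method: str, **params: Any) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }
    return json.dumps(payload)


def parse_response(line: str) -> Any:
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    message = json.loads(line)
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["result"]


if __name__ == "__main__":
    print(make_request(1, "list_functions", file_path="/path/to/file.py"))
    print(make_request(2, "analyze_function",
                       file_path="/path/to/file.py", function_name="foo"))
```

How messages are framed on stdio (for example, one JSON object per line) is not specified above, so check the server before piping these strings to `arc-mcp`.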

Example

# Analyze the `authenticate` function in a FastAPI project
GIT_REPO_PATH=/Users/fuads/fastapi arc analyze-function app/auth.py authenticate --repo fastapi/fastapi

Output:

Analyzing function authenticate in app/auth.py...
Found 12 lineage edges for authenticate
Summary: Found 12 historical versions of this code. Change types: physical: 8, identity: 4
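If you script around the CLI, the per-type counts can be pulled out of the summary line. This is a minimal sketch that assumes the line always follows the format shown in the example above, which is not guaranteed. `change_type_counts` is a hypothetical helper, not something the tool provides.

```python
import re
from typing import Dict

SUMMARY = (
    "Summary: Found 12 historical versions of this code. "
    "Change types: physical: 8, identity: 4"
)


def change_type_counts(summary: str) -> Dict[str, int]:
    """Extract {change_type: count} from the text after 'Change types:'."""
    _, marker, tail = summary.partition("Change types:")
    if not marker:
        # The line does not contain a change-type breakdown.
        return {}
    return {name: int(n) for name, n in re.findall(r"(\w+):\s*(\d+)", tail)}


if __name__ == "__main__":
    print(change_type_counts(SUMMARY))
```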

With LLM synthesis:

The authenticate function evolved through 12 commits over 18 months. 
Initial implementation used simple token validation, replaced in PR #2341 
with OAuth2 Bearer token parsing after security audit. Several performance 
optimizations were attempted (PRs #1892, #2103) but reverted due to race 
conditions. The current implementation handles both JWT and opaque tokens 
with a unified interface, consolidating three previous approaches.

Testing

pytest tests/

Roadmap

  • Graph construction (GitWalker, ASTParser, LineageTracker)
  • Contextual slicing (PRFetcher, GeographicFilter)
  • CLI commands with local git repo auto-detection
  • Support for 10 languages (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP)
  • Real-world testing on the Flask repo
  • MCP server (JSON-RPC 2.0 over stdio, works with Python 3.9)

License

MIT
