Security-first code intelligence for AI agents — taint analysis, CVE detection, MCP integration

ArchGraph

License: MIT · Python 3.11+ · MCP Server · Tests

Security-first code intelligence for AI agents.
Parses 10 languages, builds a knowledge graph with taint analysis, CVE detection, and clustering.
Connect to any AI agent via MCP — Cursor, Claude Code, Windsurf, and more.


Why ArchGraph?

Other tools help you understand code. ArchGraph helps you secure it.

ArchGraph Code Search AST Parsers SAST Tools
Taint Analysis ✅ Input → Sink
CVE Detection ✅ Auto via OSV Partial
CFG / Data Flow ✅ libclang + tree-sitter Partial
MCP for AI Agents ✅ 7 tools
Functional Clustering ✅ Community detection
Execution Tracing ✅ Entry → Sink flows
Export (JSON/GraphML) Partial
Local-first ✅ Neo4j Varies Varies
License MIT Varies Varies Often proprietary

Quick Start

# Install
pip install archgraph

# Extract (auto-detects languages)
archgraph extract /path/to/repo -w 4

# Query the graph
archgraph query "MATCH (f:Function {is_input_source: true}) RETURN f.name, f.file"

# Start web dashboard
archgraph serve --port 8080

# Generate HTML security report
archgraph report /path/to/repo

With Docker (Neo4j included):

docker compose up -d neo4j           # password: archgraph
archgraph extract /path/to/repo --neo4j-password archgraph

🤖 AI Agent Integration (MCP)

ArchGraph exposes 7 tools and 4 resources to any MCP-compatible agent.

Setup

# Index your repo
archgraph extract . --include-cve --include-clustering

# Start MCP server
archgraph mcp

Connect your agent:

Agent        Command
Claude Code  claude mcp add archgraph -- archgraph mcp
Cursor       Add to ~/.cursor/mcp.json
Windsurf     Add to MCP config
OpenCode     Add to ~/.config/opencode/config.json
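
For the file-based rows, the entry to add might be produced by something like the sketch below. It assumes the config file uses a top-level "mcpServers" object (the layout Cursor documents) and reuses the `archgraph mcp` command from the Claude Code row. The helper name `add_archgraph_server` is hypothetical and not part of ArchGraph; other agents may expect a different layout, so check their documentation first.

```python
import json
from pathlib import Path


def add_archgraph_server(config_path: Path) -> dict:
    """Merge an "archgraph" entry into an MCP config file and return the result.

    Assumption: the file holds a JSON object with a top-level "mcpServers" key.
    Servers that are already configured are left untouched.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    servers = config.setdefault("mcpServers", {})
    # Same command as the Claude Code row: `archgraph mcp`.
    servers["archgraph"] = {"command": "archgraph", "args": ["mcp"]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_archgraph_server(Path.home() / ".cursor" / "mcp.json")` would update the Cursor config in place; back the file up first if it already contains other servers.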

What Your Agent Gets

Tools: query, impact, context, detect_changes, find_vulnerabilities, cypher, stats

Resources: archgraph://schema, archgraph://security, archgraph://clusters, archgraph://processes

Example Conversation

You: "Are there any buffer overflow risks in the network code?"

Agent:
1. Queries input sources in network files
2. Traces taint paths to dangerous sinks
3. Reports: "Found 2 paths:
   - net_recv() → memcpy() in src/net/handler.c (depth: 3)
   - read_packet() → strcpy() in src/net/parser.c (depth: 4)
   Both reach dangerous sinks without validation."

🔒 Security Features

Automatic labeling — Every function gets security labels:

  • is_input_source — reads external data (recv, read, fetch, ...)
  • is_dangerous_sink — dangerous operations (memcpy, exec, eval, ...)
  • is_allocator, is_crypto, is_parser — additional categories
  • risk_score — 0-100 risk score based on labels
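
As a rough illustration of how such labels could be derived, the sketch below matches function names against small lookup sets and sums per-label weights into a capped score. The source and sink names are the examples listed above; the weights and the name-matching approach are assumptions made for this example and are not ArchGraph's actual rules.

```python
# Example names taken from the list above; the real lists are longer.
INPUT_SOURCES = {"recv", "read", "fetch"}
DANGEROUS_SINKS = {"memcpy", "exec", "eval"}

# Hypothetical weights, chosen only to illustrate a 0-100 score.
WEIGHTS = {"is_input_source": 40, "is_dangerous_sink": 50}


def label_function(name: str) -> dict:
    """Return security labels and a capped risk score for one function name."""
    labels = {
        "is_input_source": name in INPUT_SOURCES,
        "is_dangerous_sink": name in DANGEROUS_SINKS,
    }
    # Sum the weights of every label that applies, then cap at 100.
    score = sum(weight for key, weight in WEIGHTS.items() if labels[key])
    labels["risk_score"] = min(score, 100)
    return labels
```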

Taint path detection:

MATCH path = (src:Function {is_input_source: true})-[:CALLS*1..8]->(sink:Function {is_dangerous_sink: true})
RETURN src.name, sink.name, length(path) AS depth
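
To make the query's semantics concrete, here is a minimal Python sketch of the same idea: a breadth-first search over an in-memory call graph that reports every path of 1 to 8 CALLS edges from an input source to a dangerous sink. The `calls` dictionary and the function name `taint_paths` are hypothetical stand-ins for the Neo4j graph. Unlike Cypher, this sketch never revisits a function within a path.

```python
from collections import deque


def taint_paths(calls, sources, sinks, max_depth=8):
    """Return call paths of 1..max_depth edges that start at a source and end at a sink.

    calls maps a function name to the list of functions it calls.
    """
    found = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        depth = len(path) - 1  # number of CALLS edges in this path
        if depth >= 1 and path[-1] in sinks:
            found.append(path)
        if depth == max_depth:
            continue  # do not extend past the depth bound
        for callee in calls.get(path[-1], ()):
            if callee not in path:  # skip cycles
                queue.append(path + [callee])
    return found
```

With `calls = {"net_recv": ["parse"], "parse": ["copy_buf"], "copy_buf": ["memcpy"]}` (intermediate names invented for the example), `taint_paths(calls, {"net_recv"}, {"memcpy"})` yields a single path of depth 3.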

CVE enrichment:

archgraph extract . --include-cve    # Queries OSV API automatically
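
For readers curious what an OSV lookup involves, the sketch below builds a request for the public OSV query endpoint (`POST https://api.osv.dev/v1/query`) and extracts vulnerability IDs from the response. How ArchGraph itself calls OSV is not documented here, so treat the helper names and structure as assumptions.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, ecosystem: str, version: str) -> dict:
    """Build the JSON body for a single package/version lookup."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def vulnerability_ids(response_body: dict) -> list:
    """Extract vulnerability IDs; OSV omits "vulns" when nothing matches."""
    return [vuln["id"] for vuln in response_body.get("vulns", [])]


def query_osv(name: str, ecosystem: str, version: str, timeout: float = 10.0) -> list:
    """Send the query over HTTPS and return the matching vulnerability IDs."""
    payload = json.dumps(build_osv_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return vulnerability_ids(json.load(response))
```

For example, `query_osv("jinja2", "PyPI", "2.4.1")` asks OSV about one PyPI release; it needs network access.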

All Commands

Command  Description
extract  Extract code graph from repository
query    Run Cypher queries against the graph
stats    Show node/edge statistics
schema   Show graph schema
diff     Compare repo state vs stored graph
impact   Blast radius analysis for a function
export   Export to JSON, GraphML, or CSV
report   Generate HTML security report
serve    Start web dashboard
mcp      Start MCP server for AI agents
skills   Generate AI agent skill files
repos    List indexed repositories

Use Cases

Security Audit

archgraph extract /target -l c,cpp --include-cve --include-clang
archgraph query "MATCH path = (src:Function {is_input_source: true})-[:CALLS*1..5]->(sink:Function {is_dangerous_sink: true}) RETURN src.name, sink.name"

Code Review

archgraph diff /path/to/repo
archgraph impact "func:src/api.c:handle:42" --direction both
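
Conceptually, a blast radius in both directions is the set of transitive callers plus the set of transitive callees of the target. The sketch below shows that idea on a plain dictionary call graph. Only `both` appears in the command above; the `upstream` and `downstream` names, and the functions themselves, are assumptions for illustration.

```python
from collections import deque


def reachable(start, edges):
    """Return every node reachable from start by following edges, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(calls, target, direction="both"):
    """Compute transitive callers (upstream) and callees (downstream) of target."""
    # Invert the call graph so callers can be walked the same way as callees.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    result = {}
    if direction in ("upstream", "both"):
        result["upstream"] = reachable(target, callers)
    if direction in ("downstream", "both"):
        result["downstream"] = reachable(target, calls)
    return result
```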

Reverse Engineering

archgraph extract /binary/project -l c,cpp,rust --include-clang --include-deep
archgraph query "MATCH (f:Function) WHERE f.is_exported = true RETURN f.name, f.file"

Architecture

                  ┌──────────────────────────────────────────────────┐
                  │         GraphBuilder Pipeline (11 steps)         │
                  │                                                  │
  Local Path ─────┤  1. Tree-sitter structural extraction            │
     or           │  2. Git history                                  │
  GitHub URL ─────┤  3. Dependency extraction                        │──── Neo4j
  (auto clone)    │  4. Annotation scanning                          │     Store
                  │  5. Security labeling                            │       │
                  │  6. Clang deep analysis (C/C++)                  │       ├── MCP Server
                  │  7. Tree-sitter deep analysis (Rust/Java/Go/…)   │       ├── Web Dashboard
                  │  8. Churn enrichment                             │       └── Export/Report
                  │  9. CVE enrichment (OSV API)                     │
                  │ 10. Clustering (community detection)             │
                  │ 11. Process tracing (execution flows)            │
                  └──────────────────────────────────────────────────┘
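
One way to picture the pipeline is as an ordered list of steps that each enrich a shared graph, with optional steps (such as CVE enrichment or clustering) switched on by flags. The sketch below is a generic illustration of that pattern; the `Graph`, `Step`, and `run_pipeline` names are hypothetical and do not describe the GraphBuilder's real classes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Graph:
    """Minimal stand-in for the graph that the steps build up."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    log: list = field(default_factory=list)


@dataclass
class Step:
    name: str
    run: Callable[[Graph], None]
    enabled: bool = True  # optional steps can be switched off


def run_pipeline(graph: Graph, steps: list) -> Graph:
    """Run the enabled steps in order, recording which ones executed."""
    for step in steps:
        if step.enabled:
            step.run(graph)
            graph.log.append(step.name)
    return graph


def extract_structure(graph: Graph) -> None:
    # Placeholder for structural extraction: adds one function node.
    graph.nodes["net_recv"] = {"kind": "Function"}


def label_security(graph: Graph) -> None:
    # Placeholder for security labeling: marks the node as an input source.
    for props in graph.nodes.values():
        props["is_input_source"] = True
```

Later steps read what earlier steps wrote, which is why the order in the diagram matters: labeling, for instance, needs the function nodes produced by structural extraction.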

Benchmarks

Project                        Language    Files  Nodes   Edges    Time
zlib (~50K LOC)                C           79     2,389   3,968    6.6s
fastify (~30K LOC)             JavaScript  487    2,810   18,472   10.5s
Linux drivers/usb (~500K LOC)  C           892    62,812  122,746  12.7s

Benchmarks: Windows 11, Python 3.13, single-threaded. Parallel mode (-w 4) is 2-3x faster.
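
The figures above can be turned into throughput and a rough parallel-mode estimate with a few lines of arithmetic. The sketch below only restates the table and the quoted 2-3x speedup; the estimated range is an extrapolation, not a measurement.

```python
# (nodes, single-threaded seconds) copied from the benchmark table.
BENCHMARKS = {
    "zlib": (2389, 6.6),
    "fastify": (2810, 10.5),
    "Linux drivers/usb": (62812, 12.7),
}


def nodes_per_second(nodes: int, seconds: float) -> float:
    """Graph nodes produced per second of extraction time."""
    return nodes / seconds


def parallel_time_range(single_threaded_seconds: float) -> tuple[float, float]:
    """Estimated (best, worst) time with -w 4, assuming a 3x and a 2x speedup."""
    return single_threaded_seconds / 3, single_threaded_seconds / 2


if __name__ == "__main__":
    for project, (nodes, seconds) in BENCHMARKS.items():
        best, worst = parallel_time_range(seconds)
        rate = nodes_per_second(nodes, seconds)
        print(f"{project}: {rate:.0f} nodes/s, est. {best:.1f}-{worst:.1f}s with -w 4")
```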


Documentation

Document               Description
Architecture & Schema  Graph schema, node/edge types, pipeline
CLI Reference          All commands and options
AI Agent Integration   MCP setup, tools, examples
Security Analysis      Security labeling, Cypher queries
Deep Analysis          CFG, data flow, taint tracking
Roadmap                Development phases

Testing

pytest tests/ -v  # 137 passed, 22 skipped

No external services required. Tests use temporary directories with real tree-sitter parsing and git operations.


License

MIT
