Skip to main content

Architecture extraction & codebase intelligence for the agentic era

Project description

archex

Architecture extraction and codebase intelligence for the agentic era.

archex is a Python library and CLI that transforms any Git repository into structured architectural intelligence and token-budget-aware code context. It serves two consumers from a single index: human architects receive an ArchProfile with module boundaries, dependency graphs, detected patterns, and interface surfaces; AI agents receive a ContextBundle with relevance-ranked, syntax-aligned code chunks assembled to fit within a specified token budget.

Features

  • 4 language adapters — Python, TypeScript/JavaScript, Go, Rust (tree-sitter AST parsing)
  • 3 public APIsanalyze(), query(), compare()
  • Hybrid retrieval — BM25 keyword search + optional vector embeddings with reciprocal rank fusion
  • Token budget assembly — AST-aware chunking, dependency-graph expansion, greedy bin-packing
  • Structural analysis — module detection (Louvain), pattern recognition, interface extraction
  • Cross-repo comparison — 6 architectural dimensions, no LLM required
  • LLM-optional — entire structural pipeline runs without API calls; LLM enrichment is opt-in

Installation

pip install archex

Extras

Extra What it adds
archex[vector] ONNX-based local embeddings (Nomic Code)
archex[vector-torch] Torch-backed sentence-transformers
archex[voyage] Voyage Code API embeddings
archex[openai] OpenAI API embeddings + LLM enrichment
archex[anthropic] Anthropic API LLM enrichment
archex[all] All optional dependencies

Quick Start

Python API

from archex import analyze, query, compare
from archex.models import RepoSource

# Architectural analysis
profile = analyze(RepoSource(local_path="./my-project"))
for module in profile.module_map:
    print(f"{module.name}: {len(module.files)} files")
for pattern in profile.pattern_catalog:
    print(f"[{pattern.confidence:.0%}] {pattern.name}")

# Implementation context for an agent
bundle = query(
    RepoSource(local_path="./my-project"),
    "How does authentication work?",
    token_budget=8192,
)
print(bundle.to_prompt(format="xml"))

# Cross-repo comparison
result = compare(
    RepoSource(local_path="./project-a"),
    RepoSource(local_path="./project-b"),
    dimensions=["error_handling", "api_surface"],
)

CLI

# Analyze a local repo or remote URL
archex analyze ./my-project --format json
archex analyze https://github.com/org/repo --format markdown -l python --timing

# Query for implementation context
archex query ./my-project "How does auth work?" --budget 8192 --format xml
archex query ./my-project "connection pooling" --strategy hybrid --timing

# Compare two repositories
archex compare ./project-a ./project-b --dimensions error_handling,api_surface --format markdown

# Manage the analysis cache
archex cache list
archex cache clean --max-age 168
archex cache info

CLI Reference

archex analyze <source>

Analyze a repository and produce an architecture profile.

Option Default Description
--format json Output format: json, markdown
-l / --language all Filter by language (repeatable)
--timing off Print timing breakdown to stderr

archex query <source> <question>

Query a repository and return a context bundle.

Option Default Description
--budget 8192 Token budget for context assembly
--format xml Output format: xml, json, markdown
-l / --language all Filter by language (repeatable)
--strategy bm25 Retrieval strategy: bm25, hybrid
--timing off Print timing breakdown to stderr

archex compare <source_a> <source_b>

Compare two repositories across architectural dimensions.

Option Default Description
--dimensions all 6 Comma-separated dimension list
--format json Output format: json, markdown
-l / --language all Filter by language (repeatable)

Supported dimensions: api_surface, concurrency, configuration, error_handling, state_management, testing.

archex cache <subcommand>

Manage the local analysis cache.

Subcommand Options Description
list --cache-dir List cached entries
clean --max-age N (hours), --cache-dir Remove expired entries
info --cache-dir Show cache summary

Development

git clone https://github.com/AetherForge/archex.git
cd archex
uv sync --all-extras

# Run tests (372 tests)
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check (strict mode)
uv run pyright

License

Apache 2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

archex-0.3.0.tar.gz (377.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

archex-0.3.0-py3-none-any.whl (93.1 kB view details)

Uploaded Python 3

File details

Details for the file archex-0.3.0.tar.gz.

File metadata

  • Download URL: archex-0.3.0.tar.gz
  • Upload date:
  • Size: 377.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for archex-0.3.0.tar.gz
Algorithm Hash digest
SHA256 820a6a53894cfe6eed48f023fd1306f661aa90701918b360e8810e401cea0fef
MD5 76c1f76f439c228f670cb12ee4a29858
BLAKE2b-256 deb1f33ff9d60238afa8bbe2f54c0159ec345f0539b32dee5269bc2936a30aa8

See more details on using hashes here.

File details

Details for the file archex-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: archex-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 93.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for archex-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5aef230d4b0de0b5bde61464966f3ac0d2ac6d73a058b36bb9256e0ecda1428f
MD5 94f1bd2e9a97b0f627c3bd31c607a891
BLAKE2b-256 dc2868a0084c2bcbf1efc2d422060ed1d7ac5432d593c5b487a75721aaced690

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page