archex

Architecture extraction and codebase intelligence for the agentic era.

archex is a Python library and CLI that transforms any Git repository into structured architectural intelligence and token-budget-aware code context. It serves two consumers from a single index: human architects receive an ArchProfile with module boundaries, dependency graphs, detected patterns, and interface surfaces; AI agents receive a ContextBundle with relevance-ranked, syntax-aligned code chunks assembled to fit within a specified token budget.

Features

  • 4 language adapters — Python, TypeScript/JavaScript, Go, Rust (tree-sitter AST parsing)
  • 3 public APIs — analyze(), query(), compare()
  • Hybrid retrieval — BM25 keyword search + optional vector embeddings with reciprocal rank fusion
  • Token budget assembly — AST-aware chunking, dependency-graph expansion, greedy bin-packing
  • Structural analysis — module detection (Louvain), pattern recognition, interface extraction
  • Cross-repo comparison — 6 architectural dimensions, no LLM required
  • LLM-optional — entire structural pipeline runs without API calls; LLM enrichment is opt-in
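
The hybrid retrieval bullet above mentions reciprocal rank fusion. As a rough illustration of that general technique (not archex's actual implementation), the sketch below fuses two ranked lists of chunk identifiers. The function name, the chunk IDs, and the constant k=60 (a commonly used default for this method) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one ranking (hypothetical helper).

    Each ID scores 1 / (k + rank) in every list where it appears;
    the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to the ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Made-up rankings standing in for keyword and embedding results.
bm25_ranking = ["auth.py:login", "session.py:create", "db.py:connect"]
vector_ranking = ["session.py:create", "tokens.py:verify", "auth.py:login"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
# ['session.py:create', 'auth.py:login', 'tokens.py:verify', 'db.py:connect']
```

Items that rank well in both lists rise to the top, which is why the fusion needs only rank positions and never has to reconcile BM25 scores with embedding similarities.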

Installation

pip install archex

Extras

Extra                 What it adds
archex[vector]        ONNX-based local embeddings (Nomic Code)
archex[vector-torch]  Torch-backed sentence-transformers
archex[voyage]        Voyage Code API embeddings
archex[openai]        OpenAI API embeddings + LLM enrichment
archex[anthropic]     Anthropic API LLM enrichment
archex[all]           All optional dependencies

Quick Start

Python API

from archex import analyze, query, compare
from archex.models import RepoSource

# Architectural analysis
profile = analyze(RepoSource(local_path="./my-project"))
for module in profile.module_map:
    print(f"{module.name}: {len(module.files)} files")
for pattern in profile.pattern_catalog:
    print(f"[{pattern.confidence:.0%}] {pattern.name}")

# Implementation context for an agent
bundle = query(
    RepoSource(local_path="./my-project"),
    "How does authentication work?",
    token_budget=8192,
)
print(bundle.to_prompt(format="xml"))

# Cross-repo comparison
result = compare(
    RepoSource(local_path="./project-a"),
    RepoSource(local_path="./project-b"),
    dimensions=["error_handling", "api_surface"],
)

CLI

# Analyze a local repo or remote URL
archex analyze ./my-project --format json
archex analyze https://github.com/org/repo --format markdown -l python --timing

# Query for implementation context
archex query ./my-project "How does auth work?" --budget 8192 --format xml
archex query ./my-project "connection pooling" --strategy hybrid --timing

# Compare two repositories
archex compare ./project-a ./project-b --dimensions error_handling,api_surface --format markdown

# Manage the analysis cache
archex cache list
archex cache clean --max-age 168  # max age is in hours (168 = 7 days)
archex cache info

CLI Reference

archex analyze <source>

Analyze a repository and produce an architecture profile.

Option           Default  Description
--format         json     Output format: json, markdown
-l / --language  all      Filter by language (repeatable)
--timing         off      Print timing breakdown to stderr

archex query <source> <question>

Query a repository and return a context bundle.

Option           Default  Description
--budget         8192     Token budget for context assembly
--format         xml      Output format: xml, json, markdown
-l / --language  all      Filter by language (repeatable)
--strategy       bm25     Retrieval strategy: bm25, hybrid
--timing         off      Print timing breakdown to stderr
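
To give a feel for what a token budget means during context assembly, here is a minimal sketch of greedy packing: take chunks in descending relevance order and keep each one that still fits. It is an illustration only. The Chunk class, the whitespace-based token estimate, and the placeholder texts are assumptions, and archex's own chunking and dependency-graph expansion are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved code chunk."""
    path: str
    text: str
    score: float  # relevance from retrieval, higher is better


def estimate_tokens(text: str) -> int:
    # Crude estimate: one token per whitespace-separated word.
    # A real tokenizer would give different counts.
    return len(text.split())


def pack_chunks(chunks, token_budget):
    """Greedily keep the most relevant chunks that fit in the budget."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected, used


# Placeholder texts of 5, 4, and 3 "tokens" respectively.
chunks = [
    Chunk("auth.py", "tok " * 5, score=0.9),
    Chunk("session.py", "tok " * 4, score=0.8),
    Chunk("db.py", "tok " * 3, score=0.5),
]
selected, used = pack_chunks(chunks, token_budget=8)
print([c.path for c in selected], used)
# ['auth.py', 'db.py'] 8
```

Note that the second-ranked chunk is skipped because it would overflow the budget, while a smaller, lower-ranked chunk still fits. Greedy packing trades optimality for speed and predictable ordering.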

archex compare <source_a> <source_b>

Compare two repositories across architectural dimensions.

Option           Default  Description
--dimensions     all 6    Comma-separated dimension list
--format         json     Output format: json, markdown
-l / --language  all      Filter by language (repeatable)

Supported dimensions: api_surface, concurrency, configuration, error_handling, state_management, testing.
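
When scripting around compare, it can help to validate a dimension list before invoking the tool. The sketch below checks a comma-separated string against the six names listed above. The parse_dimensions helper is hypothetical and is not part of archex; it only mirrors the documented format of the --dimensions value.

```python
# The six dimension names documented above.
SUPPORTED_DIMENSIONS = (
    "api_surface",
    "concurrency",
    "configuration",
    "error_handling",
    "state_management",
    "testing",
)


def parse_dimensions(value=None):
    """Split a comma-separated dimension string and reject unknown names.

    Hypothetical helper: returns all six dimensions when no value is given,
    matching the documented default.
    """
    if value is None:
        return list(SUPPORTED_DIMENSIONS)
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED_DIMENSIONS]
    if unknown:
        raise ValueError(f"unsupported dimensions: {', '.join(unknown)}")
    return names


print(parse_dimensions("error_handling,api_surface"))
# ['error_handling', 'api_surface']
```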

archex cache <subcommand>

Manage the local analysis cache.

Subcommand  Options                           Description
list        --cache-dir                       List cached entries
clean       --max-age N (hours), --cache-dir  Remove expired entries
info        --cache-dir                       Show cache summary
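
Because --max-age is expressed in hours, an age-based cleanup amounts to comparing each entry's timestamp against a cutoff. The sketch below shows that arithmetic under loud assumptions: the real cache layout is not documented here, so it treats every item directly inside a directory as one entry and uses the file modification time as its age. The expired_entries function is hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def expired_entries(cache_dir, max_age_hours, now=None):
    """Return entries in cache_dir older than max_age_hours (by mtime).

    Hypothetical helper; assumes one cache entry per directory item.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600  # hours -> seconds
    return sorted(
        entry for entry in Path(cache_dir).iterdir()
        if entry.stat().st_mtime < cutoff
    )


# Demo in a throwaway directory: one fresh entry, one aged 200 hours.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "fresh-entry"
    stale = Path(tmp) / "stale-entry"
    fresh.write_text("new")
    stale.write_text("old")
    current = time.time()
    aged = current - 200 * 3600
    os.utime(stale, (aged, aged))  # backdate the modification time

    expired = expired_entries(tmp, max_age_hours=168, now=current)
    print([entry.name for entry in expired])
    # ['stale-entry']
```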

Development

git clone https://github.com/AetherForge/archex.git
cd archex
uv sync --all-extras

# Run tests (372 tests)
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check (strict mode)
uv run pyright

License

Apache 2.0 — see LICENSE.
