archex
Architecture extraction and codebase intelligence for the agentic era.
archex is a Python library and CLI that transforms any Git repository into structured architectural intelligence and token-budget-aware code context. It serves two consumers from a single index: human architects receive an ArchProfile with module boundaries, dependency graphs, detected patterns, and interface surfaces; AI agents receive a ContextBundle with relevance-ranked, syntax-aligned code chunks assembled to fit within a specified token budget.
Features
- 4 language adapters — Python, TypeScript/JavaScript, Go, Rust (tree-sitter AST parsing)
- 3 public APIs — `analyze()`, `query()`, `compare()`
- Hybrid retrieval — BM25 keyword search + optional vector embeddings with reciprocal rank fusion
- Token budget assembly — AST-aware chunking, dependency-graph expansion, greedy bin-packing
- Structural analysis — module detection (Louvain), pattern recognition, interface extraction
- Cross-repo comparison — 6 architectural dimensions, no LLM required
- LLM-optional — entire structural pipeline runs without API calls; LLM enrichment is opt-in
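To illustrate the hybrid retrieval bullet above, here is a minimal sketch of reciprocal rank fusion over two ranked result lists. It is not archex's internal code: the function name, the file names, and the constant `k=60` (a commonly used default for this technique) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each item scores 1 / (k + rank) per list it appears in; scores are summed.
    k=60 is an assumed default, not a value taken from archex.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword ranker and a vector ranker.
bm25_hits = ["auth.py", "session.py", "db.py"]
vector_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Items that rank well in both lists rise to the top, while items found by only one ranker are still kept. The fusion uses ranks only, so the two score scales never need to be normalized against each other.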
Installation
```bash
pip install archex
```
Extras
| Extra | What it adds |
|---|---|
| `archex[vector]` | ONNX-based local embeddings (Nomic Code) |
| `archex[vector-torch]` | Torch-backed sentence-transformers |
| `archex[voyage]` | Voyage Code API embeddings |
| `archex[openai]` | OpenAI API embeddings + LLM enrichment |
| `archex[anthropic]` | Anthropic API LLM enrichment |
| `archex[all]` | All optional dependencies |
Quick Start
Python API
```python
from archex import analyze, query, compare
from archex.models import RepoSource

# Architectural analysis
profile = analyze(RepoSource(local_path="./my-project"))
for module in profile.module_map:
    print(f"{module.name}: {len(module.files)} files")
for pattern in profile.pattern_catalog:
    print(f"[{pattern.confidence:.0%}] {pattern.name}")

# Implementation context for an agent
bundle = query(
    RepoSource(local_path="./my-project"),
    "How does authentication work?",
    token_budget=8192,
)
print(bundle.to_prompt(format="xml"))

# Cross-repo comparison
result = compare(
    RepoSource(local_path="./project-a"),
    RepoSource(local_path="./project-b"),
    dimensions=["error_handling", "api_surface"],
)
```
CLI
```bash
# Analyze a local repo or remote URL
archex analyze ./my-project --format json
archex analyze https://github.com/org/repo --format markdown -l python --timing

# Query for implementation context
archex query ./my-project "How does auth work?" --budget 8192 --format xml
archex query ./my-project "connection pooling" --strategy hybrid --timing

# Compare two repositories
archex compare ./project-a ./project-b --dimensions error_handling,api_surface --format markdown

# Manage the analysis cache
archex cache list
archex cache clean --max-age 168
archex cache info
```
CLI Reference
archex analyze <source>
Analyze a repository and produce an architecture profile.
| Option | Default | Description |
|---|---|---|
| `--format` | `json` | Output format: `json`, `markdown` |
| `-l` / `--language` | all | Filter by language (repeatable) |
| `--timing` | off | Print timing breakdown to stderr |
archex query <source> <question>
Query a repository and return a context bundle.
| Option | Default | Description |
|---|---|---|
| `--budget` | `8192` | Token budget for context assembly |
| `--format` | `xml` | Output format: `xml`, `json`, `markdown` |
| `-l` / `--language` | all | Filter by language (repeatable) |
| `--strategy` | `bm25` | Retrieval strategy: `bm25`, `hybrid` |
| `--timing` | off | Print timing breakdown to stderr |
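As a rough illustration of what a token budget means for context assembly, the sketch below greedily packs relevance-ranked chunks until the budget is spent. It is a simplified stand-in, not archex's implementation: the `Chunk` class, the `pack_chunks` function, and the four-characters-per-token estimate are all hypothetical, and it leaves out the AST-aware chunking and dependency-graph expansion that archex lists among its features.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical code chunk with a precomputed relevance score."""
    path: str
    text: str
    score: float  # higher means more relevant


def estimate_tokens(text):
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def pack_chunks(chunks, token_budget):
    """Greedily select chunks by descending score while they still fit."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected, used


chunks = [
    Chunk("auth/login.py", "x" * 400, 0.9),   # about 100 tokens
    Chunk("auth/tokens.py", "x" * 800, 0.7),  # about 200 tokens
    Chunk("db/pool.py", "x" * 200, 0.4),      # about 50 tokens
]
selected, used = pack_chunks(chunks, token_budget=160)
print([c.path for c in selected], used)
```

A chunk that does not fit is skipped instead of ending the loop, so a smaller, lower-ranked chunk can still use the remaining budget.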
archex compare <source_a> <source_b>
Compare two repositories across architectural dimensions.
| Option | Default | Description |
|---|---|---|
| `--dimensions` | all 6 | Comma-separated dimension list |
| `--format` | `json` | Output format: `json`, `markdown` |
| `-l` / `--language` | all | Filter by language (repeatable) |
Supported dimensions: api_surface, concurrency, configuration, error_handling, state_management, testing.
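When scripting around `archex compare`, it can help to validate a dimension list before invoking the tool. The helper below is a hypothetical sketch, not part of archex. It only assumes the six dimension names listed above and the comma-separated format of `--dimensions`.

```python
SUPPORTED_DIMENSIONS = (
    "api_surface",
    "concurrency",
    "configuration",
    "error_handling",
    "state_management",
    "testing",
)


def parse_dimensions(value):
    """Parse a comma-separated dimension string into a validated list.

    An empty or missing value selects all six dimensions, mirroring the
    documented default. Unknown names raise ValueError.
    """
    names = [part.strip() for part in (value or "").split(",") if part.strip()]
    if not names:
        return list(SUPPORTED_DIMENSIONS)
    unknown = [name for name in names if name not in SUPPORTED_DIMENSIONS]
    if unknown:
        raise ValueError(f"unsupported dimension(s): {', '.join(unknown)}")
    return names


print(parse_dimensions("error_handling,api_surface"))
```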
archex cache <subcommand>
Manage the local analysis cache.
| Subcommand | Options | Description |
|---|---|---|
| `list` | `--cache-dir` | List cached entries |
| `clean` | `--max-age N` (hours), `--cache-dir` | Remove expired entries |
| `info` | `--cache-dir` | Show cache summary |
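The `--max-age` value is given in hours, so `--max-age 168` corresponds to seven days. The sketch below shows one way age-based expiry could be checked. It rests on assumptions: that cache entries are plain files in a directory and that age is measured from file modification time. The actual archex cache layout is not described here, and `expired_entries` is a hypothetical helper.

```python
import os
import tempfile
import time
from pathlib import Path


def expired_entries(cache_dir, max_age_hours, now=None):
    """Return entries in cache_dir last modified more than max_age_hours ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    return sorted(p for p in Path(cache_dir).iterdir() if p.stat().st_mtime < cutoff)


# Demonstrate with a throwaway directory holding one fresh and one stale file.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp, "fresh.db")
    stale = Path(tmp, "stale.db")
    fresh.touch()
    stale.touch()
    ten_days_ago = time.time() - 10 * 24 * 3600
    os.utime(stale, (ten_days_ago, ten_days_ago))  # backdate the stale entry
    stale_names = [p.name for p in expired_entries(tmp, max_age_hours=168)]

print(stale_names)
```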
Development
```bash
git clone https://github.com/AetherForge/archex.git
cd archex
uv sync --all-extras

# Run tests (372 tests)
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check (strict mode)
uv run pyright
```
License
Apache 2.0 — see LICENSE.