Architecture observability via graph analysis for Git repositories.
archobs
archobs analyzes a Git repository and tells you where your architecture is healthy and where it's not. It builds a graph of file relationships from three signals -- git co-change history, import/dependency edges, and semantic similarity -- then clusters files into subsystems and scores each one for boundary health, risk hotspots, and drift over time.
The output is an HTML report you open in a browser.
What it actually measures
- Clusters: Groups of files that belong together based on how they change together, import each other, and are semantically related. Think of these as the subsystems your codebase actually has (vs. what your folder structure implies).
- Boundary health: How well each cluster stays self-contained. Metrics include cohesion (internal connectivity), leakage (cross-boundary edges), and conductance.
- Risk hotspots: Files ranked by a combination of boundary leakage, hubness (how many clusters a file bridges), and volatility (how often it changes).
- Drift: How clusters are shifting over time -- are subsystem boundaries getting cleaner or messier?
- Suggestions: Actionable recommendations to improve architecture (LLM-powered via Claude or Codex when available, otherwise rule-based).
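To make the boundary and hubness metrics concrete, here is a minimal sketch on a toy graph. It is not archobs's implementation: the edge list, cluster assignments, and function names are hypothetical, cohesion and leakage are assumed to be the internal and cross-boundary shares of a cluster's edge weight, and conductance uses the standard cut / min(volume) definition.

```python
from collections import defaultdict

# Hypothetical toy data: undirected weighted edges between files.
EDGES = [
    ("api/a.py", "api/b.py", 3.0),
    ("api/b.py", "api/c.py", 2.0),
    ("api/c.py", "db/x.py", 1.0),
    ("db/x.py", "db/y.py", 4.0),
]
# Hypothetical cluster assignment for each file.
CLUSTER_OF = {
    "api/a.py": 0, "api/b.py": 0, "api/c.py": 0,
    "db/x.py": 1, "db/y.py": 1,
}


def boundary_metrics(edges, cluster_of, cluster_id):
    """Cohesion, leakage, and conductance for one cluster (assumed definitions)."""
    internal = cut = outside = 0.0
    for a, b, weight in edges:
        in_a = cluster_of[a] == cluster_id
        in_b = cluster_of[b] == cluster_id
        if in_a and in_b:
            internal += weight      # both endpoints inside the cluster
        elif in_a or in_b:
            cut += weight           # edge crosses the cluster boundary
        else:
            outside += weight       # edge entirely outside the cluster
    touching = internal + cut
    # Volume = sum of weighted degrees; internal edges count twice.
    smaller_volume = min(2 * internal + cut, 2 * outside + cut)
    return {
        "cohesion": internal / touching if touching else 0.0,
        "leakage": cut / touching if touching else 0.0,
        "conductance": cut / smaller_volume if smaller_volume else 0.0,
    }


def hubness(edges, cluster_of):
    """Number of foreign clusters each file has at least one edge into."""
    bridged = defaultdict(set)
    for a, b, _ in edges:
        if cluster_of[a] != cluster_of[b]:
            bridged[a].add(cluster_of[b])
            bridged[b].add(cluster_of[a])
    return {name: len(bridged[name]) for name in cluster_of}


if __name__ == "__main__":
    print(boundary_metrics(EDGES, CLUSTER_OF, 0))
    print(hubness(EDGES, CLUSTER_OF))
```

Lower conductance and leakage mean a cleaner boundary; a file with high hubness touches several clusters and is a candidate hotspot.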
Supported languages
Python, TypeScript, JavaScript (including .tsx, .jsx, .mjs, .cjs), and Java.
Requirements
- Python 3.11+
- A Git repository with some history
Install
The base install pulls in the core dependencies (NetworkX, NumPy, Pandas, PyArrow, Typer).
For better clustering and interactive graph visualization, install the full extras:
pip install -e '.[full]'
This adds Leiden community detection (python-igraph + leidenalg), interactive HTML graph rendering (pyvis), and Tree-sitter parsing.
Usage
1. Initialize a workspace
archobs init --repo /path/to/your/repo --out .archobs
This creates a .archobs/ directory with a config.json you can tweak. If you omit --repo, it defaults to the current directory.
2. Run the full analysis
archobs report --repo /path/to/your/repo --out .archobs
This runs the complete pipeline:
- Inventories tracked source files
- Extracts git co-change history
- Parses imports and dependencies
- Generates embeddings (for semantic similarity)
- Builds and fuses a file-relationship graph
- Clusters files into subsystems
- Computes boundary health metrics and risk scores
- Writes an HTML report to .archobs/report/index.html
Open the report in your browser (open is the macOS command; on Linux, xdg-open does the same):
open .archobs/report/index.html
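Two of the pipeline steps above, co-change extraction and graph fusion, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the max-count normalization, the weighted-sum fusion, and the signal weights are all hypothetical, and archobs's actual fusion (including the decay parameters in the config) may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cochange_weights(commits):
    """Count how often each file pair shares a commit, scaled to [0, 1]."""
    counts = Counter()
    for files in commits:
        # Sort so each undirected pair has one canonical key.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    top = max(counts.values(), default=1)
    return {pair: n / top for pair, n in counts.items()}


def fuse(signals, weights):
    """Weighted sum of per-signal edge weights (assumed fusion rule)."""
    fused = defaultdict(float)
    for name, edges in signals.items():
        for pair, value in edges.items():
            fused[pair] += weights[name] * value
    return dict(fused)


if __name__ == "__main__":
    # Hypothetical inputs: each commit is the list of files it touched.
    commits = [["a.py", "b.py"], ["a.py", "b.py", "c.py"], ["c.py"]]
    signals = {
        "cochange": cochange_weights(commits),
        "imports": {("a.py", "b.py"): 1.0},
        "semantic": {("b.py", "c.py"): 0.4},
    }
    # Hypothetical weights, not archobs defaults.
    weights = {"cochange": 0.5, "imports": 0.3, "semantic": 0.2}
    for pair, weight in sorted(fuse(signals, weights).items()):
        print(pair, round(weight, 3))
```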
Run individual stages
You can run each stage separately if you want to inspect intermediate artifacts:
archobs extract inventory --repo . # file list
archobs extract git --repo . # co-change history
archobs extract deps --repo . # import/dependency edges
archobs embed --repo . # semantic embeddings
archobs build-graph --repo . # fused graph
archobs cluster --repo . # subsystem clustering
Each stage writes Parquet files to .archobs/artifacts/ that downstream stages consume.
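Because each stage consumes artifacts written by earlier ones, the order matters. A small driver script, sketched below, runs the stages listed above in sequence and stops at the first failure. The helper names are hypothetical; only the archobs subcommands and the --repo flag come from the commands shown above.

```python
import subprocess

# Stage subcommands in the order listed above.
STAGES = [
    ["extract", "inventory"],
    ["extract", "git"],
    ["extract", "deps"],
    ["embed"],
    ["build-graph"],
    ["cluster"],
]


def stage_commands(repo="."):
    """Build the argv list for every stage against the given repository."""
    return [["archobs", *stage, "--repo", repo] for stage in STAGES]


def run_stages(repo="."):
    """Run each stage in order; check=True raises on the first non-zero exit."""
    for cmd in stage_commands(repo):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Usage (requires archobs on PATH):
#   run_stages(".")
```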
Key options
| Flag | Default | What it does |
|---|---|---|
| --provider | auto | Embedding provider. auto uses Codanna if available, otherwise deterministic local hashing. |
| --algo | auto | Clustering algorithm. auto uses Leiden if installed, otherwise NetworkX greedy modularity. |
| --resolution | 1.0 | Clustering resolution. Higher values produce more, smaller clusters. |
| --k-sem | 20 | Number of semantic nearest neighbors per file. |
| --tau-sem | 0.35 | Minimum similarity threshold for semantic edges. |
| --suggestions-provider | auto | How to generate suggestions: auto (tries Claude, then Codex, then local rules), claude, codex, rules, or off. |
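To see how --k-sem and --tau-sem interact, consider a sketch of semantic edge selection: each file keeps its k most similar neighbors, and only those at or above the threshold become edges. The function names, the use of cosine similarity, and the inclusive threshold are assumptions for illustration, not a description of archobs internals.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_edges(vectors, k=20, tau=0.35):
    """Keep each file's top-k neighbors whose similarity is at least tau."""
    edges = {}
    for name, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != name
        ]
        scored.sort(reverse=True)  # most similar first
        for similarity, other in scored[:k]:
            if similarity >= tau:
                # Canonical key so (a, b) and (b, a) are the same edge.
                edges[tuple(sorted((name, other)))] = similarity
    return edges


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings.
    vectors = {"a.py": [1.0, 0.0], "b.py": [0.9, 0.1], "c.py": [0.0, 1.0]}
    print(semantic_edges(vectors, k=1, tau=0.35))
```

In this toy input only a.py and b.py are linked: c.py's nearest neighbor falls below the threshold, so it gets no semantic edge.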
Output files
After running archobs report, the .archobs/ directory contains:
.archobs/
  config.json                 # your configuration
  artifacts/
    files.parquet             # file inventory
    commits.parquet           # git co-change data
    imports.parquet           # resolved import edges
    graph_edges.parquet       # fused relationship graph
    clusters.parquet          # cluster assignments
    file_metrics.parquet      # per-file risk scores
    cluster_metrics.parquet   # per-cluster health scores
    ...
  report/
    index.html                # open this in a browser
    graph.html                # interactive graph visualization
    graph.graphml             # for Gephi / yEd
    graph.gexf                # for Gephi
    summary.json              # machine-readable summary
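A quick way to confirm that a run completed is to check for the files listed above. The sketch below is a hypothetical helper, not part of archobs; it assumes the Parquet files live under artifacts/ and the HTML report under report/, and it leaves out the remaining report files because their exact location is not stated here and graph.html may depend on the optional extras.

```python
from pathlib import Path

# Paths relative to the output directory, taken from the listing above.
EXPECTED_OUTPUTS = [
    "config.json",
    "artifacts/files.parquet",
    "artifacts/commits.parquet",
    "artifacts/imports.parquet",
    "artifacts/graph_edges.parquet",
    "artifacts/clusters.parquet",
    "artifacts/file_metrics.parquet",
    "artifacts/cluster_metrics.parquet",
    "report/index.html",
]


def missing_outputs(out_dir=".archobs"):
    """Return the expected output files that do not exist under out_dir."""
    root = Path(out_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    print("all outputs present" if not missing else f"missing: {missing}")
```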
Configuration
The config.json in your .archobs/ directory controls all pipeline behavior. You can edit it directly or pass flags to override individual settings. Key sections:
- filters: Which files to include/exclude (by extension, path prefix, file size)
- extraction: Parser backend and language settings
- embedding: Provider, model, dimensions
- graph: Edge weights, thresholds, decay parameters
- clustering: Algorithm, resolution, drift window
- reporting: Risk ranking limits, suggestion provider and count
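Editing config.json by hand works, but a scripted edit is easier to repeat across repositories. The following sketch assumes config.json is a JSON object whose top-level keys are the six sections listed above; the helper name and the "resolution" key used in the usage comment are hypothetical, so check your generated config.json for the real key names.

```python
import json
from pathlib import Path

# Section names as listed above.
SECTIONS = {"filters", "extraction", "embedding", "graph", "clustering", "reporting"}


def set_option(config_path, section, key, value):
    """Set one key inside one config section and write the file back."""
    if section not in SECTIONS:
        raise KeyError(f"unknown config section: {section}")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the section if the file does not have it yet.
    config.setdefault(section, {})[key] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage (the key name "resolution" is an assumption):
#   set_option(".archobs/config.json", "clustering", "resolution", 1.5)
```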
License
MIT