Architecture observability via graph analysis for Git repositories.

archobs

archobs analyzes a Git repository and tells you where your architecture is healthy and where it's not. It builds a graph of file relationships from three signals -- git co-change history, import/dependency edges, and semantic similarity -- then clusters files into subsystems and scores each one for boundary health, risk hotspots, and drift over time.

The output is an HTML report you open in a browser.

What it actually measures

  • Clusters: Groups of files that belong together based on how they change together, import each other, and are semantically related. Think of these as the subsystems your codebase actually has (vs. what your folder structure implies).
  • Boundary health: How well each cluster stays self-contained. Metrics include cohesion (internal connectivity), leakage (cross-boundary edges), and conductance.
  • Risk hotspots: Files ranked by a combination of boundary leakage, hubness (how many clusters a file bridges), and volatility (how often it changes).
  • Drift: How clusters are shifting over time -- are subsystem boundaries getting cleaner or messier?
  • Suggestions: Actionable recommendations to improve architecture (rule-based by default, optionally LLM-powered via Codex or Claude).
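As a rough illustration of the boundary health metrics, here is a minimal sketch that computes cohesion, leakage, and conductance per cluster from a weighted edge list. The function name `boundary_metrics` and the exact formulas are assumptions made for illustration: conductance follows the standard graph-theory definition, while cohesion (internal edge density) and leakage (share of a cluster's edge weight that crosses its boundary) are plausible readings of the descriptions above, not necessarily the formulas archobs uses.

```python
from collections import Counter, defaultdict


def boundary_metrics(edges, clusters):
    """Compute per-cluster boundary metrics (illustrative definitions only).

    edges:    iterable of (file_a, file_b, weight) for an undirected graph
    clusters: dict mapping file path -> cluster id
    """
    internal = defaultdict(float)  # edge weight with both endpoints inside the cluster
    cut = defaultdict(float)       # edge weight crossing the cluster boundary
    total = 0.0
    for u, v, w in edges:
        cu, cv = clusters[u], clusters[v]
        total += w
        if cu == cv:
            internal[cu] += w
        else:
            cut[cu] += w
            cut[cv] += w

    sizes = Counter(clusters.values())
    metrics = {}
    for c, n in sizes.items():
        volume = 2 * internal[c] + cut[c]  # sum of weighted degrees inside c
        rest = 2 * total - volume          # volume of everything outside c
        pairs = n * (n - 1) / 2            # number of possible internal edges
        smaller = min(volume, rest)
        metrics[c] = {
            # assumed: internal edge density
            "cohesion": internal[c] / pairs if pairs else 0.0,
            # assumed: fraction of the cluster's edge weight that leaves it
            "leakage": cut[c] / volume if volume else 0.0,
            # standard definition: cut weight over the smaller side's volume
            "conductance": cut[c] / smaller if smaller else 0.0,
        }
    return metrics


# Hypothetical five-file graph split into two clusters.
example_edges = [
    ("a.py", "b.py", 1.0),
    ("b.py", "c.py", 1.0),
    ("c.py", "d.py", 0.5),
    ("d.py", "e.py", 1.0),
]
example_clusters = {"a.py": 0, "b.py": 0, "c.py": 0, "d.py": 1, "e.py": 1}
example_metrics = boundary_metrics(example_edges, example_clusters)
```

Under these definitions, lower leakage and conductance mean a cluster is more self-contained, and higher cohesion means its files are more tightly connected to each other.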

Supported languages

Python, TypeScript, JavaScript (including .tsx, .jsx, .mjs, .cjs), and Java.

Requirements

  • Python 3.11+
  • A Git repository with some history

Install

The base install pulls in the core dependencies (NetworkX, NumPy, Pandas, PyArrow, Typer).

For better clustering and interactive graph visualization, install the full extras:

pip install -e '.[full]'

This adds Leiden community detection (python-igraph + leidenalg), interactive HTML graph rendering (pyvis), and Tree-sitter parsing.

Usage

1. Initialize a workspace

archobs init --repo /path/to/your/repo --out .archobs

This creates a .archobs/ directory with a config.json you can tweak. If you omit --repo, it defaults to the current directory.

2. Run the full analysis

archobs report --repo /path/to/your/repo --out .archobs

This runs the complete pipeline:

  1. Inventories tracked source files
  2. Extracts git co-change history
  3. Parses imports and dependencies
  4. Generates embeddings (for semantic similarity)
  5. Builds and fuses a file-relationship graph
  6. Clusters files into subsystems
  7. Computes boundary health metrics and risk scores
  8. Writes an HTML report to .archobs/report/index.html

Open the report in your browser (the command below is for macOS; on Linux, xdg-open does the same job):

open .archobs/report/index.html
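To make steps 2 and 5 of the pipeline more concrete, the sketch below counts how often pairs of files appear in the same commit and then fuses several edge signals into one weighted graph. The function names (`cochange_weights`, `fuse_signals`), the max-count normalization, the weighted-sum fusion rule, and the example weights are all assumptions for illustration; the actual weighting and decay scheme is controlled by the graph section of config.json and is not documented here.

```python
from collections import Counter
from itertools import combinations


def cochange_weights(commits):
    """Turn per-commit file lists into normalized co-change edge weights.

    commits: list of lists of file paths changed together in one commit.
    Returns {(file_a, file_b): weight}, scaled so the most frequent pair is 1.0.
    """
    pair_counts = Counter()
    for files in commits:
        # sorted(set(...)) gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    top = max(pair_counts.values(), default=0)
    if top == 0:
        return {}
    return {pair: count / top for pair, count in pair_counts.items()}


def fuse_signals(signals, weights):
    """Fuse edge signals with a weighted sum (assumed fusion rule).

    signals: {signal_name: {(file_a, file_b): weight}}
    weights: {signal_name: float}
    """
    fused = {}
    for name, edges in signals.items():
        for (a, b), w in edges.items():
            key = (a, b) if a <= b else (b, a)  # treat edges as undirected
            fused[key] = fused.get(key, 0.0) + weights[name] * w
    return fused


# Hypothetical inputs: three commits, one import edge, one semantic edge.
example_commits = [
    ["a.py", "b.py"],
    ["a.py", "b.py", "c.py"],
    ["c.py", "d.py"],
]
example_signals = {
    "cochange": cochange_weights(example_commits),
    "imports": {("b.py", "a.py"): 1.0},
    "semantic": {("a.py", "c.py"): 0.8},
}
example_fused = fuse_signals(
    example_signals, {"cochange": 0.5, "imports": 0.3, "semantic": 0.2}
)
```

A pair such as a.py and b.py, which both co-change and import each other, ends up with a heavier fused edge than a pair linked by only one signal, which is what lets the clustering step group them together.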

Run individual stages

You can run each stage separately if you want to inspect intermediate artifacts:

archobs extract inventory --repo .    # file list
archobs extract git --repo .          # co-change history
archobs extract deps --repo .         # import/dependency edges
archobs embed --repo .                # semantic embeddings
archobs build-graph --repo .          # fused graph
archobs cluster --repo .              # subsystem clustering

Each stage writes Parquet files to .archobs/artifacts/ that downstream stages consume.

Key options

Flag                     Default  What it does
--provider               auto     Embedding provider. auto uses Codanna if available, otherwise deterministic local hashing.
--algo                   auto     Clustering algorithm. auto uses Leiden if installed, otherwise NetworkX greedy modularity.
--resolution             1.0      Clustering resolution. Higher values produce more, smaller clusters.
--k-sem                  20       Number of semantic nearest neighbors per file.
--tau-sem                0.35     Minimum similarity threshold for semantic edges.
--suggestions-provider   auto     How to generate suggestions: auto (tries Claude, then Codex, then local rules), claude, codex, rules, or off.
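To show how --k-sem and --tau-sem might interact, here is a minimal sketch of a deterministic hashing embedding followed by thresholded nearest-neighbor edge selection. Everything in it is an assumption for illustration: the names `hash_embed` and `semantic_edges`, the signed feature-hashing scheme, the 64-dimension default, and the choice to keep an edge when either file lists the other among its neighbors. The local hashing fallback in archobs may work differently.

```python
import hashlib
import math
import re


def hash_embed(text, dim=64):
    """Deterministic bag-of-tokens embedding via signed feature hashing."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z_]\w*", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # sign reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def semantic_edges(embeddings, k_sem=20, tau_sem=0.35):
    """Keep each file's top-k neighbors whose cosine similarity is >= tau_sem.

    embeddings: {file_path: unit-length vector}
    Returns {(file_a, file_b): similarity} with file_a < file_b.
    """
    edges = {}
    for a, va in embeddings.items():
        sims = [
            (sum(x * y for x, y in zip(va, vb)), b)  # dot product = cosine for unit vectors
            for b, vb in embeddings.items()
            if b != a
        ]
        sims.sort(reverse=True)
        for sim, b in sims[:k_sem]:
            if sim >= tau_sem:
                edges[tuple(sorted((a, b)))] = sim
    return edges


# Hypothetical hand-made unit vectors, so the similarities are easy to check.
example_vectors = {"a.py": [1.0, 0.0], "b.py": [0.8, 0.6], "c.py": [0.0, 1.0]}
loose_edges = semantic_edges(example_vectors, k_sem=1, tau_sem=0.35)
strict_edges = semantic_edges(example_vectors, k_sem=1, tau_sem=0.7)
```

In this sketch, raising tau_sem prunes weak edges regardless of rank, while lowering k_sem caps how many neighbors each file can contribute, so the two options bound the semantic signal from different directions.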

Output files

After running archobs report, the .archobs/ directory contains:

.archobs/
  config.json                 # your configuration
  artifacts/
    files.parquet             # file inventory
    commits.parquet           # git co-change data
    imports.parquet           # resolved import edges
    graph_edges.parquet       # fused relationship graph
    clusters.parquet          # cluster assignments
    file_metrics.parquet      # per-file risk scores
    cluster_metrics.parquet   # per-cluster health scores
    ...
  report/
    index.html                # open this in a browser
    graph.html                # interactive graph visualization
    graph.graphml             # for Gephi / yEd
    graph.gexf                # for Gephi
    summary.json              # machine-readable summary

Configuration

The config.json in your .archobs/ directory controls all pipeline behavior. You can edit it directly or pass flags to override individual settings. Key sections:

  • filters: Which files to include/exclude (by extension, path prefix, file size)
  • extraction: Parser backend and language settings
  • embedding: Provider, model, dimensions
  • graph: Edge weights, thresholds, decay parameters
  • clustering: Algorithm, resolution, drift window
  • reporting: Risk ranking limits, suggestion provider and count

License

MIT
