Multi-level codebase structural analysis using information theory and graph algorithms

Shannon Insight

PyPI Python 3.9+ License: MIT

Multi-signal codebase analysis using information theory, graph algorithms, and git history. Cross-references dependency graphs, temporal co-change patterns, per-file complexity signals, and spectral analysis to surface structural problems that no single metric can catch.

Shannon Insight is for teams that want evidence-backed findings about where their codebase is fragile, tangled, or siloed -- not arbitrary quality scores.

Quick Start

pip install shannon-codebase-insight
cd your-project
shannon-insight .
✓ Analyzed 234 files in 2.1s

Moderate structural issues — 1 file needs attention

START HERE
  src/core/engine.py
  Why: Central file (blast=47) with high churn (cv=2.1) and single owner
  Data: blast=47, changes=89, cv=2.1, lines=412
  Issues: high risk hub, knowledge silo

ALSO CONSIDER
  #2  src/api/handlers.py          High coupling, 3 issues
  #3  src/models/user.py           God file, low coherence
  #4  src/utils/helpers.py         Orphan code

Patterns: 5 structural, 3 coupling, 2 churn, 1 team

Launch the interactive dashboard:

pip install shannon-codebase-insight[serve]
shannon-insight serve

What It Finds

Structural

| Finding | What It Detects | Severity | Example |
| --- | --- | --- | --- |
| god_file | Files with too many responsibilities -- high complexity, low coherence | HIGH | core.py has 45 functions across 6 unrelated concerns |
| high_risk_hub | Central files that are also complex or churning -- a bug here ripples widely | CRITICAL | engine.py imported by 47 files, changed 89 times |
| orphan_code | Files with zero importers that may be dead code | MEDIUM | old_handler.py imported by nothing |
| hollow_code | Files with >60% stub/empty functions -- started but never finished | HIGH | api_v2.py has 8 of 12 functions as pass |
| phantom_imports | Imports that resolve to no file in the codebase | MEDIUM | from .missing_module import X |
| dead_dependency | Import relationships where files never co-change in git history | LOW | A imports B but they haven't changed together in 688 commits |

Architecture

| Finding | What It Detects | Severity | Example |
| --- | --- | --- | --- |
| hidden_coupling | Files that co-change but share no import | HIGH | cache.py and db.py change together 82% of the time with no import |
| boundary_mismatch | Directories whose files are more connected to other directories than to each other | MEDIUM | Files in src/api/ are more tightly coupled to src/models/ than to each other |
| layer_violation | Dependencies that flow backward through architectural layers | MEDIUM | models/ imports from controllers/ |
| zone_of_pain | Modules that are both concrete and stable -- painful to change | MEDIUM | core/ has 0.1 abstractness and 0.2 instability |
| flat_architecture | Codebase lacks a composition layer between leaf modules | MEDIUM | All modules at depth 1 with high glue deficit |
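
The zone_of_pain finding relies on Robert C. Martin's package metrics. The sketch below shows the standard definitions of instability and distance from the main sequence. The 0.3 cutoff and the function names are illustrative assumptions, not Shannon Insight's actual thresholds or API.

```python
def instability(afferent: int, efferent: int) -> float:
    """Martin's instability I = Ce / (Ca + Ce).

    afferent (Ca): modules that depend on this one.
    efferent (Ce): modules this one depends on.
    """
    total = afferent + efferent
    return efferent / total if total else 0.0


def distance_from_main_sequence(abstractness: float, instab: float) -> float:
    """Martin's normalized distance D = |A + I - 1|."""
    return abs(abstractness + instab - 1.0)


def in_zone_of_pain(abstractness: float, instab: float, cutoff: float = 0.3) -> bool:
    """Concrete (low A) and stable (low I). The cutoff is a hypothetical value."""
    return abstractness < cutoff and instab < cutoff


# The core/ example from the table: A = 0.1, I = 0.2
print(in_zone_of_pain(0.1, 0.2))
print(distance_from_main_sequence(0.1, 0.2))
```

A module far from the main sequence with both values low is hard to change because many modules depend on it and it offers no abstractions to extend.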

Stability

| Finding | What It Detects | Severity | Example |
| --- | --- | --- | --- |
| unstable_file | Files with increasing churn that aren't stabilizing | HIGH | handlers.py trajectory: CHURNING, cv=2.3 |
| chronic_problem | Findings that persist across 3+ analysis snapshots | HIGH | god_file on engine.py persisting 5 snapshots |
| thrashing_code | Files with erratic, spiking change patterns | HIGH | config.py has SPIKING trajectory with cv=3.1 |
| bug_magnet | Files where >40% of commits mention "fix" | HIGH | parser.py fix_ratio=0.62, 45 changes |
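
Two of the signals in this table, cv and fix_ratio, are simple statistics. The sketch below assumes cv is the coefficient of variation of per-window change counts and that a commit counts as a fix when its message contains "fix". How Shannon Insight buckets commits into windows or matches messages is not specified here, so treat both functions as hypothetical.

```python
import statistics


def churn_cv(changes_per_window: list[int]) -> float:
    """Coefficient of variation: population std dev divided by the mean."""
    if not changes_per_window:
        return 0.0
    mean = statistics.fmean(changes_per_window)
    if mean == 0:
        return 0.0
    return statistics.pstdev(changes_per_window) / mean


def fix_ratio(commit_messages: list[str]) -> float:
    """Share of commit messages that mention 'fix' (case-insensitive)."""
    if not commit_messages:
        return 0.0
    fixes = sum(1 for message in commit_messages if "fix" in message.lower())
    return fixes / len(commit_messages)


steady = [5, 5, 5, 5]      # same activity every window
spiking = [0, 0, 0, 12]    # all changes in one burst
print(churn_cv(steady), churn_cv(spiking))
print(fix_ratio(["Fix parser crash", "Add feature", "bugfix: off by one", "Docs"]))
```

A steady file has a cv of zero, while a file whose changes arrive in bursts scores well above 1.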

Team

| Finding | What It Detects | Severity | Example |
| --- | --- | --- | --- |
| knowledge_silo | Central files owned by a single contributor | HIGH | auth.py bus_factor=1.0, PageRank top 5% |
| review_blindspot | High-centrality files with single owner and no tests | HIGH | billing.py imported by 30 files, 1 author, no test file |
| truck_factor | Files where only one person has ever committed | HIGH | scheduler.py sole author, blast_radius=12 |
| conway_violation | Structurally-coupled modules maintained by different teams | MEDIUM | api/ and models/ tightly coupled but 0% author overlap |
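
The team findings build on author_entropy and bus_factor. A minimal sketch of author entropy as the Shannon entropy of the commit-author distribution follows. Reading bus_factor as 2 raised to that entropy (the "effective number of authors") is an assumption that happens to match the bus_factor=1.0 example for a single-owner file; the tool's real formula may differ.

```python
import math
from collections import Counter


def author_entropy(commit_authors: list[str]) -> float:
    """Shannon entropy (in bits) of the distribution of commits over authors."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over authors; zero when one author owns every commit.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def effective_authors(commit_authors: list[str]) -> float:
    """Assumed bus-factor proxy: 2 ** entropy."""
    return 2.0 ** author_entropy(commit_authors)


print(effective_authors(["alice"] * 5))              # single owner
print(effective_authors(["alice", "bob"] * 3))       # evenly shared
```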

Code Quality

| Finding | What It Detects | Severity | Example |
| --- | --- | --- | --- |
| copy_paste_clone | File pairs with high content similarity (NCD < 0.3) | MEDIUM | handler_v1.py and handler_v2.py are 85% similar |
| incomplete_implementation | Files with multiple incomplete signals (stubs + phantom imports) | HIGH | service.py has 4 stubs and 2 missing imports |
| naming_drift | Files whose names don't match their actual content | LOW | utils.py contains only database connection logic |
| directory_hotspot | Directories where most files are high-risk or churning | HIGH | src/api/ has 5 of 7 files in top risk quartile |

Also: weak_link (file worse than its graph neighborhood), bug_attractor (central file with high fix ratio), accidental_coupling (imports between unrelated files), architecture_erosion (violation rate increasing over time), duplicate_incomplete (cloned files that are both incomplete).
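
The copy_paste_clone finding uses normalized compression distance (NCD), which approximates Kolmogorov complexity with a real compressor. The sketch below uses zlib; which compressor Shannon Insight uses is an assumption here, and only the 0.3 threshold comes from the table above.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data at maximum compression level."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance.

    NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b))
    Values near 0 mean the inputs share most of their content;
    values near 1 mean they share almost nothing.
    """
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def looks_like_clone(a: bytes, b: bytes, threshold: float = 0.3) -> bool:
    """Flag a pair as a likely clone when NCD falls below the threshold."""
    return ncd(a, b) < threshold
```

To compare two files, read them in binary mode and pass the bytes to looks_like_clone. Very small files compress poorly, so NCD is less reliable for them.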

How It Works

Shannon Insight scans source files for structural metrics (LOC, function count, nesting depth, imports), builds a dependency graph, and runs PageRank, strongly connected components, and Louvain community detection. If git history is available, it extracts co-change patterns, churn trajectories, author entropy, and fix ratios.
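
To make the graph step concrete, here is a small power-iteration PageRank over an import graph. It is a sketch under stated assumptions rather than the tool's implementation: edges point from the importing file to the imported file, and the damping factor and iteration count reuse the pagerank_damping and pagerank_iterations defaults listed in the Configuration section.

```python
def pagerank(
    imports: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 20,
) -> dict[str, float]:
    """Power-iteration PageRank. imports maps each file to the files it imports."""
    nodes = set(imports) | {t for targets in imports.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Files that import nothing spread their rank evenly over all files.
        dangling = sum(rank[node] for node in nodes if not imports.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in imports.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


graph = {
    "api.py": ["engine.py"],
    "cli.py": ["engine.py"],
    "engine.py": [],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # the most central file
```

Files imported by many others accumulate rank, which is why a heavily imported file such as engine.py surfaces as a hub.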

These raw signals are fused through percentile normalization and weighted combination into per-file risk scores. A health Laplacian identifies files that are worse than their graph neighbors. 28 finders read from the unified signal field and produce evidence-backed findings ranked by severity.
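
One way to picture the health Laplacian is as the difference between a file's score and the mean score of its graph neighbors. The sketch below takes that reading as an assumption; the function name, the use of risk scores as input, and the treatment of isolated files are all hypothetical.

```python
def laplacian_delta(
    scores: dict[str, float],
    neighbors: dict[str, list[str]],
) -> dict[str, float]:
    """For each file, its score minus the mean score of its graph neighbors.

    A large positive delta marks a file that is riskier than its surroundings.
    Files with no scored neighbors get a delta of 0.0.
    """
    deltas = {}
    for path, score in scores.items():
        adjacent = [scores[n] for n in neighbors.get(path, []) if n in scores]
        deltas[path] = score - sum(adjacent) / len(adjacent) if adjacent else 0.0
    return deltas


risk = {"engine.py": 0.9, "api.py": 0.2, "cli.py": 0.3}
edges = {
    "engine.py": ["api.py", "cli.py"],
    "api.py": ["engine.py"],
    "cli.py": ["engine.py"],
}
print(laplacian_delta(risk, edges))
```

In this toy graph engine.py stands out because its score is well above the average of the two files connected to it.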

The system works with or without git. Without git, temporal findings (hidden coupling, unstable files, team finders) are skipped; structural and per-file findings still work. See docs/SIGNALS.md for the full signal reference.
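
The temporal findings depend on co-change patterns mined from commits. The sketch below computes a co-change ratio for every file pair, defined here as commits touching both files divided by commits touching either. That Jaccard-style definition is an assumption; the tool may normalize differently.

```python
from collections import Counter
from itertools import combinations


def cochange_ratios(commits: list[set[str]]) -> dict[tuple[str, str], float]:
    """Map each file pair (sorted by name) to its co-change ratio.

    commits is a list of sets, each holding the files changed in one commit.
    """
    file_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for files in commits:
        file_counts.update(files)
        pair_counts.update(combinations(sorted(files), 2))
    return {
        pair: together / (file_counts[pair[0]] + file_counts[pair[1]] - together)
        for pair, together in pair_counts.items()
    }


history = [
    {"cache.py", "db.py"},
    {"cache.py", "db.py"},
    {"cache.py"},
]
print(cochange_ratios(history))
```

A pair with a high ratio and no import edge between the two files is the pattern that hidden_coupling describes.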

Supported Languages

| Language | Extensions | Import Detection | Full Support |
| --- | --- | --- | --- |
| Python | .py | import, from...import | Yes |
| Go | .go | import "..." | Yes |
| TypeScript | .ts, .tsx | import, require | Yes |
| JavaScript | .js, .jsx | import, require | Yes |
| Java | .java | import | Yes |
| Rust | .rs | use, mod | Yes |
| Ruby | .rb | require, require_relative | Yes |
| C/C++ | .c, .cpp, .cc, .h, .hpp | #include | Yes |

Language is auto-detected. Use --language <name> to force a specific scanner.

CLI Reference

shannon-insight [PATH] -- Analyze

Analyze codebase quality. Default command when no subcommand is given.

shannon-insight .
shannon-insight --changed
shannon-insight --json --fail-on high
shannon-insight --verbose --concerns
shannon-insight --hotspots
shannon-insight --signals src/engine.py
shannon-insight --preview

| Flag | Default | Description |
| --- | --- | --- |
| PATH | . | Project root to analyze |
| --changed | off | Scope to files changed on current branch (auto-detects base) |
| --since REF | none | Scope to files changed since a git ref (e.g. HEAD~3) |
| --json | off | Machine-readable JSON output |
| --verbose, -v | off | Show detailed evidence and patterns |
| --save/--no-save | --save | Save snapshot to .shannon/ history |
| --fail-on LEVEL | none | Exit 1 if findings at level: any or high |
| --hotspots | off | Show files ranked by combined risk signals |
| --signals [FILE] | none | Show raw signals table (optionally for a specific file) |
| --concerns | off | Show findings grouped by concern category |
| --journey | off | Developer journey view: health score, progress, next steps |
| --preview | off | Show what would be analyzed without running |
| --output-format | auto | Output format: default, github, compact |
| --no-tui | off | Disable interactive TUI, use classic output |
| --version | off | Show version and exit |
| -c, --config | none | TOML configuration file |
| -w, --workers | auto | Parallel worker count (1-32) |

shannon-insight explain <FILE> -- File Deep-Dive

Deep-dive on a specific file: signals, findings, and trends.

shannon-insight explain engine.py
shannon-insight explain src/core/engine.py --verbose
shannon-insight explain engine.py --json

| Flag | Default | Description |
| --- | --- | --- |
| FILE | required | File to explain (substring match) |
| --json | off | JSON output |
| --verbose, -v | off | Show all signals (default shows top 8) |

shannon-insight diff -- Compare Snapshots

Show what changed since a previous analysis run.

shannon-insight diff
shannon-insight diff --baseline
shannon-insight diff --ref 5
shannon-insight diff --pin
shannon-insight diff --unpin

| Flag | Default | Description |
| --- | --- | --- |
| --ref, -r | none | Compare against a specific snapshot ID or commit SHA |
| --baseline, -b | off | Compare against pinned baseline |
| --pin | off | Pin current snapshot as baseline |
| --unpin | off | Clear pinned baseline |
| --json | off | JSON output |
| --verbose, -v | off | Show full per-file metric details |

shannon-insight health -- Health Trends

Show codebase health trends over time. Requires saved snapshots in .shannon/.

shannon-insight health
shannon-insight health --last 10
shannon-insight health --json

| Flag | Default | Description |
| --- | --- | --- |
| --last, -n | 20 | Number of recent snapshots to include (2-200) |
| --json | off | JSON output |

shannon-insight history -- List Snapshots

List past analysis runs stored in .shannon/history.db.

shannon-insight history
shannon-insight history --limit 5
shannon-insight history --json

| Flag | Default | Description |
| --- | --- | --- |
| --limit, -n | 20 | Maximum snapshots to list (1-1000) |
| --json | off | JSON output |

shannon-insight report -- HTML Report

Generate an interactive HTML report with treemap visualization.

shannon-insight report
shannon-insight report -o my-report.html -m entropy
shannon-insight report --no-trends

| Flag | Default | Description |
| --- | --- | --- |
| --output, -o | shannon-report.html | Output file path |
| --metric, -m | cognitive_load | Default metric for treemap coloring |
| --trends/--no-trends | --trends | Include file trend sparklines |
| --verbose, -v | off | Verbose logging |

shannon-insight serve -- Live Dashboard

Start a live dashboard with file watching and WebSocket updates.

pip install shannon-codebase-insight[serve]
shannon-insight serve
shannon-insight serve --port 9000 --no-browser

| Flag | Default | Description |
| --- | --- | --- |
| --port | 8765 | Port to listen on |
| --host | 127.0.0.1 | Host to bind to |
| --no-browser | off | Don't open browser automatically |
| --verbose, -v | off | Verbose logging |

Dashboard

The live dashboard (shannon-insight serve) provides 5 screens with real-time updates via WebSocket:

  • Overview -- Health score (1-10), verdict, issue summary by category, risk histogram, focus point
  • Issues -- Category tabs (Incomplete, Fragile, Tangled, Team), severity filters, finding cards with evidence
  • Files -- Searchable table with sortable columns, treemap view, file detail with all signals
  • Modules -- Module table with Martin metrics (instability, abstractness), module detail
  • Health -- Health trend chart, top movers, chronic findings, concern radar chart, global signals

Keyboard shortcuts: 1-5 switch tabs, / search files, j/k navigate, Enter drill down, Esc go back, ? show help.

Export: JSON (full state) or CSV (file table). API: GET /api/state, GET /api/gate, GET /api/export/json, GET /api/export/csv, WS /ws.

See docs/DASHBOARD.md for the full dashboard guide.

Configuration

Create shannon-insight.toml in your project root:

# ── File Filtering ──
exclude_patterns = ["*_test.go", "vendor/*", "node_modules/*", "dist/*"]
max_file_size_mb = 10.0            # Skip files larger than this (default: 10)
max_files = 10000                  # Max files to analyze (default: 10000)

# ── Git / Temporal ──
git_max_commits = 5000             # Max commits to analyze (default: 5000, 0 = no limit)
git_min_commits = 10               # Min commits for temporal analysis (default: 10)

# ── Insights ──
insights_max_findings = 50         # Max findings to return (default: 50)

# ── History ──
enable_history = true              # Auto-save snapshots to .shannon/ (default: true)
history_max_snapshots = 100        # Max snapshots to retain (default: 100)

# ── Performance ──
parallel_workers = 4               # Parallel workers (default: auto-detect)
enable_cache = true                # Enable disk cache (default: true)
cache_ttl_hours = 24               # Cache lifetime (default: 24)
timeout_seconds = 10               # File operation timeout (default: 10)

# ── PageRank ──
pagerank_damping = 0.85            # Damping factor (default: 0.85)
pagerank_iterations = 20           # Max iterations (default: 20)

# ── Security ──
allow_hidden_files = false         # Analyze dotfiles (default: false)
follow_symlinks = false            # Follow symlinks (default: false)

Precedence: CLI flags > SHANNON_* environment variables > shannon-insight.toml > defaults.

Environment variables use the SHANNON_ prefix: SHANNON_GIT_MAX_COMMITS=10000, SHANNON_INSIGHTS_MAX_FINDINGS=100, etc.
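
The precedence chain can be sketched as a lookup that stops at the first source defining a setting. The function below is illustrative only: its name and arguments are hypothetical, it handles integer settings alone, and it assumes the environment variable name is SHANNON_ followed by the upper-cased key, as in the examples above.

```python
def resolve_setting(
    key: str,
    cli_flags: dict[str, int],
    env: dict[str, str],
    toml_values: dict[str, int],
    defaults: dict[str, int],
) -> int:
    """Resolve one integer setting: CLI > SHANNON_* env > TOML file > default."""
    if key in cli_flags:
        return cli_flags[key]
    env_name = "SHANNON_" + key.upper()
    if env_name in env:
        # Environment values arrive as strings; this sketch only covers integers.
        return int(env[env_name])
    if key in toml_values:
        return toml_values[key]
    return defaults[key]


defaults = {"git_max_commits": 5000}
toml_values = {"git_max_commits": 2000}
env = {"SHANNON_GIT_MAX_COMMITS": "10000"}

print(resolve_setting("git_max_commits", {}, env, toml_values, defaults))
print(resolve_setting("git_max_commits", {}, {}, toml_values, defaults))
```

In real use the env argument would be os.environ and toml_values would come from parsing shannon-insight.toml.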

See docs/CONFIGURATION.md for the full configuration reference.

CI Integration

GitHub Actions

name: Code Quality
on: [pull_request]
jobs:
  shannon:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for temporal analysis
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install shannon-codebase-insight
      - run: shannon-insight --changed --fail-on high

The --fail-on high flag exits with code 1 if any finding has severity >= 0.8. Use --fail-on any to fail on any finding.

On GitHub Actions, output format is auto-detected to produce ::warning and ::error annotations on PR diffs. Force it with --output-format github.

Quality Gate API

When running the dashboard (shannon-insight serve), the /api/gate endpoint returns pass/fail status:

curl http://localhost:8765/api/gate
{
  "status": "PASS",
  "health": 7.2,
  "critical_count": 0,
  "finding_count": 12,
  "reason": "Health 7.2, no critical issues"
}

The gate fails when health < 4.0 or any finding has severity >= 0.9.
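
The gate rule can be expressed as a small pure function. This is a sketch of the stated thresholds, not the server's code: the function name is hypothetical, and the "FAIL" status string is an assumption, since only "PASS" appears in the sample response.

```python
def evaluate_gate(
    health: float,
    severities: list[float],
    min_health: float = 4.0,
    critical_severity: float = 0.9,
) -> dict:
    """Return a gate result shaped like the /api/gate sample response."""
    critical_count = sum(1 for s in severities if s >= critical_severity)
    failed = health < min_health or critical_count > 0
    return {
        "status": "FAIL" if failed else "PASS",  # "FAIL" is an assumed label
        "health": health,
        "critical_count": critical_count,
        "finding_count": len(severities),
    }


print(evaluate_gate(7.2, [0.5] * 12))   # healthy, no critical findings
print(evaluate_gate(8.0, [0.95, 0.4]))  # one critical finding
```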

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Clean -- no findings above threshold |
| 1 | Findings above threshold detected |
| 130 | Interrupted (Ctrl+C) |
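
For pipelines that are not shell-based, the exit codes can be interpreted from a wrapper script. The sketch below only uses the PATH argument and --fail-on flag documented above; the helper names are hypothetical, and it assumes shannon-insight is on the PATH when run_analysis is called.

```python
import subprocess

EXIT_MEANINGS = {
    0: "clean: no findings above threshold",
    1: "findings above threshold detected",
    130: "interrupted (Ctrl+C)",
}


def interpret_exit_code(code: int) -> str:
    """Translate a shannon-insight exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_analysis(project_root: str = ".") -> str:
    """Run the analyzer with --fail-on high and describe the outcome."""
    result = subprocess.run(
        ["shannon-insight", project_root, "--fail-on", "high"],
        check=False,  # a non-zero exit is a normal outcome here, not an error
    )
    return interpret_exit_code(result.returncode)
```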

Signals Reference

Shannon Insight computes 62 signals across 6 categories:

| Category | Signals | Examples |
| --- | --- | --- |
| Size & Complexity | 7 | lines, function_count, cognitive_load, max_nesting |
| Graph Position | 13 | pagerank, blast_radius_size, in_degree, community |
| Code Health | 6 | compression_ratio, semantic_coherence, stub_ratio |
| Change History | 8 | total_changes, churn_cv, bus_factor, fix_ratio |
| Team Context | 2 | author_entropy, bus_factor |
| Computed Risk | 4 | risk_score, wiring_quality, file_health_score, raw_risk |

Plus 15 per-module signals (Martin metrics, velocity, knowledge Gini) and 13 global signals (modularity, Fiedler value, codebase health).

See docs/SIGNALS.md for the full signal reference.

How Scoring Works

Per-file: Raw signals are percentile-normalized across all files, then combined into risk_score via multiplicative fusion: structural_risk * complexity * churn * bus_factor_penalty. Dormant files (zero changes) get risk_score = 0.
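
A minimal sketch of the two steps, assuming a simple "fraction of files at or below this value" percentile and an unweighted product of the four factors. The exact percentile definition and how each factor is derived from raw signals are not given here, so both functions are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values: dict[str, float]) -> dict[str, float]:
    """Map each file to the fraction of files whose value is <= its own."""
    ordered = sorted(values.values())
    n = len(ordered)
    return {path: bisect_right(ordered, v) / n for path, v in values.items()}


def risk_score(
    structural_risk: float,
    complexity: float,
    churn: float,
    bus_factor_penalty: float,
    total_changes: int,
) -> float:
    """Multiplicative fusion; dormant files (zero changes) score 0."""
    if total_changes == 0:
        return 0.0
    return structural_risk * complexity * churn * bus_factor_penalty


lines = {"a.py": 10, "b.py": 20, "c.py": 30, "d.py": 40}
print(percentile_ranks(lines))
print(risk_score(0.5, 0.5, 1.0, 1.0, total_changes=3))
```

Because the factors are multiplied, a file needs to score high on every dimension to rank near the top; a single low factor pulls the product down.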

Codebase: File scores and global metrics (modularity, wiring quality, architecture health) produce codebase_health (internal 0-1, displayed as 1-10).

Focus point: The "START HERE" recommendation ranks files by risk * impact * tractability * confidence to identify the single most actionable file.
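
The focus-point ranking can be sketched as picking the file with the largest product of the four factors. The dictionary keys and the sample numbers below are made up for illustration; how the tool computes impact, tractability, and confidence is not described here.

```python
def focus_score(candidate: dict) -> float:
    """risk * impact * tractability * confidence for one candidate file."""
    return (
        candidate["risk"]
        * candidate["impact"]
        * candidate["tractability"]
        * candidate["confidence"]
    )


def pick_focus_point(candidates: list[dict]) -> dict:
    """Return the single most actionable candidate."""
    return max(candidates, key=focus_score)


candidates = [
    # Very risky but hard to act on (low tractability).
    {"path": "legacy.py", "risk": 0.9, "impact": 0.9, "tractability": 0.1, "confidence": 1.0},
    # Less risky, but a fix is feasible and the evidence is strong.
    {"path": "engine.py", "risk": 0.6, "impact": 0.8, "tractability": 0.9, "confidence": 0.9},
]
print(pick_focus_point(candidates)["path"])
```

The product form means the riskiest file is not always the recommendation: a file that is hard to fix or weakly evidenced drops in the ranking.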

Optional Dependencies

pip install shannon-codebase-insight[serve]      # Dashboard (starlette, uvicorn, watchfiles)
pip install shannon-codebase-insight[tensordb]    # Parquet export + SQL finders (pyarrow, duckdb)
pip install shannon-codebase-insight[parsing]     # Tree-sitter parsing (more accurate AST)

Development

git clone https://github.com/namanagarwal/shannon-insight.git
cd shannon-insight
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

make test          # Run tests with coverage
make all           # Format + lint + type-check + test

License

MIT License -- see LICENSE

Credits

Created by Naman Agarwal. Built on Claude Shannon's information theory, PageRank (Page & Brin), Louvain community detection (Blondel et al.), Tarjan's SCC algorithm, Kolmogorov complexity approximation, Martin's package metrics, and Fiedler spectral analysis.
