quale
Structural codebase analysis - no parsers, no config, any language.
Quickstart
pip install quale
cd my-project
quale review # per-file review summary
quale ci check origin/main HEAD # automated CI gates
quale agent guard src/route.ts # risk packet for LLM agents
Commands by persona
Commands are organized into four namespaces:
| Persona | Prefix | Commands |
|---|---|---|
| Human developer | quale | review, onboard, refactor-cost, inspect, explore |
| LLM agent | quale agent | orient (repo map), edit (edit context), guard (risk packet) |
| CI pipeline | quale ci | check, comment, trend, init (GitHub Actions generator) |
| Structural primitives | quale core | 60+ commands including hub-risk, spectral-gap, criticality |
Human developer
| Command | What it does |
|---|---|
| quale review | Per-file review: stable anchors, hub risk, test gaps, action items |
| quale onboard | Onboarding plan: languages, macro-modules, landmark files |
| quale refactor-cost <file> | Effort estimate: direct impact, transitive ripple, clones |
| quale inspect . | Codebase overview: tech stack, module layout, health |
| quale explore . | Best files to read first for a new contributor |
LLM agent
Agent commands return structured JSON - no terminal output to parse:
| Command | What it returns |
|---|---|
| quale agent orient | Repo map: modules, landmarks, languages, recommended workflow |
| quale agent edit <file> | Edit context + verification_mc multi-choice candidates |
| quale agent guard <file> | Risk packet: guide, hub risk, complexity, stable anchors |
Agents are onboarded through agent orient, which returns enough structural
context to avoid wrong-file-path and wrong-test-file mistakes.
Measured effect on a deepseek-v4-flash agent (1,100 trials, 12 repos):
baseline test-file accuracy is 10-20%, and edit-context --format tool raises
it to 75% with zero extra edits. Across 6 models tested (Qwen, Gemma, Nemotron,
Mistral, Claude, local Gemma), every model guessed the wrong test file
without quale and found the right one with it.
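Since the agent commands return JSON, a tool wrapper can be very small. The following is a minimal sketch, not part of quale itself: it assumes the JSON document is written to stdout and that a non-zero exit status signals failure. The helper name `run_quale_agent` and the injectable `runner` parameter are hypothetical, introduced only so the wrapper can be tested without the CLI installed.

```python
import json
import subprocess
from typing import Any, Callable


def run_quale_agent(
    subcommand: str,
    *args: str,
    runner: Callable[..., Any] = subprocess.run,
) -> Any:
    """Run `quale agent <subcommand> [args...]` and return the parsed JSON.

    Assumption: the command prints a single JSON document on stdout.
    """
    cmd = ["quale", "agent", subcommand, *args]
    # check=True raises CalledProcessError on a non-zero exit status.
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

An agent harness might call `run_quale_agent("guard", "src/route.ts")` before editing that file and pass the resulting object to the model as tool output. The shape of the returned object is not specified here, so the sketch does not assume any particular keys.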
CI pipeline
| Command | What it does |
|---|---|
| quale ci init | Generates a GitHub Actions YAML |
| quale ci check <base> <head> | Runs structural gates, exits 0-7 with bitmask |
| quale ci comment <base> <head> | Posts structural report as GitHub PR comment |
| quale ci trend | Tracks CI metric trends over time |
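An exit code in the range 0-7 used as a bitmask implies three independent gate bits. The source does not name the gates or say which bit belongs to which gate, so the names in this sketch (`gate_a`, `gate_b`, `gate_c`) are placeholders, and treating 0 as "all gates passed" is an assumption based on the usual exit-status convention. See docs/CI_INTEGRATION.md for the real mapping.

```python
# Placeholder names: the real bit-to-gate mapping is not given in this README.
GATE_BITS = {
    0b001: "gate_a",
    0b010: "gate_b",
    0b100: "gate_c",
}


def failed_gates(exit_code: int) -> list[str]:
    """Decode a 3-bit exit code into the list of gates whose bit is set."""
    if not 0 <= exit_code <= 7:
        raise ValueError(f"expected an exit code in 0-7, got {exit_code}")
    return [name for bit, name in GATE_BITS.items() if exit_code & bit]
```

Under these assumptions an exit code of 5 (binary 101) would mean the first and third gates failed, while 0 would mean none did.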
Advanced primitives
See quale core --help for 60+ commands including hub-risk, spectral-gap,
criticality, coupling-chain, diff-structural, test-gaps, and more.
How it works
flowchart LR
A[Source files] --> B[Vocabulary extraction]
B --> C[Co-occurrence matrix]
C --> D[Structural analysis]
D --> E[Human output]
D --> F[CI gates]
D --> G[Agent JSON]
Quale reads every source file as text and builds a vocabulary for each one.
Words and identifiers are extracted by splitting on delimiters (. _ - /) and
on CamelCase boundaries, so no AST or parser is needed. Stopwords, imports,
and keywords are stripped.
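The extraction step can be sketched in a few lines of Python. This is an illustration of the idea, not quale's actual implementation: the stopword list, the import-line heuristic, lowercasing, and the minimum word length are all assumptions made for the example.

```python
import re

# Assumed, illustrative stopword and keyword list; the real list is not given here.
STOPWORDS = frozenset(
    {"the", "and", "of", "def", "return", "class", "function", "const", "if", "else"}
)

# Assumed heuristic for spotting import lines in a few common languages.
IMPORT_PREFIXES = ("import ", "from ", "#include", "use ", "require ")

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
# Any run of non-alphanumeric characters, which covers . _ - / and whitespace.
DELIMITERS = re.compile(r"[^A-Za-z0-9]+")


def extract_vocabulary(text: str) -> set[str]:
    """Return the lowercase word set of one source file, treated as plain text."""
    vocabulary: set[str] = set()
    for line in text.splitlines():
        if line.lstrip().startswith(IMPORT_PREFIXES):
            continue  # drop import lines entirely
        spaced = CAMEL_BOUNDARY.sub(" ", line)  # createUser -> create User
        for word in DELIMITERS.split(spaced):
            word = word.lower()
            if len(word) > 1 and word not in STOPWORDS:
                vocabulary.add(word)
    return vocabulary
```

For the single line `user_id = createUser(db.session)` this sketch yields the words user, id, create, db, and session. Whether quale keeps whole identifiers, their parts, or both is not stated here; the sketch keeps only the parts.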
These per-file vocabularies are assembled into a sparse co-occurrence matrix:
if two files both contain the identifier createUser, they share an edge.
The matrix captures vocabulary overlap, that is, which files speak the
same "language", without parsing imports, ASTs, or data flow. This naturally
reveals module alignment, test coverage gaps, and files that act as vocabulary
hubs.
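One way to picture the sparse matrix is as a dictionary keyed by file pairs, where the value counts how many vocabulary entries the two files share. The sketch below builds it through an inverted index (token to files), so pairs with no shared token never appear. The function name and the choice of a raw shared-token count as the edge weight are assumptions; quale's actual weighting is not described here.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(vocabs: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Map each file pair (sorted by path) to the number of shared tokens."""
    # Inverted index: token -> set of files containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for path, vocab in vocabs.items():
        for token in vocab:
            index[token].add(path)

    # Every token shared by two files adds 1 to that pair's weight.
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for paths in index.values():
        for a, b in combinations(sorted(paths), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

With three files where `src/user.py` and `tests/test_user.py` share two tokens and `src/billing.py` shares one token with `src/user.py`, the result has exactly two entries. The missing third pair is what makes the representation sparse.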
The same delimiter-splitting pipeline works without modification across languages - there is no grammar file, no AST plugin, no language-specific config. Quale treats every source file as text, so it handles any language the same way. The quality of the output depends on the codebase having enough identifiers to build a meaningful matrix.
What the matrix reveals
| Metric | What it measures | Why it matters |
|---|---|---|
| Hub risk | Files coupled to many others but rarely edited | Changes to these files break many dependents; they need careful review |
| Spectral gap | Size ratio of largest vs second-largest vocabulary cluster | A gap > 3x often points to a monolith - one module's vocabulary dominates the repo |
| Test mirror | Structural overlap between source and test files | Low overlap suggests tests don't exercise the source vocabulary directly |
| Criticality (k) | Change amplification factor | k > 1 means changes cascade - touching one file affects many through shared vocabulary |
| Entropy | Directory-level vocabulary dispersion | High-entropy directories use identifiers inconsistently across files |
| Coupling chain | N-hop transitive file coupling | The indirect blast radius - changing A may break C through B |
| Stable core | Files whose vocabulary is stable across git history | Low-risk refactoring targets |
| Clone detection | Near-identical identifier sets across files | Candidates for deduplication |
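As an illustration of the clone detection row, "near-identical identifier sets" can be measured with Jaccard similarity (intersection size divided by union size). This is a swapped-in, standard technique shown for clarity; the source does not say which similarity measure quale uses, and the 0.9 threshold is an arbitrary value chosen for the example.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets are treated as dissimilar."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_clones(
    vocabs: dict[str, set[str]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (file_a, file_b, similarity) for pairs at or above the threshold."""
    clones = []
    for a, b in combinations(sorted(vocabs), 2):
        score = jaccard(vocabs[a], vocabs[b])
        if score >= threshold:
            clones.append((a, b, score))
    return clones
```

This compares every pair of files, which is quadratic in the number of files. That is fine for a sketch but would likely need an index or sampling on a large repository.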
flowchart LR
A[Co-occurrence matrix] --> B[Hub risk]
A --> C[Spectral gap]
A --> D[Test mirror ratio]
A --> E[Criticality k]
A --> F[Coupling chains]
B --> G[quale review / agent guard]
C --> G
D --> G
E --> G
F --> G
G --> H[Terminal report or structured JSON]
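The coupling chain metric (N-hop transitive coupling) can be pictured as a bounded breadth-first search over the file graph. The sketch below assumes the graph is given as an adjacency mapping from each file to its directly coupled files; the function name and the input shape are hypothetical and not taken from quale.

```python
from collections import deque


def coupling_chain(
    edges: dict[str, set[str]], start: str, max_hops: int
) -> dict[str, int]:
    """Return files reachable from `start` within `max_hops`, with hop counts."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # the starting file is not part of its own blast radius
    return hops
```

In a chain A, B, C, D where each file is coupled only to its neighbours, a 2-hop search from A reaches B at one hop and C at two hops, which matches the "changing A may break C through B" case in the table.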
What it is and what it's not
What it is:
- A structural vocabulary analyzer for codebases
- A code review tool that surfaces coupling, test gaps, and stable anchors
- A CI gate that checks for structural regressions
- An LLM agent helper that provides repo context in structured JSON
What it's not:
- Not a linter (no AST, no rule engine, no style checking)
- Not a test coverage tool (vocabulary overlap ≠ statement coverage)
- Not a security scanner (no data flow, no taint analysis)
- Not a dependency graph (import paths are never parsed - co-occurrence is inferred from identifier sharing, which is different)
- Not useful on a brand-new repo with fewer than ~50 files - there's no structure to measure
- Not a replacement for human code review - it catches structural blind spots, not logic bugs
Practical limits
- git history required for diff-based commands
- 75% verification accuracy on test-file prediction - the remaining 25% are repos without stem-matched tests or co-change history. When quale can't find the right file, it says so rather than guessing.
Development
git clone https://github.com/Reliary/quale
cd quale
pip install -e ".[dev]"
python -m pytest tests/ -v
ruff check quale/
mypy quale/ --ignore-missing-imports
Deep dive
- docs/ALGORITHM.md - vocabulary extraction and co-occurrence data flow
- docs/COMMANDS.md - full command reference
- docs/CI_INTEGRATION.md - CI setup guide
- docs/EFFECT_HARNESS.md - methodology and results
- CHANGELOG.md - release history
License
MIT