Evolutionary Quality Metric for source code
Evolutionary Quality Metric (EQM)
EQM scores the functions in a git repository by how strongly they have been preserved under pressure to change. Code that survives many opportunities to be modified, especially when callers depended on it remaining stable, scores high. Code that churns frequently or is rarely referenced by the rest of the codebase scores low.
The metric is inspired by purifying selection in molecular evolution.
$$\text{EQM}(f) = \mathrm{LCB}_p\!\left[\mathrm{Beta}(\alpha + k,\; \beta + n - k)\right]$$
Each commit that touches a direct caller of f, or f itself, is a trial (n). If f did not change in that commit, it survived (k). α, β are Beta prior parameters (default 1, 1 — uniform). EQM is the lower credible bound of the resulting Beta posterior, penalising functions with few trials more aggressively than the posterior mean alone.
Quickstart
pip install eqm-score
# Analyze a repository (run once; subsequent commands are sub-second)
eqm analyze /path/to/your/repo
# Print per-function scores (JSONL, one object per function)
eqm score /path/to/your/repo
# Colorized heatmap in the terminal
eqm score /path/to/your/repo --format terminal
# Top 20 most-conserved functions
eqm top /path/to/your/repo --n 20
# Explain a single function's score
eqm explain /path/to/your/repo src/core/processor.py:42
# Debug what trials were counted for a function
eqm debug trials /path/to/your/repo my_module.MyClass.my_method
How scoring works
A trial for function f is triggered when:
- A direct caller of f had a nonsynonymous change in a commit (caller pressure), or
- f itself had a nonsynonymous change in a commit (direct mutation)
Both triggers in the same commit count as one trial. The trial is synonymous (survived) if f did not change, and nonsynonymous (mutated) if f changed. Before two versions of a function are compared, its token sequence is normalized: local variable names, parameter names, string/integer literals, docstrings, and comments are all stripped or replaced with type tokens.
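The normalization step can be illustrated with Python's standard `ast` module. This is a minimal sketch, not EQM's own normalizer: it works on the syntax tree rather than on a token sequence, and the names `normalized_fingerprint` and the `VAR` placeholder are assumptions made for the example.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Replace local names and literals with placeholders (illustrative only)."""

    def __init__(self, local_names):
        self.local_names = local_names

    def visit_Name(self, node):
        # Only rename locals; globals and called functions keep their names.
        if node.id in self.local_names:
            return ast.Name(id="VAR", ctx=node.ctx)
        return node

    def visit_arg(self, node):
        node.arg = "VAR"
        return node

    def visit_Constant(self, node):
        # Keep only the literal's type, e.g. 2 and 3 both become "<int>".
        return ast.Constant(value=f"<{type(node.value).__name__}>")


def normalized_fingerprint(source: str) -> str:
    """Return a string that is equal for two synonymous versions of a function."""
    func = ast.parse(source).body[0]
    first = func.body[0] if func.body else None
    # Drop the docstring; comments never reach the AST in the first place.
    if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)):
        func.body = func.body[1:] or [ast.Pass()]
    local_names = {a.arg for a in ast.walk(func) if isinstance(a, ast.arg)}
    local_names |= {n.id for n in ast.walk(func)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    return ast.dump(_Normalizer(local_names).visit(func))


OLD = '''
def area(width, height):
    """Compute a scaled area."""
    scale = 2  # doubling
    return width * height * scale
'''
RENAMED = '''
def area(w, h):
    s = 3
    return w * h * s
'''
REWRITTEN = '''
def area(w, h):
    s = 3
    return w + h * s
'''
```

Here `OLD` and `RENAMED` produce the same fingerprint, so the edit between them would be synonymous, while `REWRITTEN` changes an operator and produces a different one.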
The following cases count as a direct caller:
- Same-file direct call
- From-import cross-file call
- Module-attribute cross-file call
- Intra-class self/cls
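The simplest of these cases, a same-file direct call, can be detected by walking the syntax tree. The sketch below is an assumption about how such a resolver might look, not EQM's implementation; `same_file_callers` is a hypothetical helper.

```python
import ast


def same_file_callers(source: str, target: str) -> set[str]:
    """Names of functions in `source` that call `target` directly by name."""
    callers = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if func.name == target:
            continue  # ignore recursion: a function is not its own caller here
        for node in ast.walk(func):
            # Match only plain `target(...)` calls, not attribute or aliased calls.
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id == target):
                callers.add(func.name)
    return callers


EXAMPLE = '''
def helper(x):
    return x + 1

def direct():
    return helper(1)

def dynamic():
    fn = helper
    return fn(2)

def unrelated():
    return 0
'''
```

In `EXAMPLE`, only `direct` is reported as a caller of `helper`. The call in `dynamic` goes through a variable, so a purely syntactic match like this one does not see it.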
Some known limitations:
- self.method() via inheritance or other instances
- Star imports (from module import *)
- Dynamic dispatch: calls through variables (fn = get_fn(); fn()), getattr, or __call__
- Ambiguous global names
A function with many trials (high n) and a high survival rate gets a score near 1.0. A function with few trials gets a score near 0.5 regardless of its survival rate (maximally uncertain).
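The trial rules in this section reduce to a small counting loop. The sketch below assumes the nonsynonymous changes per commit and the set of direct callers have already been computed; `Commit` and `count_trials` are hypothetical names, not part of the EQM API.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    # Functions with a nonsynonymous change in this commit (assumed precomputed).
    changed: set = field(default_factory=set)


def count_trials(commits, target, callers):
    """Return (n, k): total trials for `target` and trials it survived."""
    n = k = 0
    for commit in commits:
        target_changed = target in commit.changed
        caller_changed = bool(callers & commit.changed)
        if not (target_changed or caller_changed):
            continue  # neither trigger fired, so this commit is not a trial
        n += 1  # both triggers in one commit still count as a single trial
        if not target_changed:
            k += 1  # caller pressure, but the target survived unchanged
    return n, k


HISTORY = [
    Commit("c1", {"caller_a"}),       # caller pressure, target survives
    Commit("c2", {"f", "caller_a"}),  # both triggers, one trial, mutated
    Commit("c3", {"other"}),          # unrelated, not a trial
    Commit("c4", {"f"}),              # direct mutation
]
```

For `HISTORY`, `count_trials(HISTORY, "f", {"caller_a"})` gives three trials, of which one is a survival.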
This library only supports Python at this time, though it's pretty easy to extend!
Concepts
Bernoulli Model
Each trial for function f is a Bernoulli event with unknown survival probability p. We place a Beta conjugate prior over p and observe k synonymous (survival) outcomes in n total trials:
$$\text{Prior:} \quad p \sim \mathrm{Beta}(\alpha, \beta)$$
$$\text{Posterior:} \quad p \mid k, n \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)$$
Because Beta is conjugate to the Binomial, the posterior parameters update by simple arithmetic.
$$\mu = \frac{\alpha + k}{\alpha + \beta + n}$$
where:
- α, β: Beta prior parameters (default 1, 1, i.e. uniform; prior mean = 0.5)
- n: total trials for f
- k: synonymous trials (commits where f survived unchanged)
| n | k | μ |
|---|---|---|
| 0 | — | 0.500 (no evidence) |
| 5 | 5 | 0.857 |
| 10 | 10 | 0.917 |
| 100 | 100 | 0.990 |
| 10 | 8 | 0.750 (mutated 2/10) |
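The rows of this table follow directly from the posterior mean formula. A minimal sketch that reproduces them, with `posterior_mean` as an illustrative helper name:

```python
def posterior_mean(n: int, k: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Mean of Beta(alpha + k, beta + n - k), i.e. (alpha + k) / (alpha + beta + n)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return (alpha + k) / (alpha + beta + n)


# (n, k) pairs from the table above.
TABLE_ROWS = [(0, 0), (5, 5), (10, 10), (100, 100), (10, 8)]

if __name__ == "__main__":
    for n, k in TABLE_ROWS:
        print(f"n={n:>3} k={k:>3} mu={posterior_mean(n, k):.3f}")
```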
EQM (lower credible bound)
$$\text{EQM}(f) = \mathrm{LCB}_p\!\left[\mathrm{Beta}(\alpha + k,\; \beta + n - k)\right]$$
EQM is the lower credible bound — the p-th quantile of the Beta posterior (default p = 0.05, i.e. the 95% one-sided LCB). This penalises functions with few trials relative to those with many, even if their observed survival rates are identical. As evidence accumulates, the LCB converges toward the true survival rate.
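To make the lower bound concrete, the sketch below computes the p-th quantile of the posterior using only the standard library (Simpson's rule for the CDF, bisection for the quantile), alongside the normal approximation `mean - z * std`. How EQM itself computes the bound is not specified here, so treat both functions as assumptions; in practice `scipy.stats.beta.ppf` would be the usual way to get the quantile. The integration assumes both posterior parameters are at least 1 and that counts are moderate.

```python
import math


def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """CDF of Beta(a, b) by Simpson's rule on [0, x]; assumes a >= 1 and b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def pdf(t: float) -> float:
        return norm * t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def lcb_quantile(n, k, alpha=1.0, beta=1.0, p=0.05):
    """p-th quantile of Beta(alpha + k, beta + n - k), found by bisection."""
    a, b = alpha + k, beta + n - k
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lcb_normal(n, k, alpha=1.0, beta=1.0, z=1.645):
    """Normal approximation to the bound: posterior mean minus z posterior stds."""
    a, b = alpha + k, beta + n - k
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean - z * std
```

Under the quantile definition with the default uniform prior, a function with no trials has a bound of exactly p = 0.05, and ten survivals out of ten give 0.05^(1/11) ≈ 0.76, compared with a posterior mean of 0.917. The gap between the bound and the mean shrinks as n grows.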
EQM is in (0, 1). A score near 1.0 means a function that rarely needed to change, either under caller pressure or on its own. A score near 0.5 means either no evidence yet (uncertain) or a function that tends to mutate frequently.
EQM does not measure correctness, readability, code style, or test coverage. High-EQM code can be buggy; it's just stable buggy code.
CLI Reference
eqm analyze
Build or update the lineage and reference databases for a repository.
eqm analyze REPO_PATH [OPTIONS]
Arguments:
- REPO_PATH: path to the git repository to analyze.
Options:
| Option | Default | Description |
|---|---|---|
| --ref | HEAD | Git ref to analyze |
| --since DATE | (all time) | Only process commits since this ISO date |
| --lang LANGS | python | Comma-separated languages |
| --cache PATH | .eqm-cache.db | SQLite cache path |
| --workers N | 4 | Parallel workers |
| --force | false | Re-analyze already-processed commits |
Example:
eqm analyze . --lang python --since 2023-01-01
eqm score
Emit per-line EQM scores from the analysis cache.
eqm score REPO_PATH [OPTIONS]
Options:
| Option | Default | Description |
|---|---|---|
| --file / -f | (all) | Restrict to specific file(s) |
| --format | jsonl | Output format: json, jsonl, terminal |
| --threshold | 0.0 | Only emit lines with EQM ≥ threshold |
JSON output schema (per line):
{
"file": "src/foo/bar.py",
"line": 42,
"eqm": 0.917,
"components": {
"bayesian_survival": 0.917
},
"scope": {
"function": "process_batch",
"class": "BatchProcessor",
"module": "foo.bar"
},
"scope_uuid": "fn:7a3b9c..."
}
This schema is the contract between v1 and the future VS Code extension. It will be kept stable across minor versions.
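A downstream consumer of the JSONL output might look like the sketch below. It relies only on the fields shown in the schema above; the helper names and the records in `SAMPLE_LINES` are made up for illustration, and in real use the lines would come from the output of `eqm score`.

```python
import json


def load_scores(lines, min_eqm: float = 0.0) -> list:
    """Parse JSONL records, keeping those with eqm >= min_eqm."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record["eqm"] >= min_eqm:
            records.append(record)
    return records


def scores_by_scope(records) -> dict:
    """Collapse per-line records to one score per scope_uuid (first one wins)."""
    scores = {}
    for record in records:
        scores.setdefault(record["scope_uuid"], record["eqm"])
    return scores


# Hypothetical records shaped like the schema above.
SAMPLE_LINES = [
    '{"file": "src/foo/bar.py", "line": 42, "eqm": 0.917, '
    '"components": {"bayesian_survival": 0.917}, '
    '"scope": {"function": "process_batch", "class": "BatchProcessor", '
    '"module": "foo.bar"}, "scope_uuid": "fn:aaa"}',
    '{"file": "src/foo/bar.py", "line": 43, "eqm": 0.917, '
    '"components": {"bayesian_survival": 0.917}, '
    '"scope": {"function": "process_batch", "class": "BatchProcessor", '
    '"module": "foo.bar"}, "scope_uuid": "fn:aaa"}',
    '{"file": "src/foo/util.py", "line": 7, "eqm": 0.25, '
    '"components": {"bayesian_survival": 0.25}, '
    '"scope": {"function": "parse", "class": null, '
    '"module": "foo.util"}, "scope_uuid": "fn:bbb"}',
]
```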
eqm explain
Print the score breakdown for a single source line.
eqm explain REPO_PATH FILE:LINE
Example:
eqm explain . src/core/processor.py:42
Prints a JSON object with the full component breakdown, lineage stats, and incoming reference count.
eqm top
List the top-N highest-EQM functions, classes, or modules.
eqm top REPO_PATH [OPTIONS]
Options:
| Option | Default | Description |
|---|---|---|
| --n | 50 | Number of entries to show |
| --scope | function | Level: function, class, module |
eqm cache info
Show cache statistics: row counts, DB size, last analysis timestamp.
eqm cache info REPO_PATH
eqm cache clear
Wipe all cached analysis data.
eqm cache clear REPO_PATH [--yes]
eqm version
Print the installed EQM version.
eqm version
Configuration
EQM reads configuration from pyproject.toml ([tool.eqm]) or .eqm.toml at the repository root. CLI flags take precedence.
[tool.eqm]
languages = ["python"]
exclude = ["tests/", "vendor/", "**/*.generated.*"]
[tool.eqm.weights]
# Beta prior on BayesianSurvival — default Beta(1,1) = uniform prior
# New code (n=0) starts at prior_alpha / (prior_alpha + prior_beta) = 0.5
prior_alpha = 1.0
prior_beta = 1.0
# Lower Credible Bound z-score: EQM = posterior_mean - lcb_z * posterior_std
# 1.645 = one-sided 95% lower bound. Higher values penalise low-n nodes more.
lcb_z = 1.645
[tool.eqm.cache]
path = ".eqm-cache.db"
Development setup
# Requires: Python 3.11+, uv
git clone https://github.com/mskarlin/evolutionarily_quality_metric
cd evolutionarily_quality_metric
uv sync --group dev
# Run the fast test suite
uv run pytest -m "not slow" -x
# Run including end-to-end tests
uv run pytest
License
Apache-2.0