TEst PYramid Doctor — diagnose a project's test pyramid: mass, structure, and coverage.
Tepyd — The TEst PYramid Doctor
Diagnose your test pyramid: is the shape what you say you want?
Tepyd looks at a project's test suite and tells you whether its shape matches the test pyramid you say you want: a broad base of cheap unit tests, fewer integration tests, a thin cap of end-to-end tests. It automates the checks you'd otherwise do by hand — which packages are under- or over-tested, where the cheap tests are missing, whether the test tree mirrors the source tree, and — by running your suite under coverage — which tier actually exercises each package.
It's configuration-driven: point it at any project, describe that project's layout once in pyproject.toml, and run one command. The author's own layout ships as the default, so for projects that share it there's nothing to configure.
Think tepyd doctor: diagnose my pyramid.
The lenses
Tepyd looks at a suite through several complementary lenses. Each is useful alone; together they catch failure modes the others miss.
| Lens | Command | Question | Runs tests? | Status |
|---|---|---|---|---|
| Mass | tepyd mass | How much test code is there, and what shape does it make? | no | ✅ implemented |
| Mirror | tepyd mirror | Does the test tree structurally parallel the source tree? | no | ✅ implemented |
| Cover | tepyd cover | Which tier actually executes each unit — and is it the cheap one? | yes | ✅ implemented |
| Report | tepyd report | All the checks at once, plus advice: the why and the how, not just the what. | no | ✅ implemented |
| Audit | tepyd audit | Does each test file's imports match the tier it lives in? | no | 🔜 planned |
Requirements
- Python ≥ 3.12.
- mass, mirror, and report have no runtime dependencies — the line counter is built in.
- cover additionally needs the analysed project's own pytest and coverage to be importable, so run it from that project's environment (see tepyd cover).
- cloc is an optional opt-in for the line counter (counter = "cloc").
Install
uv sync # for development in this repo
# once published to PyPI (not yet):
# uv tool install tepyd
After uv sync, prefix commands with uv run (or activate the venv). Once published, uv tool install tepyd puts tepyd on your PATH directly.
The default line counter is built in — no external dependency. Set counter = "cloc" in config to use the cloc binary instead, if you want its stricter, multi-language counting.
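For a rough picture of what a dependency-free counter can look like: the configuration reference describes the internal counter as tokenize-based, counting non-blank, non-comment lines. The sketch below follows that description using only the standard library. The function name `count_code_lines` is hypothetical, and the details (docstrings counted as code, every line of a multi-line string counted) are assumptions of this sketch, not confirmed behaviour of Tepyd's counter.

```python
import io
import tokenize

# Token types that never make a line count as code.
_NON_CODE = {
    tokenize.COMMENT,
    tokenize.NL,         # blank line or line break inside an expression
    tokenize.NEWLINE,    # end of a logical line
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count physical lines that carry at least one code token."""
    code_lines: set[int] = set()
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type in _NON_CODE:
            continue
        # A token may span several lines (e.g. a triple-quoted string).
        code_lines.update(range(token.start[0], token.end[0] + 1))
    return len(code_lines)


example = "# comment\n\nx = 1  # trailing comment\n\ndef f():\n    return x\n"
print(count_code_lines(example))  # 3
```

A line with code followed by a trailing comment still counts once, because the line number is added for the code tokens and the comment token is ignored.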
Quick start
uv run tepyd init # detect this project's layout, write a config
uv run tepyd mass --min-src 1 # analyse the current directory
uv run tepyd -C /path/to/project mass # analyse another project
uv run tepyd report # the checks + advice, in one read
(--min-src skips source units under 20 LOC by default; pass --min-src 1 on small projects to see every unit. Drop the uv run prefix once Tepyd is on your PATH.)
mass, mirror, and cover take --json for machine-readable output, which is the contract that report and a future CI gate build on (report itself renders text or Markdown via --format). All commands take -C/--root DIR to point at a project other than the current directory.
Concepts
- Source unit — a meaningful slice of the source tree that Tepyd analyses as a whole, derived from configurable glob patterns (by default: each package under modules/, plus every other top-level package).
- Tier — one rung of the pyramid: a directory of tests of a given cost (e.g. tests/a_unit). Tiers are listed cheapest-first in config; any number is allowed, not just three.
- Unit share — the fraction of a unit's test code that lives in the cheapest tier. The headline pyramid-health metric.
- Shape glyph — a one-character read on a unit's pyramid:

  | Glyph | Meaning |
  |---|---|
  | ▲ | healthy — the cheapest tier is the largest and ≥ 40 % of test LOC |
  | ◇ | balanced — neither clearly healthy nor inverted |
  | ▼ | inverted — the most expensive tier is the largest and unit share < 30 % |
  | · | no tests at all |

  When a tier declares an expects scope (see tepyd mirror), the shape becomes layer-aware: a unit is judged against its expected home tier, not the unit ideal. A controller whose home is e2e is ▲ when its tests live at e2e — ▼ only marks test mass that sits above where it belongs.
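The glyph rules can be written down as a small function. This is a minimal sketch of the classic (non-layer-aware) rules only, using the thresholds stated in the glyph definitions; the function name `shape_glyph` and the tie handling (a tier counts as "largest" when it equals the maximum) are assumptions, not Tepyd's actual code.

```python
def shape_glyph(loc_by_tier: list[int]) -> str:
    """Return the shape glyph for per-tier test LOC, cheapest tier first."""
    total = sum(loc_by_tier)
    if total == 0:
        return "·"  # no tests at all

    unit_share = loc_by_tier[0] / total
    largest = max(loc_by_tier)

    # Healthy: the cheapest tier is the largest and holds at least 40 % of test LOC.
    if loc_by_tier[0] == largest and unit_share >= 0.40:
        return "▲"
    # Inverted: the most expensive tier is the largest and unit share is under 30 %.
    if loc_by_tier[-1] == largest and unit_share < 0.30:
        return "▼"
    # Everything in between is balanced.
    return "◇"


# Per-tier LOC in the order unit, integration, http-e2e, browser.
print(shape_glyph([32, 8, 4, 0]))   # ▲
print(shape_glyph([0, 0, 0, 24]))   # ▼
print(shape_glyph([0, 16, 0, 0]))   # ◇
print(shape_glyph([0, 0, 0, 0]))    # ·
```

The four calls use the same per-tier numbers as the sample rows in the tepyd mass output.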
tepyd mass
Counts test LOC against source LOC for every source unit and reports the per-tier breakdown, ratios, and pyramid shape.
tepyd mass
tepyd mass --json
tepyd mass --min-src 50 # skip source units under 50 LOC (default: 20)
tepyd mass --exclude faker # skip a unit, on top of config exclusions (repeatable)
package src unit integration http-e2e browser tests ratio u%
------------ --- ---- ----------- -------- ------- ----- ----- ---- -
modules/biz 24 32 8 4 0 44 1.83x 73% ▲
modules/wire 24 0 0 0 24 24 1.00x 0% ▼
services 16 0 16 0 0 16 1.00x 0% ◇
models 16 0 0 0 0 0 0.00x — ·
=== Summary ===
... (per-tier totals and outlier lists)
Tier mix across the codebase : unit 50% / integration 20% / http-e2e 10% / browser 20%
⚠ unit share is 50%, below the 60% target — the pyramid is flattened.
Caveat : tier is by directory, not by what each test actually exercises.
How to read it. Each tier has its own column; ratio is total test LOC ÷ source LOC; u% is unit share. The summary lists outliers (untested, under-tested below 0.5×, heavily-tested above 2×), flags inverted pyramids, and warns when the codebase-wide unit share falls below the first tier's target_share.
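The arithmetic behind the ratio column, the outlier lists, and the target_share warning is simple enough to sketch. The thresholds (0.5×, 2×) come from the paragraph above; the function names and the label strings are hypothetical, and the sketch assumes every unit has a non-zero source LOC (small units are already skipped by --min-src).

```python
def classify_ratio(src_loc: int, test_loc: int) -> tuple[float, str]:
    """Return (ratio, label), where ratio is total test LOC / source LOC."""
    ratio = test_loc / src_loc
    if test_loc == 0:
        label = "untested"
    elif ratio < 0.5:
        label = "under-tested"
    elif ratio > 2.0:
        label = "heavily-tested"
    else:
        label = "ok"
    return round(ratio, 2), label


def below_target(loc_by_tier: list[int], target_share: float) -> bool:
    """True when the cheapest tier's share of all test LOC is under the target."""
    total = sum(loc_by_tier)
    if total == 0:
        return False  # nothing to measure
    return loc_by_tier[0] / total < target_share


print(classify_ratio(24, 44))                # (1.83, 'ok')
print(classify_ratio(16, 0))                 # (0.0, 'untested')
print(below_target([50, 20, 10, 20], 0.60))  # True: 50 % is below the 60 % target
```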
Layer-aware judging. If you've scoped your tiers with expects, mass respects it: a unit whose home is e2e (a web controller) isn't flagged ▼ for being e2e-heavy, and the codebase target_share check sets aside each unit's tests at its expected higher tiers before measuring the unit share — so a large, legitimate e2e or integration surface doesn't read as a flattened pyramid. Only test mass sitting above where it belongs counts against you. Without expects, nothing changes — every unit is judged against the classic unit-pyramid.
A caveat Tepyd states up front: LOC is a proxy for effort, not a measure of quality, and a test's tier is decided by its directory, not by what it actually exercises. Mass tells you where to look; the cover lens tells you whether the tests are real.
tepyd mirror
Static, no-execution comparison of the test tree against the source tree, per tier, at the same granularity as your units — the slices mass and cover use. A package unit's mirrored test directory is checked recursively, so a test for any of its sub-packages counts. Refine the units patterns (e.g. ["modules/*", "*"]) to make mirror coarser or finer; it never floods a deeply-nested project with per-sub-package gaps.
tepyd mirror
tepyd mirror --json
tepyd mirror --exclude faker
Mirror — test tree vs source tree (6 source units)
unit: 3/6 mirrored (50%)
gap domain/models
gap web
orphan tests/a_unit/legacy
integration: 1/6 mirrored (17%)
gap domain
...
How to read it. For each tier:
- present — the source package has a matching test directory that contains tests (counted in the X/Y mirrored figure),
- gap — a source package with no test counterpart at this tier that the tier was expected to test,
- out of scope — a source package the tier isn't responsible for, so its absence is reported as n/a, not a gap (shown as a count; the full list is in --json),
- orphan — a test directory that contains tests but has no source on disk (tests for code that moved or vanished).
Scoping a tier to its layer. By default every tier is checked against every source package — fine for a flat app, noisy for a layered one. In a hexagonal/onion architecture each layer has a natural tier: pure domain is unit-tested, the persistence layer integration-tested, the HTTP edge end-to-end-tested. Checking all three against every package turns real structure into a wall of "gaps" that are correct by design:
unit: 4/10 mirrored (40%)
gap di
gap domain/ports
gap infrastructure
gap repositories
gap web
gap web/controllers
…and the integration and e2e tiers each report eight more gaps in the same vein — code that simply lives at a different tier. Tell Tepyd which packages each tier owns with expects (glob patterns over unit names):
[[tool.tepyd.tiers]]
name = "a_unit"
expects = ["domain", "domain/*", "services", "lib"] # pure logic
[[tool.tepyd.tiers]]
name = "b_integration"
expects = ["repositories", "infrastructure"] # the DB-bound layer
[[tool.tepyd.tiers]]
name = "c_e2e"
expects = ["web", "web/*"] # the HTTP edge
Now a package outside a tier's scope is reported as out of scope (n/a), not a gap, and drops out of that tier's X/Y mirrored figure — leaving only the gaps that are real:
unit: 4/4 mirrored (100%), 4 out of scope
integration: 2/2 mirrored (100%), 6 out of scope
e2e: 2/2 mirrored (100%), 6 out of scope
Two things scope deliberately doesn't do. A package that should never be tested anywhere — a pure Protocol/ports layer with no runtime behaviour — belongs in [tool.tepyd.exclude] (with a reason), which drops it from every tier; that's why domain/ports and di are gone from the counts above. And tests that do exist always count as present, even at a tier that didn't expect them: scope governs whether an absence is a gap, never whether existing tests count.
Orphan detection checks whether the source actually exists, so it's independent of exclusions. Mirror coverage is presented as data, not a pass/fail — a browser tier showing 1/20 mirrored is often by design.
tepyd cover
The only lens that runs your test suite. For each tier it does one coverage run -m pytest <tier-dir> (the tiers partition the suite, so the cost is roughly one full run), then measures, per source unit, what fraction of its statements each tier actually executes.
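The per-tier runs can be pictured as one command line per tier. The sketch below only builds the commands; it does not execute them. It assumes coverage.py's `--data-file` option to keep each tier's data separate, and the `.coverage.<tier>` file names and the function name are hypothetical, not Tepyd's actual choices.

```python
import sys


def coverage_commands(tier_roots: dict[str, str]) -> dict[str, list[str]]:
    """Build one `coverage run -m pytest <tier-dir>` command per tier."""
    commands: dict[str, list[str]] = {}
    for name, root in tier_roots.items():
        commands[name] = [
            sys.executable, "-m", "coverage", "run",  # same interpreter as the caller
            f"--data-file=.coverage.{name}",          # hypothetical per-tier data file
            "-m", "pytest", root,
        ]
    return commands


commands = coverage_commands({"a_unit": "tests/a_unit", "c_e2e": "tests/c_e2e"})
for name, command in commands.items():
    print(name, " ".join(command[1:]))
```

Each command could then be passed to `subprocess.run(command, cwd=project_root)`. Using `sys.executable` is what ties the run to the interpreter the tool itself runs under, which is why that interpreter needs the project's dependencies.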
Run cover from your project's own environment. Unlike the other lenses, it imports and executes your code, so it must run where your package, pytest, and coverage are installed. Add Tepyd there (uv add --dev tepyd) and run uv run tepyd cover from the project root. Installing Tepyd standalone (uv tool install) and pointing it at the project with -C will fail to import your tests.
tepyd cover # all tiers
tepyd cover --tier a_unit # just the unit tier (repeatable)
tepyd cover --json
unit stmts unit e2e any
-------- ----- ---- --- ----
checkout 7 0% 88% 88% v hidden
domain 9 100% 0% 100%
The any column is the union across tiers — a unit's true reachable-by-tests coverage. The flag marks the hidden inverted pyramid: a unit that's well covered overall (any high) but barely by its unit tier — the lines run, but only the expensive tiers run them. A global coverage report would show both rows as green; only this lens reveals that checkout's coverage is entirely e2e.
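The any column and the hidden flag can be expressed with set operations over executed statement lines. A minimal sketch: the thresholds `high` and `low` are placeholders chosen for illustration (this document does not state the cut-offs Tepyd uses), and the function name is hypothetical.

```python
def coverage_row(
    statements: set[int],
    executed_by_tier: dict[str, set[int]],
    unit_tier: str,
    high: float = 0.80,  # assumed cut-off for "well covered overall"
    low: float = 0.20,   # assumed cut-off for "barely covered by the unit tier"
) -> dict:
    """Per-tier coverage, the union across tiers, and the hidden-inversion flag."""

    def fraction(lines: set[int]) -> float:
        return len(lines & statements) / len(statements)

    per_tier = {tier: fraction(lines) for tier, lines in executed_by_tier.items()}
    # "any" is the union of executed lines, not the sum of the percentages.
    any_coverage = fraction(set().union(*executed_by_tier.values()))
    hidden = any_coverage >= high and per_tier.get(unit_tier, 0.0) <= low
    return {"tiers": per_tier, "any": any_coverage, "hidden": hidden}


statements = set(range(1, 9))  # a unit with 8 statements
row = coverage_row(
    statements,
    {"unit": set(), "e2e": {1, 2, 3, 4, 5, 6, 7}},
    unit_tier="unit",
)
print(row)  # any is 0.875 and hidden is True
```

When two tiers overlap (say unit runs lines 1 to 4 and e2e runs lines 4 to 8), the union still tops out at 100 %, which is why any can be lower than the sum of the tier columns.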
Requirements & behavior:
- It runs your suite under the same interpreter that runs tepyd, so that interpreter must have your project's dependencies (and pytest/coverage). Run it from your project's environment — uv run tepyd cover, or uv run --with[-editable] <tepyd> tepyd cover if tepyd lives elsewhere. If your project's code can't be imported (every tier fails in conftest/collection), cover says so, names the interpreter and the missing module, and points you at the fix — rather than printing a wall of zeros.
- It prints per-tier progress to stderr as it runs (it executes the whole suite once per tier, so a large suite takes a while — the progress lines tell you it's working, not hung).
- It ignores the project's own [tool.coverage] config so the numbers don't depend on it, and attributes coverage by resolved path (robust to multi-file units and absolute coverage paths).
- A tier that fails to run is shown as a 0% column and listed as not-measured (distinct from "0% because untested"); a tier whose tests ran but failed is used with a warning that the numbers are a floor. Both also print to stderr.
- A tier that runs but measures 0% everywhere — typically a browser/Playwright suite that drives your app in a separate process coverage.py can't see — is flagged as "not measured", not "covers nothing" (and listed in --json as blind_tiers). Scope it out with --tier, or measure it under subprocess coverage.
tepyd report
Runs every check at once and synthesises the results into a list of findings, each carrying not just what is wrong but why it matters (the pyramid principle behind it) and how to fix it.
tepyd report # console report, "senior" level
tepyd report --format md # Markdown, for a PR comment or a committed file
tepyd report --level newb # teach the concepts (intro + glossary + advice)
tepyd report --level expert # a terse one-line-per-finding checklist
Every report opens with a context lead-in — what's being measured and why — and the --level knob tunes how much it explains, not which problems it finds:
| Level | What you get |
|---|---|
| newb | A full plain-language explanation of the pyramid, every finding's what/why/fix, general advice, and a glossary. |
| junior | A shorter context, every finding's what/why/fix, and general advice. |
| senior (default) | A brief context, then each finding's what/why/fix — no hand-holding. |
| expert | A one-line context note, then one line per finding: the marker, the title, and the action. |
It reports an overall health verdict (healthy / fair / needs work), a one-line summary of each lens, and findings ordered by severity (✗ problem, ⚠ warning, ℹ info). --min-src and --exclude work as they do on the individual lenses.
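The ordering and the expert-level rendering can be pictured with a small data model. This is an illustrative sketch only: the `Finding` class, its field names, and the exact line format are hypothetical and are not taken from Tepyd's code or its --json output.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"problem": 0, "warning": 1, "info": 2}
MARKERS = {"problem": "✗", "warning": "⚠", "info": "ℹ"}


@dataclass(frozen=True)
class Finding:
    severity: str  # "problem", "warning", or "info"
    title: str     # what is wrong
    why: str       # the pyramid principle behind it
    fix: str       # the suggested action


def render_expert(findings: list[Finding]) -> list[str]:
    """One line per finding (marker, title, action), most severe first."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    return [f"{MARKERS[f.severity]} {f.title}: {f.fix}" for f in ordered]


findings = [
    Finding("info", "browser tier is blind", "separate process", "scope it out with --tier"),
    Finding("problem", "modules/wire is inverted", "only browser tests", "add unit tests"),
    Finding("warning", "unit share is 50%", "below the 60% target", "move logic tests down"),
]
for line in render_expert(findings):
    print(line)
```

Because Python's sort is stable, findings of equal severity keep the order in which the lenses produced them.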
tepyd init
Rather than write the config by hand, let Tepyd guess it. tepyd init looks around the project for the usual clues — a src/ package (or a flat top-level one), a tests/ tree split into tiers, a modules/ sub-layout, a top-level browser-test root — and appends a commented [tool.tepyd] block to pyproject.toml, leaving the rest of the file untouched.
tepyd init # detect and write the section
tepyd init --dry-run # print what it would write, change nothing
When it can't make a confident guess — for example, several packages under src/ with none named app or matching the project name — it asks you to choose (when run interactively). In a non-interactive context (a pipe, CI), it falls back to the first candidate and prints a note rather than blocking. It also refuses to overwrite an existing [tool.tepyd] section, and prints a note for anything else it had to guess (an undetected source root, no test tiers). The result is a head start to review — not necessarily a finished config.
Configuration
tepyd init writes this for you, but here is the full reference. Everything is configured under [tool.tepyd] in pyproject.toml. With no section at all, the defaults below apply (except exclude, which is always empty unless set — exclusions are per-project policy, not a layout default).
[tool.tepyd]
src_root = "src/app" # filesystem root of the analysed source
src_package = "app" # dotted import path; reserved — not read by any lens yet
# How to slice the source tree into units (globs, first-match-wins): explode
# each package under modules/ one level deep, then take every other
# top-level package as a unit.
units = ["modules/*", "*"]
# Line counter: "internal" (built-in, no dependency) or "cloc".
counter = "internal"
# Source units excluded from analysis — a reason is REQUIRED, so the policy
# decision is documented where it's made.
[tool.tepyd.exclude]
faker = "seed/fake-data generator, exercised via fixtures"
# Test tiers, cheapest first (bottom of the pyramid first). Any number of tiers.
[[tool.tepyd.tiers]]
name = "a_unit"
root = "tests/a_unit"
label = "unit"
target_share = 0.60 # policy gate: ≥ 60 % of test LOC should be here
[[tool.tepyd.tiers]]
name = "b_integration"
root = "tests/b_integration"
label = "integration"
# Optional: scope this tier to the units it should test (globs over unit
# names). Units outside the scope are reported n/a, not as gaps. Omit it
# and the tier is checked against every unit.
expects = ["repositories/*", "infrastructure/*"]
[[tool.tepyd.tiers]]
name = "c_e2e"
root = "tests/c_e2e"
label = "http-e2e"
[[tool.tepyd.tiers]]
name = "e2e_playwright"
root = "e2e_playwright" # an arbitrary root — needn't live under tests/
label = "browser"
strip_prefix = "modules/" # this tier flattens the layout: modules/biz → biz
Reference
| Key | Default | Meaning |
|---|---|---|
| src_root | "src/app" | Filesystem root of the analysed source. |
| src_package | "app" | Dotted import path of the source. Currently informational — no lens reads it yet (cover keys off src_root). |
| units | ["modules/*", "*"] | Glob patterns slicing the source tree into units; first match wins, and a container whose children were already claimed is skipped. A pattern ending in .py (e.g. ["*.py"]) makes each top-level module a unit — for flat packages with no sub-packages; ["*"] stays directories-only. |
| counter | "internal" | internal (tokenize-based; counts non-blank, non-comment lines) or cloc. |
| exclude | {} | Table of unit = "reason"; the reason is required. |
| tiers | four tiers (see above) | Array of tables, cheapest-first. |
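The units rule (first match wins, and a container whose children were already claimed is skipped) can be sketched over a plain list of directory paths. This is an illustration under assumptions: the function names are hypothetical, `*` is taken to match exactly one path segment, and the .py module patterns are left out.

```python
from fnmatch import fnmatchcase


def _matches(path: str, pattern: str) -> bool:
    """Segment-wise glob match, so '*' never crosses a '/'."""
    path_parts, pattern_parts = path.split("/"), pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def slice_units(directories: list[str], patterns: list[str]) -> list[str]:
    """Slice package directories (relative to src_root) into source units."""
    units: list[str] = []
    for pattern in patterns:  # earlier patterns claim first
        for directory in sorted(directories):
            if not _matches(directory, pattern):
                continue
            already_claimed = directory in units or any(
                directory.startswith(unit + "/") for unit in units
            )
            is_claimed_container = any(
                unit.startswith(directory + "/") for unit in units
            )
            if already_claimed or is_claimed_container:
                continue
            units.append(directory)
    return units


directories = ["modules", "modules/biz", "modules/wire", "services", "models"]
print(slice_units(directories, ["modules/*", "*"]))
# ['modules/biz', 'modules/wire', 'models', 'services']
print(slice_units(directories, ["*"]))
# ['models', 'modules', 'services']
```

With the default patterns, modules itself never becomes a unit because its children were claimed by the first pattern; with ["*"] alone it is analysed as one coarse unit.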
Per-tier keys: name (required), root (required), label (defaults to name), target_share (0–1, optional policy gate), strip_prefix (mapping rewrite for tiers that flatten the layout), expects (glob patterns scoping the tier to the units it should test; unset means every unit).
A different project just describes itself — e.g. a flat src/ with two tiers:
[tool.tepyd]
src_root = "src"
src_package = "mypkg"
units = ["*"]
[[tool.tepyd.tiers]]
name = "unit"
root = "tests/unit"
[[tool.tepyd.tiers]]
name = "e2e"
root = "tests/e2e"
What Tepyd is not
- Not a test runner, and not a replacement for pytest or coverage — cover orchestrates them.
- Not a correctness checker. LOC is a proxy for effort, and a test's tier is decided by its directory, not by what it exercises.
- Not a pass/fail gate (today): it reports data and advice. The mirror and cover figures are diagnostics, not targets — 1/20 mirrored for a browser tier is often correct by design.
Development
make test # pytest
make lint # ruff + ty + pyrefly + mypy
make format # ruff format + autofix
Tests are themselves organised as a pyramid (tests/a_unit, tests/b_integration, tests/c_e2e) — Tepyd eats its own dog food.
Changelog
See CHANGES.md.
License
Tepyd is licensed under the Apache License 2.0 — see LICENSE.