TEst PYramid Doctor — diagnose a project's test pyramid: mass, structure, and coverage.

Tepyd — The TEst PYramid Doctor

Diagnose your test pyramid: is the shape what you say you want?

Tepyd looks at a project's test suite and tells you whether its shape matches the test pyramid you say you want: a broad base of cheap unit tests, fewer integration tests, a thin cap of end-to-end tests. It automates the checks you'd otherwise do by hand — which packages are under- or over-tested, where the cheap tests are missing, whether the test tree mirrors the source tree, and — by running your suite under coverage — which tier actually exercises each package.

It's configuration-driven: point it at any project, describe that project's layout once in pyproject.toml, and run one command. The author's own layout ships as the default, so for projects that share it there's nothing to configure.

Think tepyd doctor: diagnose my pyramid.

The lenses

Tepyd looks at a suite through several complementary lenses. Each is useful alone; together they catch failure modes the others miss.

Lens    Command       Question                                                                      Runs tests?  Status
------  ------------  ----------------------------------------------------------------------------  -----------  --------------
Mass    tepyd mass    How much test code is there, and what shape does it make?                     no           ✅ implemented
Mirror  tepyd mirror  Does the test tree structurally parallel the source tree?                     no           ✅ implemented
Cover   tepyd cover   Which tier actually executes each unit, and is it the cheap one?              yes          ✅ implemented
Report  tepyd report  All the checks at once, plus advice: the why and the how, not just the what.  no           ✅ implemented
Audit   tepyd audit   Do each test file's imports match the tier it lives in?                       no           🔜 planned

Requirements

  • Python ≥ 3.12.
  • mass, mirror, and report have no runtime dependencies — the line counter is built in.
  • cover additionally needs the analysed project's own pytest and coverage to be importable, so run it from that project's environment (see tepyd cover).
  • cloc is an optional opt-in for the line counter (counter = "cloc").

Install

uv sync                                 # for development in this repo
# once published to PyPI (not yet):
# uv tool install tepyd

After uv sync, prefix commands with uv run (or activate the venv). Once published, uv tool install tepyd puts tepyd on your PATH directly.

The default line counter is built in — no external dependency. Set counter = "cloc" in config to use the cloc binary instead, if you want its stricter, multi-language counting.

Quick start

uv run tepyd init                      # detect this project's layout, write a config
uv run tepyd mass --min-src 1          # analyse the current directory
uv run tepyd -C /path/to/project mass  # analyse another project
uv run tepyd report                    # the checks + advice, in one read

(--min-src skips source units under 20 LOC by default; pass --min-src 1 on small projects to see every unit. Drop the uv run prefix once Tepyd is on your PATH.)

mass, mirror, and cover take --json for machine-readable output, the contract that report and a future CI gate build on (report itself renders text or Markdown via --format). All commands take -C/--root DIR to point at a project other than the current directory.

Concepts

  • Source unit — a meaningful slice of the source tree that Tepyd analyses as a whole, derived from configurable glob patterns (by default: each package under modules/, plus every other top-level package).

  • Tier — one rung of the pyramid: a directory of tests of a given cost (e.g. tests/a_unit). Tiers are listed cheapest-first in config; any number is allowed, not just three.

  • Unit share — the fraction of a unit's test code that lives in the cheapest tier. The headline pyramid-health metric.

  • Shape glyph — a one-character read on a unit's pyramid:

    Glyph  Meaning
    ▲      healthy: the cheapest tier is the largest and ≥ 40 % of test LOC
    ◇      balanced: neither clearly healthy nor inverted
    ▼      inverted: the most expensive tier is the largest and unit share < 30 %
    ·      no tests at all

    When a tier declares an expects scope (see tepyd mirror), the shape becomes layer-aware: a unit is judged against its expected home tier, not the unit ideal. A controller whose home is e2e is ▲ when its tests live at e2e; ▼ only marks test mass that sits above where it belongs.

tepyd mass

Counts test LOC against source LOC for every source unit and reports the per-tier breakdown, ratios, and pyramid shape.

tepyd mass
tepyd mass --json
tepyd mass --min-src 50      # skip source units under 50 LOC (default: 20)
tepyd mass --exclude faker   # skip a unit, on top of config exclusions (repeatable)

package       src  unit  integration  http-e2e  browser  tests  ratio    u%
------------  ---  ----  -----------  --------  -------  -----  -----  ----  -
modules/biz    24    32            8         4        0     44  1.83x   73%  ▲
modules/wire   24     0            0         0       24     24  1.00x    0%  ▼
services       16     0           16         0        0     16  1.00x    0%  ◇
models         16     0            0         0        0      0  0.00x     —  ·

=== Summary ===
...                            (per-tier totals and outlier lists)
Tier mix across the codebase : unit 50%  /  integration 20%  /  http-e2e 10%  /  browser 20%
  ⚠ unit share is 50%, below the 60% target — the pyramid is flattened.
Caveat : tier is by directory, not by what each test actually exercises.

How to read it. Each tier has its own column; ratio is total test LOC ÷ source LOC; u% is unit share. The summary lists outliers (untested, under-tested below 0.5×, heavily-tested above 2×), flags inverted pyramids, and warns when the codebase-wide unit share falls below the first tier's target_share.

Layer-aware judging. If you've scoped your tiers with expects, mass respects it: a unit whose home is e2e (a web controller) isn't flagged for being e2e-heavy, and the codebase target_share check sets aside each unit's tests at its expected higher tiers before measuring the unit share — so a large, legitimate e2e or integration surface doesn't read as a flattened pyramid. Only test mass sitting above where it belongs counts against you. Without expects, nothing changes — every unit is judged against the classic unit-pyramid.

A caveat Tepyd states up front: LOC is a proxy for effort, not a measure of quality, and a test's tier is decided by its directory, not by what it actually exercises. Mass tells you where to look; the cover lens tells you whether the tests are real.

tepyd mirror

Static, no-execution comparison of the test tree against the source tree, per tier, at the same granularity as your units — the slices mass and cover use. A package unit's mirrored test directory is checked recursively, so a test for any of its sub-packages counts. Refine the units patterns (e.g. ["modules/*", "*"]) to make mirror coarser or finer; it never floods a deeply-nested project with per-sub-package gaps.

tepyd mirror
tepyd mirror --json
tepyd mirror --exclude faker

Mirror — test tree vs source tree (6 source units)

unit: 3/6 mirrored (50%)
  gap     domain/models
  gap     web
  orphan  tests/a_unit/legacy

integration: 1/6 mirrored (17%)
  gap     domain
  ...

How to read it. For each tier:

  • present — the source package has a matching test directory that contains tests (counted in the X/Y mirrored figure),
  • gap — a source package with no test counterpart at this tier that the tier was expected to test,
  • out of scope — a source package the tier isn't responsible for, so its absence is reported as n/a, not a gap (shown as a count; the full list is in --json),
  • orphan — a test directory that contains tests but has no source on disk (tests for code that moved or vanished).

Scoping a tier to its layer. By default every tier is checked against every source package — fine for a flat app, noisy for a layered one. In a hexagonal/onion architecture each layer has a natural tier: pure domain is unit-tested, the persistence layer integration-tested, the HTTP edge end-to-end-tested. Checking all three against every package turns real structure into a wall of "gaps" that are correct by design:

unit: 4/10 mirrored (40%)
  gap     di
  gap     domain/ports
  gap     infrastructure
  gap     repositories
  gap     web
  gap     web/controllers

…and the integration and e2e tiers each report eight more gaps in the same vein — code that simply lives at a different tier. Tell Tepyd which packages each tier owns with expects (glob patterns over unit names):

[[tool.tepyd.tiers]]
name = "a_unit"
expects = ["domain", "domain/*", "services", "lib"]   # pure logic

[[tool.tepyd.tiers]]
name = "b_integration"
expects = ["repositories", "infrastructure"]          # the DB-bound layer

[[tool.tepyd.tiers]]
name = "c_e2e"
expects = ["web", "web/*"]                            # the HTTP edge

Now a package outside a tier's scope is reported as out of scope (n/a), not a gap, and drops out of that tier's X/Y mirrored figure — leaving only the gaps that are real:

unit: 4/4 mirrored (100%), 4 out of scope
integration: 2/2 mirrored (100%), 6 out of scope
e2e: 2/2 mirrored (100%), 6 out of scope

Two things scope deliberately doesn't do. A package that should never be tested anywhere — a pure Protocol/ports layer with no runtime behaviour — belongs in [tool.tepyd.exclude] (with a reason), which drops it from every tier; that's why domain/ports and di are gone from the counts above. And tests that do exist always count as present, even at a tier that didn't expect them: scope governs whether an absence is a gap, never whether existing tests count.

Orphan detection checks whether the source actually exists, so it's independent of exclusions. Mirror coverage is presented as data, not a pass/fail — a browser tier showing 1/20 mirrored is often by design.

tepyd cover

The only lens that runs your test suite. For each tier it does one coverage run -m pytest <tier-dir> (the tiers partition the suite, so the cost is roughly one full run), then measures, per source unit, what fraction of its statements each tier actually executes.
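To make the "one run per tier" idea concrete, the sketch below builds the command line for each tier without executing anything. It is a guess at the general shape, not Tepyd's actual invocation: the function name and the per-tier .coverage.<name> data-file naming are hypothetical, and --data-file requires a reasonably recent coverage.py.

```python
import sys


def tier_commands(tier_roots: dict[str, str]) -> list[list[str]]:
    """Build one `coverage run -m pytest <tier-dir>` command per tier."""
    commands = []
    for name, root in tier_roots.items():
        commands.append([
            sys.executable, "-m", "coverage", "run",
            f"--data-file=.coverage.{name}",  # hypothetical: keep tiers' data apart
            "-m", "pytest", root,
        ])
    return commands


for argv in tier_commands({"a_unit": "tests/a_unit", "c_e2e": "tests/c_e2e"}):
    print(" ".join(argv[1:]))
```

Using sys.executable keeps the runs on the interpreter that launched the script, which is why the environment matters so much for this lens.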

Run cover from your project's own environment. Unlike the other lenses, it imports and executes your code, so it must run where your package, pytest, and coverage are installed. Add Tepyd there (uv add --dev tepyd) and run uv run tepyd cover from the project root. Installing Tepyd standalone (uv tool install) and pointing it at the project with -C will fail to import your tests.

tepyd cover                  # all tiers
tepyd cover --tier a_unit    # just the unit tier (repeatable)
tepyd cover --json

unit      stmts  unit  e2e   any
--------  -----  ----  ---  ----
checkout      7    0%  88%   88%  v hidden
domain        9  100%   0%  100%

The any column is the union across tiers: a unit's true reachable-by-tests coverage. The hidden flag marks the hidden inverted pyramid: a unit that's well covered overall (any high) but barely by its unit tier. The lines run, but only the expensive tiers run them. A global coverage report would show both rows as green; only this lens reveals that checkout's coverage is entirely e2e.

Requirements & behavior:

  • It runs your suite under the same interpreter that runs tepyd, so that interpreter must have your project's dependencies (and pytest/coverage). Run it from your project's environment — uv run tepyd cover, or uv run --with[-editable] <tepyd> tepyd cover if tepyd lives elsewhere. If your project's code can't be imported (every tier fails in conftest/collection), cover says so, names the interpreter and the missing module, and points you at the fix — rather than printing a wall of zeros.
  • It prints per-tier progress to stderr as it runs (it runs the suite one tier at a time, so a large suite takes a while; the progress lines tell you it's working, not hung).
  • It ignores the project's own [tool.coverage] config so the numbers don't depend on it, and attributes coverage by resolved path (robust to multi-file units and absolute coverage paths).
  • A tier that fails to run is shown as a 0% column and listed as not-measured (distinct from "0% because untested"); a tier whose tests ran but failed is used with a warning that the numbers are a floor. Both also print to stderr.
  • A tier that runs but measures 0% everywhere (typically a browser/Playwright suite that drives your app in a separate process coverage.py can't see) is flagged as "not measured", not as "covers nothing", and is listed in --json as blind_tiers. Scope it out with --tier, or measure it under subprocess coverage.

tepyd report

Runs every check at once and synthesises the results into a list of findings, each carrying not just what is wrong but why it matters (the pyramid principle behind it) and how to fix it.

tepyd report                     # console report, "senior" level
tepyd report --format md         # Markdown, for a PR comment or a committed file
tepyd report --level newb        # teach the concepts (intro + glossary + advice)
tepyd report --level expert      # a terse one-line-per-finding checklist

Every report opens with a context lead-in — what's being measured and why — and the --level knob tunes how much it explains, not which problems it finds:

Level             What you get
newb              A full plain-language explanation of the pyramid, every finding's what/why/fix, general advice, and a glossary.
junior            A shorter context, every finding's what/why/fix, and general advice.
senior (default)  A brief context, then each finding's what/why/fix, with no hand-holding.
expert            A one-line context note, then one line per finding: the marker, the title, and the action.

It reports an overall health verdict (healthy / fair / needs work), a one-line summary of each lens, and findings ordered by severity (problem, warning, info). --min-src and --exclude work as they do on the individual lenses.

tepyd init

Rather than write the config by hand, let Tepyd guess it. tepyd init looks around the project for the usual clues — a src/ package (or a flat top-level one), a tests/ tree split into tiers, a modules/ sub-layout, a top-level browser-test root — and appends a commented [tool.tepyd] block to pyproject.toml, leaving the rest of the file untouched.

tepyd init             # detect and write the section
tepyd init --dry-run   # print what it would write, change nothing

When it can't make a confident guess — for example, several packages under src/ with none named app or matching the project name — it asks you to choose (when run interactively). In a non-interactive context (a pipe, CI), it falls back to the first candidate and prints a note rather than blocking. It also refuses to overwrite an existing [tool.tepyd] section, and prints a note for anything else it had to guess (an undetected source root, no test tiers). The result is a head start to review — not necessarily a finished config.

Configuration

tepyd init writes this for you, but here is the full reference. Everything is configured under [tool.tepyd] in pyproject.toml. With no section at all, the defaults below apply (except exclude, which is always empty unless set — exclusions are per-project policy, not a layout default).

[tool.tepyd]
src_root    = "src/app"   # filesystem root of the analysed source
src_package = "app"       # dotted import path; reserved — not read by any lens yet

# How to slice the source tree into units (globs, first-match-wins): explode
# each package under modules/ one level deep, then take every other
# top-level package as a unit.
units = ["modules/*", "*"]

# Line counter: "internal" (built-in, no dependency) or "cloc".
counter = "internal"

# Source units excluded from analysis — a reason is REQUIRED, so the policy
# decision is documented where it's made.
[tool.tepyd.exclude]
faker = "seed/fake-data generator, exercised via fixtures"

# Test tiers, cheapest first (bottom of the pyramid first). Any number of tiers.
[[tool.tepyd.tiers]]
name = "a_unit"
root = "tests/a_unit"
label = "unit"
target_share = 0.60          # policy gate: ≥ 60 % of test LOC should be here

[[tool.tepyd.tiers]]
name = "b_integration"
root = "tests/b_integration"
label = "integration"
# Optional: scope this tier to the units it should test (globs over unit
# names). Units outside the scope are reported n/a, not as gaps. Omit it
# and the tier is checked against every unit.
expects = ["repositories/*", "infrastructure/*"]

[[tool.tepyd.tiers]]
name = "c_e2e"
root = "tests/c_e2e"
label = "http-e2e"

[[tool.tepyd.tiers]]
name = "e2e_playwright"
root = "e2e_playwright"      # an arbitrary root — needn't live under tests/
label = "browser"
strip_prefix = "modules/"    # this tier flattens the layout: modules/biz → biz
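The first-match-wins slicing described in the units comment can be illustrated with a short Python sketch. It is one reading of that rule, not Tepyd's code: the function names are hypothetical, directories are given as relative paths, globs are matched one path segment at a time, and skipping a directory that sits inside an already claimed unit is an assumption.

```python
from fnmatch import fnmatchcase


def matches(path: str, pattern: str) -> bool:
    """Match segment by segment, so "*" covers exactly one directory level."""
    path_parts, pattern_parts = path.split("/"), pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def slice_units(dirs: list[str], patterns: list[str]) -> list[str]:
    """Claim directories as units, earlier patterns first."""
    units: list[str] = []
    for pattern in patterns:
        for d in dirs:
            if d in units or not matches(d, pattern):
                continue
            if any(u.startswith(d + "/") for u in units):
                continue  # a container whose children were already claimed
            if any(d.startswith(u + "/") for u in units):
                continue  # assumption: already inside a claimed unit
            units.append(d)
    return units


dirs = ["modules", "modules/biz", "modules/wire", "services", "models"]
print(slice_units(dirs, ["modules/*", "*"]))
# ['modules/biz', 'modules/wire', 'services', 'models']
```

With units = ["*"] alone, the same input yields modules, services, and models as three top-level units.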

Reference

Key          Default                 Meaning
src_root     "src/app"               Filesystem root of the analysed source.
src_package  "app"                   Dotted import path of the source. Currently informational: no lens reads it yet (cover keys off src_root).
units        ["modules/*", "*"]      Glob patterns slicing the source tree into units; first match wins, and a container whose children were already claimed is skipped. A pattern ending in .py (e.g. ["*.py"]) makes each top-level module a unit, for flat packages with no sub-packages; ["*"] stays directories-only.
counter      "internal"              internal (tokenize-based; counts non-blank, non-comment lines) or cloc.
exclude      {}                      Table of unit = "reason"; the reason is required.
tiers        four tiers (see above)  Array of tables, cheapest-first.

Per-tier keys: name (required), root (required), label (defaults to name), target_share (0–1, optional policy gate), strip_prefix (mapping rewrite for tiers that flatten the layout), expects (glob patterns scoping the tier to the units it should test; unset means every unit).
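The reference describes the internal counter as tokenize-based, counting non-blank, non-comment lines. A minimal sketch of that idea with the standard library follows. It is not Tepyd's counter: the function name is hypothetical, and counting every line of a multi-line string (docstrings included) as code is an assumption.

```python
import io
import tokenize

# Token types that do not make a line count as code on their own.
_NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count lines that carry at least one real token."""
    code_lines: set[int] = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _NON_CODE:
            continue
        # A token may span several lines (e.g. a triple-quoted string).
        code_lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(code_lines)


sample = "# comment\n\nx = 1  # trailing\n\ndef f():\n    return x\n"
print(count_code_lines(sample))  # 3
```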

A different project just describes itself — e.g. a flat src/ with two tiers:

[tool.tepyd]
src_root = "src"
src_package = "mypkg"
units = ["*"]

[[tool.tepyd.tiers]]
name = "unit"
root = "tests/unit"

[[tool.tepyd.tiers]]
name = "e2e"
root = "tests/e2e"

What Tepyd is not

  • Not a test runner, and not a replacement for pytest or coverage — cover orchestrates them.
  • Not a correctness checker. LOC is a proxy for effort, and a test's tier is decided by its directory, not by what it exercises.
  • Not a pass/fail gate (today): it reports data and advice. The mirror and cover figures are diagnostics, not targets — 1/20 mirrored for a browser tier is often correct by design.

Development

make test    # pytest
make lint    # ruff + ty + pyrefly + mypy
make format  # ruff format + autofix

Tests are themselves organised as a pyramid (tests/a_unit, tests/b_integration, tests/c_e2e) — Tepyd eats its own dog food.

Changelog

See CHANGES.md.

License

Tepyd is licensed under the Apache License 2.0 — see LICENSE.
