mgf-test-supervisor

Budget-aware, crash-survivable, resumable test supervisor that drives pytest from the outside.

A sibling of mgf-common under the mgf.* namespace. Federation-standard test infrastructure (TS-22..TS-27).

Conformance: L2 per mgf-common/docs/standards/. Tracker: docs/inprogress/MGF_STANDARDS_CONFORMANCE.md. Runtime LG / CF / OB rules declined per the TS-23 stdlib-only constraint — full rationale in §3 of the tracker.

Why a supervisor (not a pytest plugin)

A pytest plugin lives inside pytest. If pytest crashes — interpreter deadlock, segfault in a C extension, a hook that infinite-loops — the plugin dies with it. The supervisor runs pytest as a subprocess. A pytest crash becomes just one more outcome to record.
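To make the "one more outcome" idea concrete, here is a minimal sketch of how a parent process can classify a child's exit. It is an illustration rather than the supervisor's actual code: `classify_exit` and `run_child` are hypothetical names, the pass/fail mapping is deliberately simplified, and the negative-return-code convention for signal deaths applies to POSIX only.

```python
import subprocess
import sys


def classify_exit(returncode):
    """Map a child's return code to a coarse outcome label (simplified)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -N (e.g. -11 for SIGSEGV).
        return "crashed"
    return "passed" if returncode == 0 else "failed"


def run_child(cmd):
    """Run cmd as a separate process; a crash there cannot take the parent down."""
    completed = subprocess.run(cmd)
    return classify_exit(completed.returncode)


if __name__ == "__main__":
    print(run_child([sys.executable, "-c", "pass"]))
    print(run_child([sys.executable, "-c", "raise SystemExit(1)"]))
```

Because the parent only ever sees a return code, a segfault in a C extension looks no different from any other exit status and can be recorded like one.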

What it gives you:

  • Hang detection. A per-test pytest-timeout (in-process) backed by an outer wall-clock SIGTERM → SIGKILL (out-of-process). Tests that genuinely hang are reported hung, not "still running."
  • Crash isolation. One chunk's segfault doesn't take the run with it. Subsequent chunks run. The crashed chunk's in-flight test is recorded crashed.
  • Budget-aware selection. Pass --budget 5m and the supervisor packs a greedy knapsack of tests under that wall-clock budget, biased toward tier-0 and recently-failing tests.
  • Resumability. A run interrupted by SIGINT (or by your laptop sleeping) can be resumed with --resume. Already-completed tests aren't repeated.
  • Quarantine. Flaky tests are quarantined automatically after 3 failures in 5 runs and evicted after 3 consecutive passes. No more "this test is flaky, ignore it" Slack pings.
  • Per-failure diagnostic bundles. Each failure writes one self-contained .txt under .gates/last-run/failures/ — pytest stdout/stderr, traceback, durations, environment fingerprint, daemon state (if configured). One file = one diagnostic.
  • JSONL outcomes. .gates/last-run/run.jsonl is append-only and parseable. Read via mgf.common.testing.TestRecord.from_jsonl_line(...) or any JSON-aware tool.
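The out-of-process half of hang detection (wall-clock deadline, then SIGTERM, then SIGKILL) can be sketched with the standard library alone. This is a sketch under assumptions, not the supervisor's implementation: `run_with_wall_clock` and the `grace_s` parameter are hypothetical, and the signal names in the comments describe POSIX behaviour.

```python
import subprocess
import sys


def run_with_wall_clock(cmd, wall_clock_s, grace_s=5.0):
    """Return (outcome, returncode); outcome is "hung" if the deadline passed."""
    proc = subprocess.Popen(cmd)
    try:
        return "exited", proc.wait(timeout=wall_clock_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX: give the child a chance to clean up
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
            proc.wait()
        return "hung", proc.returncode


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_wall_clock(sleeper, wall_clock_s=0.5, grace_s=1.0))
```

The two-stage escalation matters because a child stuck in a deadlock may never act on SIGTERM, while SIGKILL is enforced by the kernel.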

Install

uv add --dev mgf-test-supervisor   # once published to PyPI

Until it is published to PyPI, install from Codeberg:

uv add --dev "mgf-test-supervisor @ git+ssh://git@codeberg.org/magogi-admin/mgf-test-supervisor.git"

You also need pytest-timeout. It provides the in-process per-test timeout; the supervisor's outer wall-clock kill is the backstop behind it:

uv add --dev "pytest-timeout>=2.3,<3"

Adopting in your project

Four steps:

1. Install mgf-test-supervisor and pytest-timeout (see Install above).

2. Add the pytest config to your pyproject.toml.

[tool.pytest.ini_options]
timeout = 30
timeout_method = "thread"
faulthandler_timeout = 60
markers = [
    "smoke: minimal post-install sanity (fast; no fixtures); tier-0",
    "e2e: end-to-end test exercising bootstrap + multiple subsystems; tier-3",
    "integration: cross-module integration test (slower; may launch real subprocesses); tier-1",
    "contract: wire-format + SDK-surface pins; tier-1",
    "property: invariant-under-all-inputs test (Hypothesis); tier-1",
    "regression: pin for a specific past bug; docstring cites the fixing commit; tier-1",
    "fuzz: random / generated / hostile input — asserts safety properties; tier-1",
    "concurrency: real-threads contention test; tier-2",
    "conformance: design-promise tests (G/S guarantees); release-gated; tier-2",
    "use_case: reproduces a consumer scenario from FEEDBACK paper or recipe doc; tier-2",
    "perf: performance pin for a hot path's per-call cost (PF-* + TS-14); tier-3",
    "stress: sustained high-load test; tier-3; nightly only",
    "slow: legacy marker for tests >1s wall-clock (TS-18). Deprecated. Removed in v2.0.",
]

3. Add .gates/ to .gitignore (the supervisor's run state lives there).

.gates/

4. Verify.

mgf-test-supervisor --self-check   # ~35s — canned pass/fail/hang/crash
mgf-test-supervisor --tier 0       # run your tier-0 smoke tests

If --self-check reports PASS, the supervisor and your project's config are wired correctly.

CLI

mgf-test-supervisor                       # run all tier 0-3 tests
mgf-test-supervisor --tier 0              # smoke only
mgf-test-supervisor --tier 0-2            # tiers 0, 1, 2
mgf-test-supervisor --budget 5m           # greedy knapsack under 5min
mgf-test-supervisor --resume              # resume the last interrupted run
mgf-test-supervisor --report              # show last run's summary.md
mgf-test-supervisor --quarantine          # show currently-quarantined tests
mgf-test-supervisor --self-check          # canned scenarios — verifies the supervisor itself
mgf-test-supervisor --fail-fast           # stop after first failing chunk

mtest is a short alias for the same binary.
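As an illustration of what --budget implies, the sketch below orders candidate tests by priority and greedily keeps each one that still fits in the remaining time. Everything here is an assumption made for the example: the `Candidate` fields, the `parse_budget` grammar, and the exact priority key are hypothetical, and the supervisor's real scoring (it mentions a recency score in durations.json) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    test_id: str
    tier: int
    est_seconds: float      # e.g. taken from a duration history
    recently_failed: bool


def parse_budget(text):
    """Parse strings such as '90s', '5m' or '1h' into seconds (hypothetical grammar)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return float(text[:-1]) * units[text[-1]]


def select_under_budget(candidates, budget_s):
    """Greedy pass: tier-0 first, then recent failures, then lower tiers and shorter tests."""
    ordered = sorted(
        candidates,
        key=lambda c: (c.tier != 0, not c.recently_failed, c.tier, c.est_seconds),
    )
    chosen, used = [], 0.0
    for cand in ordered:
        if used + cand.est_seconds <= budget_s:
            chosen.append(cand)
            used += cand.est_seconds
    return chosen


if __name__ == "__main__":
    pool = [
        Candidate("test_a", 0, 10, False),
        Candidate("test_b", 2, 200, True),
        Candidate("test_c", 1, 120, False),
        Candidate("test_d", 3, 50, False),
    ]
    picked = select_under_budget(pool, parse_budget("5m"))
    print([c.test_id for c in picked])
```

A greedy pass is not an optimal knapsack solution, but it is cheap, deterministic, and guarantees that the highest-priority tests are considered first.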

State files

Everything under .gates/:

.gates/
  last-run/
    run.jsonl              — one TestRecord per test outcome (append-only, JSONL)
    summary.md             — human-readable summary
    failures/
      <test_id>.txt        — per-failure diagnostic bundle (one file = one failure)
    junit-NNNN.xml         — raw pytest XML per chunk (debugging aid)
  state.json               — resume checkpoint (atomic write)
  durations.json           — per-test duration history + recency score
  quarantine.json          — quarantine state (3-of-5-in / 3-consec-out)
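The "3-of-5-in / 3-consec-out" rule noted for quarantine.json can be read as a small state machine per test. The sketch below is one plausible interpretation, assuming the 5 refers to a sliding window over the most recent runs; `QuarantineTracker` is a hypothetical name and does not describe the real file format.

```python
from collections import deque


class QuarantineTracker:
    """Tracks one test: quarantine on 3 fails in the last 5 runs, evict on 3 straight passes."""

    def __init__(self, window=5, fail_threshold=3, pass_streak=3):
        self.history = deque(maxlen=window)  # True = pass, False = fail
        self.fail_threshold = fail_threshold
        self.pass_streak = pass_streak
        self.quarantined = False

    def record(self, passed):
        """Record one run and return whether the test is quarantined afterwards."""
        self.history.append(passed)
        recent = list(self.history)
        if not self.quarantined:
            if recent.count(False) >= self.fail_threshold:
                self.quarantined = True
        else:
            tail = recent[-self.pass_streak:]
            if len(tail) == self.pass_streak and all(tail):
                self.quarantined = False
        return self.quarantined


if __name__ == "__main__":
    tracker = QuarantineTracker()
    for result in [False, True, False, True, False, True, True, True]:
        print(result, tracker.record(result))
```

Using different conditions for entry and exit gives the rule hysteresis: a single lucky pass does not release a flaky test, and a single failure does not quarantine a healthy one.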

Tier system

The federation 13-marker taxonomy maps onto 4 tiers (see TS-25 in mgf-common/docs/standards/TESTING.md §13.11). The --tier 0 selection runs only smoke-marked tests — the canonical "is the build broken?" signal.

Tier  Wall-clock guideline  Markers
0     <30s total            smoke
1     <2min total           contract / property / regression / fuzz / integration
2     <5min total           concurrency / conformance / use_case
3     nightly / release     perf / stress / e2e / slow
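The table above can be written down as a plain mapping, which also shows how a --tier argument such as 0-2 expands into a set of markers. This is a sketch transcribed from the table: the dictionary reuses the MARKER_TIER name for readability, but its actual shape in the package is an assumption, and `parse_tier_spec` and `markers_for` are hypothetical helpers.

```python
# Transcribed from the tier table; the real MARKER_TIER structure may differ.
MARKER_TIER = {
    "smoke": 0,
    "contract": 1, "property": 1, "regression": 1, "fuzz": 1, "integration": 1,
    "concurrency": 2, "conformance": 2, "use_case": 2,
    "perf": 3, "stress": 3, "e2e": 3, "slow": 3,
}


def parse_tier_spec(spec):
    """'0' -> {0}; '0-2' -> {0, 1, 2}."""
    low, _, high = spec.partition("-")
    return set(range(int(low), int(high or low) + 1))


def markers_for(spec):
    """Return the sorted marker names whose tier falls inside the spec."""
    tiers = parse_tier_spec(spec)
    return sorted(name for name, tier in MARKER_TIER.items() if tier in tiers)


if __name__ == "__main__":
    print(markers_for("0"))
    print(markers_for("0-2"))
```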

Federation alignment

mgf-test-supervisor is stdlib-only. It does not depend on mgf-common at runtime — per TS-23, the supervisor must run even when the project under test is broken.

The JSONL schema (TestRecord shape, MARKER_TIER mapping, SCHEMA_VERSION) is mirrored in mgf.common.testing for consumer-side parsing. Both surfaces are alignment-pinned by tests in mgf-common's tests/unit/testing/test_supervisor_alignment.py and in mgf-test-supervisor's tests/unit/test_alignment_with_mgf_common.py. A field rename in one without the other fails CI on both.
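For consumers that do not want a dependency on mgf-common, run.jsonl can also be read with the standard library, since each line is a standalone JSON object. The sketch below tallies records by one field. The field name `outcome` and the sample values are assumptions made for illustration; the README does not list the TestRecord fields, so check the real schema before relying on them.

```python
import json
from collections import Counter


def iter_records(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def outcome_counts(lines, field="outcome"):
    """Count records by a field; 'outcome' is a hypothetical field name."""
    return Counter(rec.get(field, "unknown") for rec in iter_records(lines))


if __name__ == "__main__":
    # Made-up sample lines standing in for .gates/last-run/run.jsonl.
    sample = [
        '{"test_id": "tests/test_a.py::test_one", "outcome": "passed"}',
        '{"test_id": "tests/test_a.py::test_two", "outcome": "hung"}',
        "",
        '{"test_id": "tests/test_b.py::test_three", "outcome": "passed"}',
    ]
    print(outcome_counts(sample))
```

To run it against a real run, pass an open file handle for .gates/last-run/run.jsonl instead of the sample list; file objects iterate line by line, so nothing else changes.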

License

MIT. See LICENSE.
