Project-aware consolidated testing CLI for pytest, sandbox, empirical, and retained aggregate validation lanes.

These details have not been verified by PyPI

Project description

Calamum Test

Calamum Test is a standalone, project-aware testing substrate for consolidating pytest, sandbox_test, and empirical_test lanes behind one retained-evidence CLI and Python facade.

Public repository: https://github.com/joediggidyyy/calamum

The current implementation is no longer just a seed scaffold. It now includes:

a shared .calamum/project.json descriptor model
machine-local overlay and active-project state support
a stable importable Python facade in calamum.api
richer catalog metadata for profiles, tags, policy flags, and evidence requirements
retained run manifests and checksums
regenerative aggregate reports for job, project, and domain scopes
detached-signature support for privileged or publishable aggregate artifacts

Command surface

Test execution

calamum test list
calamum test show <definition_id>
calamum test run <definition_id>
calamum test runs list
calamum test runs show <run_id>

A definition_id is the exact id of a test definition in the catalog. Use calamum test list to discover the available ids, then pass one of those ids to calamum test show or calamum test run. Example: seed-cli-smoke.

Project management

calamum test project register
calamum test project set <project>
calamum test project current
calamum test project validate [--project <project>]
calamum test project list
calamum test project show <project>

Aggregate reporting

calamum test reports list
calamum test reports show <report_ref>
calamum test reports generate --scope job --job <job_id>
calamum test reports generate --scope project [--project <project>]
calamum test reports generate --scope domain --domain <domain>

Adding tests to the library

In Calamum, the public "test library" is the tracked catalog at catalog/test_definitions.json.

Current authoring workflow:

add a new definition object to the catalog definitions list
give it a stable id, title, summary, status, and category
classify it with:
- category — primary test class (for example bootstrap, regression, security, adversarial, performance)
- profiles — reusable bundles such as smoke, release, or nightly
- tags — cross-cutting labels for search and future selectors
- policy_flags — execution or governance rules
- evidence_requirements — retained outputs the definition must produce
declare step arrays under the canonical lanes:
- pytest
- sandbox_test
- empirical_test
validate the new entry by running:
- calamum test list
- calamum test show <definition_id>
- calamum test run <definition_id> --dry-run

Minimal definition shape:

{
  "id": "adversarial-auth-smoke",
  "title": "Adversarial auth smoke",
  "summary": "Challenge the authentication path with hostile-input and retained-evidence checks.",
  "status": "active",
  "category": "adversarial",
  "profiles": ["smoke", "release"],
  "tags": ["adversarial", "auth", "api", "signing"],
  "policy_flags": ["containment", "json-first", "project-aware", "release-gate"],
  "evidence_requirements": ["report_json", "report_md", "manifest_json", "checksums_json"],
  "default_lanes": ["pytest", "sandbox_test"],
  "lanes": {
    "pytest": [],
    "sandbox_test": [],
    "empirical_test": []
  }
}

Current limitation: authoring is still manual. Calamum does not yet ship a dedicated calamum test catalog scaffold|validate management surface, so the catalog file remains the authoritative place to add new tests today.
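Because authoring is manual, a small helper can make the edit less error-prone. The following is a minimal sketch, not part of Calamum: the function name add_definition is hypothetical, and it assumes the catalog file is a JSON object with a top-level "definitions" list, which the workflow above implies but does not spell out.

```python
import json
from pathlib import Path


def add_definition(catalog_path: Path, definition: dict) -> None:
    """Append a definition to the catalog file, refusing duplicate ids."""
    # Assumption: the catalog is a JSON object holding a "definitions" list.
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    definitions = catalog.setdefault("definitions", [])
    if any(existing.get("id") == definition["id"] for existing in definitions):
        raise ValueError(f"duplicate definition id: {definition['id']}")
    definitions.append(definition)
    # Write back with stable indentation so diffs stay reviewable.
    catalog_path.write_text(json.dumps(catalog, indent=2) + "\n", encoding="utf-8")
```

After writing the entry, the calamum test show and calamum test run --dry-run commands listed above remain the way to confirm that Calamum accepts it.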

Plain-language meaning of the fields

category = what kind of test this is
profiles = when or why you run it
tags = what area it touches
policy_flags = special execution or governance rules
evidence_requirements = which retained outputs must exist after the run
pytest / sandbox_test / empirical_test = the three execution lane classes inside one definition

The important division is this:

one definition = one named test in the library
one definition can use one, two, or all three lane classes
the lane classes are not separate library entries; they are the three ways Calamum can gather evidence for the same test

Controlled library vocabulary (v1)

Calamum now treats the following values as the contracted v1 vocabulary.

Status values

seed — scaffold or early placeholder definition
active — supported definition for normal use
experimental — usable but still being evaluated
deprecated — still readable/runnable for transition purposes but being retired
disabled — present in the catalog but not intended for ordinary execution

Category values

adversarial — deliberate hostile-input, penetration-style, abuse-case, or attack-path validation
general — mixed or uncategorized definition; use sparingly
bootstrap — proves basic setup, installation, or command-surface readiness
regression — protects a known workflow or behavior from breaking
security — validates defensive trust, signing, access, or safety posture without making hostile challenge the primary identity
performance — validates speed, scale, or resource posture
integration — validates interaction across modules, services, or host applications
compliance — validates policy, contract, or governance conformance

Profile values

default — ordinary day-to-day execution set
smoke — fast confidence check
fast — low-cost local developer check
release — required before publishing or promotion
nightly — broader scheduled validation pack

Tag values

adversarial — hostile-input / penetration-testing facet on a definition whose primary category may or may not already be adversarial
aggregate — aggregate/report generation surface
api — Python or service API surface
auth — authentication / authorization surface
catalog — definition-library / schema surface
cli — command-line surface
filesystem — path, layout, or artifact-root surface
project — project registration / context resolution surface
reporting — rendered reports or report-regeneration surface
retained-evidence — manifests, checksums, receipts, or persisted review evidence
sandbox — isolated or simulated runtime surface
signing — signatures, receipts, or verification surface
smoke — broad confidence check spanning multiple surfaces

Policy flag values

json-first — JSON is the primary machine contract
project-aware — requires resolved project context or tokens
containment — paths and execution roots must stay inside declared boundaries
local-only — intentionally local-only workflow or artifact posture
signed-output — output must be signed and verifiable
privileged-operation — delegated or privileged control path
release-gate — failing result blocks release or promotion
deterministic-output — output is expected to be stable and reproducible

Evidence requirement values

report_json
report_md
manifest_json
checksums_json
stdout_capture
stderr_capture
receipt_json
report_signature
manifest_signature

Lane classes

pytest — automated code-level assertions
sandbox_test — controlled scripted or simulated execution
empirical_test — real observed/manual/live verification
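Since there is no dedicated catalog validation command yet, the vocabulary above can be checked with a few lines of Python before a definition is committed. This is an illustrative sketch only: the function name vocabulary_errors is hypothetical, the value sets are copied from the lists above, and Calamum's own validation may be stricter or looser.

```python
STATUS = {"seed", "active", "experimental", "deprecated", "disabled"}
CATEGORY = {"adversarial", "general", "bootstrap", "regression", "security",
            "performance", "integration", "compliance"}
PROFILES = {"default", "smoke", "fast", "release", "nightly"}
TAGS = {"adversarial", "aggregate", "api", "auth", "catalog", "cli", "filesystem",
        "project", "reporting", "retained-evidence", "sandbox", "signing", "smoke"}
POLICY_FLAGS = {"json-first", "project-aware", "containment", "local-only",
                "signed-output", "privileged-operation", "release-gate",
                "deterministic-output"}
EVIDENCE = {"report_json", "report_md", "manifest_json", "checksums_json",
            "stdout_capture", "stderr_capture", "receipt_json",
            "report_signature", "manifest_signature"}
LANES = {"pytest", "sandbox_test", "empirical_test"}


def vocabulary_errors(definition: dict) -> list[str]:
    """Return one message per value that falls outside the v1 vocabulary."""
    errors = []
    # Single-valued fields.
    for field, allowed in (("status", STATUS), ("category", CATEGORY)):
        value = definition.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not a v1 value")
    # List-valued fields.
    for field, allowed in (("profiles", PROFILES), ("tags", TAGS),
                           ("policy_flags", POLICY_FLAGS),
                           ("evidence_requirements", EVIDENCE),
                           ("default_lanes", LANES)):
        for value in definition.get(field, []):
            if value not in allowed:
                errors.append(f"{field}: {value!r} is not a v1 value")
    # Lane keys must be one of the three lane classes.
    for lane in definition.get("lanes", {}):
        if lane not in LANES:
            errors.append(f"lanes: {lane!r} is not a lane class")
    return errors
```

An empty list means every value is drawn from the contracted vocabulary; it says nothing about whether the lane steps themselves are well-formed.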

How adversarial testing is represented

use category: adversarial when hostile challenge or penetration-style probing is the primary identity of the definition
use tag adversarial when a definition is primarily something else (for example security or regression) but still contains adversarial coverage
keep adversarial out of the lane field: it is a test type, not an execution medium
execute adversarial definitions through one or more of the normal lanes (pytest, sandbox_test, empirical_test)
when adversarial coverage is claimed, sandbox coverage should normally be present because that is the safest place to exercise hostile-path probes first
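The rules above can be expressed as a small lint pass. This is a sketch under stated assumptions: adversarial_warnings is a hypothetical helper, and it treats an empty step array as "lane not used", which the document does not state explicitly.

```python
def adversarial_warnings(definition: dict) -> list[str]:
    """Flag definitions that break the adversarial conventions described above."""
    lanes = definition.get("lanes", {})
    claims_adversarial = (
        definition.get("category") == "adversarial"
        or "adversarial" in definition.get("tags", [])
    )
    warnings = []
    # adversarial is a test type, so it must never appear as a lane key.
    if "adversarial" in lanes:
        warnings.append("adversarial is a test type, not a lane")
    # Assumption: an empty or missing step array means the lane is unused.
    if claims_adversarial and not lanes.get("sandbox_test"):
        warnings.append("adversarial coverage claimed without sandbox_test steps")
    return warnings
```

The second check is advisory rather than strict, matching the "should normally be present" wording above.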

Plain-language workflow across all three test classes

Here is the simplest way to think about it.

Use one definition when you are testing one real thing.

Example definition:

id: adversarial-auth-smoke
category: adversarial
profiles: smoke, release
tags: adversarial, auth, api, signing

Then split the same test across the three lane classes like this:

pytest lane
- prove the code-level rules work
- example: token validation, permission checks, malformed-input rejection, JSON packet shape
sandbox_test lane
- prove the command or workflow works in a controlled runtime
- example: run the CLI against hostile fixtures in a safe local sandbox and confirm the right files are written without escaping containment
empirical_test lane
- prove the result also holds in a real observed workflow
- example: operator checks the real auth flow or a live delegated request/receipt path under adversarial review conditions

Plain English summary:

pytest asks: does the code logic pass?
sandbox_test asks: does the workflow run correctly in a safe controlled environment?
empirical_test asks: does it still hold up when a human observes it or it runs under real-world conditions?

Typical full workflow:

add the definition to catalog/test_definitions.json
run calamum test show adversarial-auth-smoke
run calamum test run adversarial-auth-smoke --dry-run
run calamum test run adversarial-auth-smoke
inspect one combined retained evidence pack under:
- .calamum/generated/runs/<run_id>/
if the run belongs to a job or release lane, generate aggregates under:
- .calamum/generated/reports/generated/

That is the core model: one named adversarial or non-adversarial test definition, three possible lane classes, one retained evidence pack.

Retained evidence contract

Every calamum test run retains:

report.json
report.md
checksums.json
manifest.json
per-step stdout/stderr captures
append-only .calamum/generated/runs/run_index.jsonl

Aggregate report generation retains:

report.json
report.md
manifest.json
receipt.json
checksum sidecars
optional detached signatures for JSON artifacts
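A quick presence check against the two lists above might look like the following. It is a minimal sketch: the helper name missing_files is hypothetical, and it only confirms that the named files exist, not that their checksums or signatures are valid.

```python
from pathlib import Path

# File names taken from the two lists above.
RUN_FILES = ("report.json", "report.md", "checksums.json", "manifest.json")
AGGREGATE_FILES = ("report.json", "report.md", "manifest.json", "receipt.json")


def missing_files(directory: Path, required: tuple[str, ...]) -> list[str]:
    """Return the required file names that are absent from the directory."""
    return [name for name in required if not (directory / name).is_file()]
```

For example, missing_files(Path(".calamum/generated/runs") / run_id, RUN_FILES) returns an empty list when a run directory holds all four core files.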

Filesystem layout and default output contract

Calamum uses a small split between tracked inputs and local-only generated outputs.

Tracked by default:

.calamum/project.json — shared project descriptor
catalog/test_definitions.json — tracked definition catalog

Local-only by default:

.calamum/generated/runs/ — retained run evidence
.calamum/generated/reports/ — materialized aggregate reports
.calamum/generated/.gitignore — local-only guard so generated output stays untracked

Default tree:

project-root/
├─ .calamum/
│  ├─ project.json
│  └─ generated/
│     ├─ .gitignore
│     ├─ runs/
│     │  ├─ run_index.jsonl
│     │  └─ <run_id>/
│     │     ├─ report.json
│     │     ├─ report.md
│     │     ├─ checksums.json
│     │     ├─ manifest.json
│     │     └─ <lane>/
│     │        ├─ <step>.stdout.txt
│     │        └─ <step>.stderr.txt
│     └─ reports/
│        └─ generated/
│           ├─ report_index.jsonl
│           └─ <scope>/
│              └─ <target>/
│                 ├─ report.json
│                 ├─ report.md
│                 ├─ manifest.json
│                 ├─ receipt.json
│                 ├─ *.sha256
│                 ├─ *.sig                # when signing is enabled
│                 └─ history/<timestamp>/
└─ catalog/
   └─ test_definitions.json

This is the default contract unless the operator overrides one or more roots during project registration or later through local overlay settings / explicit CLI flags.
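The default tree above maps onto a few path helpers. This sketch assumes no roots have been overridden; the function names are hypothetical, and the index reader assumes only that the .jsonl files hold one JSON object per line, without assuming any particular field names.

```python
import json
from pathlib import Path


def run_dir(project_root: Path, run_id: str) -> Path:
    """Default location of one run's retained evidence."""
    return project_root / ".calamum" / "generated" / "runs" / run_id


def report_dir(project_root: Path, scope: str, target: str) -> Path:
    """Default location of one aggregate report."""
    return (project_root / ".calamum" / "generated" / "reports"
            / "generated" / scope / target)


def read_index(index_path: Path) -> list[dict]:
    """Read a JSON Lines index: one JSON object per non-empty line."""
    if not index_path.exists():
        return []
    with index_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Reading run_index.jsonl this way yields the entries in the order they were appended.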

Workflow notes

Child-repo / self-hosted workflow: the checked-in projects/calamum/.calamum/project.json points generated outputs into .calamum/generated/.
Adopt an existing repo: calamum test project register now bootstraps the minimal local scaffold by creating:
- catalog/test_definitions.json if it does not exist yet
- .calamum/generated/.gitignore
- the standard runs/ and reports/ directories under .calamum/generated/
Application profile note: --application <id> is currently stored on the project record and exposed to tokens/readback, but it does not yet auto-expand a data-driven profile with implied markers, path aliases, or report defaults. For CodeSentinel, pass the explicit registration arguments you want on the first local exercise.
Override workflow: --runs-root, --reports-root, --catalog-root, and the machine-local overlay still win when the operator intentionally wants a different layout.

If you do not override anything, retained run evidence goes to .calamum/generated/runs/ and aggregate reports go to .calamum/generated/reports/generated/.

Concrete local-first CodeSentinel workflow notes now live in:

planning/CALAMUM_CODESENTINEL_LOCAL_ADOPTION_SCRATCHPAD_20260423.md

Project resolution order

Calamum resolves project context in the following order:

explicit --project
nearest ancestor .calamum/project.json
CALAMUM_PROJECT
active project stored in local state
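The project resolution order can be read as a first-match chain. The sketch below illustrates that ordering only; it is not Calamum's implementation, and the function names, the returned (source, value) pair, and the way active state is passed in are all assumptions made for the example.

```python
from pathlib import Path
from typing import Mapping, Optional


def nearest_descriptor(start: Path) -> Optional[Path]:
    """Walk from start up to the filesystem root looking for .calamum/project.json."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".calamum" / "project.json"
        if candidate.is_file():
            return candidate
    return None


def resolve_project(explicit: Optional[str], cwd: Path,
                    env: Mapping[str, str],
                    active: Optional[str]) -> Optional[tuple[str, str]]:
    """Return (source, value) for the first resolution step that matches."""
    if explicit:                                   # 1. explicit --project
        return ("flag", explicit)
    descriptor = nearest_descriptor(cwd)           # 2. nearest ancestor descriptor
    if descriptor is not None:
        return ("descriptor", str(descriptor))
    if env.get("CALAMUM_PROJECT"):                 # 3. CALAMUM_PROJECT
        return ("env", env["CALAMUM_PROJECT"])
    if active:                                     # 4. active project in local state
        return ("state", active)
    return None
```

Note that a descriptor found in an ancestor directory takes precedence over CALAMUM_PROJECT, so the environment variable only applies outside a registered project tree.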

Within a resolved project, path/runtime resolution follows:

explicit command flags
machine-local overlay
shared descriptor
built-in defaults
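The four-layer precedence for path and runtime settings behaves like a chained lookup in which the first layer that defines a key wins. A minimal sketch, assuming hypothetical key names (runs_root, reports_root, catalog_root) and treating None as "not set":

```python
from collections import ChainMap

# Hypothetical key names; the default values follow the layout described in this README.
DEFAULTS = {
    "runs_root": ".calamum/generated/runs",
    "reports_root": ".calamum/generated/reports",
    "catalog_root": "catalog",
}


def resolve_settings(flags: dict, overlay: dict, descriptor: dict,
                     defaults: dict = DEFAULTS) -> dict:
    """Merge the layers so that flags beat overlay, overlay beats descriptor, and so on."""
    # Drop None values so an unset flag does not shadow a lower layer.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (flags, overlay, descriptor, defaults)
    ]
    # ChainMap returns the value from the first mapping that contains the key.
    return dict(ChainMap(*layers))
```

With this shape, an operator override in the machine-local overlay changes only the keys it names, and everything else falls through to the shared descriptor or the defaults.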

Quick start

From projects/calamum/:

install in editable mode
validate the default shared descriptor
run the seed smoke definition
generate a project aggregate from retained evidence

Example flow:

python -m pip install -e ".[dev]"
calamum test project current --json
calamum test list --json
calamum test run seed-cli-smoke --job local-smoke --json
calamum test reports generate --scope project --project calamum-test --json

After the sample run, inspect:

.calamum/generated/runs/run_index.jsonl
.calamum/generated/reports/generated/report_index.jsonl

If you want to avoid installing the console script during early development, run python -m calamum ... from the project root after setting PYTHONPATH=src for the session.
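The example flow can also be driven from a script. The sketch below builds the same four commands shown above and runs them with subprocess. The helper names are hypothetical, and it assumes that --json makes each command print a single JSON document on stdout, which this README implies but does not guarantee.

```python
import json
import shutil
import subprocess


def quickstart_commands(definition_id: str = "seed-cli-smoke",
                        job: str = "local-smoke",
                        project: str = "calamum-test") -> list[list[str]]:
    """Build the argv lists for the example flow, after installation."""
    return [
        ["calamum", "test", "project", "current", "--json"],
        ["calamum", "test", "list", "--json"],
        ["calamum", "test", "run", definition_id, "--job", job, "--json"],
        ["calamum", "test", "reports", "generate",
         "--scope", "project", "--project", project, "--json"],
    ]


def run_quickstart() -> list:
    """Run each command in order and parse its stdout as JSON."""
    if shutil.which("calamum") is None:
        raise RuntimeError("calamum console script not found on PATH")
    results = []
    for argv in quickstart_commands():
        completed = subprocess.run(argv, capture_output=True, text=True, check=True)
        # Assumption: --json output is one JSON document on stdout.
        results.append(json.loads(completed.stdout))
    return results
```

Because check=True is set, the script stops at the first command that exits with a non-zero status.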

Signing and privileged flows

Privileged aggregate generation can verify detached requests and emit signed receipts and report artifacts.

Relevant local environment variables:

CALAMUM_ED25519_PRIVATE_KEY
CALAMUM_ED25519_PUBLIC_KEY
CALAMUM_POLICY_SIGNING_KEY
CALAMUM_CONFIG_ROOT

For local development, a fallback HMAC or SHA lane is supported. For publishable or cross-application flows, prefer Ed25519.
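To make the idea of a detached HMAC signature concrete, here is a generic sketch using only the Python standard library. It is not Calamum's signing code: the function names, the choice of HMAC-SHA256, and the hex encoding are all assumptions, and the actual on-disk signature format is not documented here.

```python
import hashlib
import hmac


def sign_detached(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature stored separately from the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_detached(payload: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_detached(payload, key), signature)
```

A shared-secret scheme like this only proves integrity to parties that hold the key, which is one reason Ed25519 is the better fit for publishable or cross-application flows.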

Python facade

The package exports a stable surface for host applications via calamum.api, including helpers to:

resolve or require project context
register and validate projects
list/show/run definitions
list/show retained runs
generate/list/show aggregate reports
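This README does not list the facade's function names, so the safest way to discover them is to inspect the installed module. The helper below is a generic sketch (public_names is a hypothetical name) that works for any importable module and makes no assumption about what calamum.api exports.

```python
import importlib


def public_names(module_name: str = "calamum.api") -> list[str]:
    """List a module's public names, preferring __all__ when it is defined."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return sorted(exported)
    # Fall back to every attribute that does not start with an underscore.
    return sorted(name for name in dir(module) if not name.startswith("_"))
```

Calling public_names() with the package installed prints the surface a host application can rely on; help() on any listed name then shows its signature.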

Why this repo exists

This child project adapts the strongest patterns from the earlier Calamum/observer testing surfaces into one reusable testing substrate with a cleaner release boundary.

The design goals are:

deterministic project-aware execution
retained evidence instead of terminal-history dependence
JSON-first machine readability with Markdown companions
regenerable report surfaces
credible privileged/publication security hooks without hardcoding secrets

Project details

Release history

0.3.1: May 3, 2026
0.3.0: May 2, 2026
0.2.0: Apr 24, 2026 (this version)

Download files

Source Distribution

calamum_test-0.2.0.tar.gz (50.2 kB), uploaded Apr 24, 2026, Source

Built Distribution

calamum_test-0.2.0-py3-none-any.whl (44.2 kB), uploaded Apr 24, 2026, Python 3

File details

Details for the file calamum_test-0.2.0.tar.gz.

File metadata

Download URL: calamum_test-0.2.0.tar.gz
Upload date: Apr 24, 2026
Size: 50.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for calamum_test-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`059eb2ebe0df13a00303cbb4c306d920cb305f9ad81626dcc0f437057085cadc`
MD5	`fc91e3a2352c2843920877dc654e5bff`
BLAKE2b-256	`f3f98b377711e3a0b4c5cc84068724dfc5075fb109a12cc918b787808b7a1c11`

File details

Details for the file calamum_test-0.2.0-py3-none-any.whl.

File metadata

Download URL: calamum_test-0.2.0-py3-none-any.whl
Upload date: Apr 24, 2026
Size: 44.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for calamum_test-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e40a2c6ff05bcb2cbb3b23f2e192b14a78e362b882ddc8d0ccdfc2f0a623a104`
MD5	`7cff0a855379dbda49f8a1b68be36ac5`
BLAKE2b-256	`0c88f1da26fcdf5ae9e40bb5283d88dc3fce22f8fb4ca6a6f260864261854f82`
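A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the helper names are hypothetical, and the digests are copied verbatim from the two hash tables for release 0.2.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.2.0 files.
EXPECTED_SHA256 = {
    "calamum_test-0.2.0.tar.gz":
        "059eb2ebe0df13a00303cbb4c306d920cb305f9ad81626dcc0f437057085cadc",
    "calamum_test-0.2.0-py3-none-any.whl":
        "e40a2c6ff05bcb2cbb3b23f2e192b14a78e362b882ddc8d0ccdfc2f0a623a104",
}


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """True only when the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```
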

calamum-test 0.2.0
