Project-aware consolidated testing CLI for pytest, sandbox, empirical, and retained aggregate validation lanes.
Calamum Test
Calamum Test is a standalone, project-aware testing substrate for consolidating pytest, sandbox_test, and empirical_test lanes behind one retained-evidence CLI and Python facade.
Public repository: https://github.com/joediggidyyy/calamum
Install and verify
Install from PyPI using the published distribution name:
pip install calamum-test
Run the installed CLI using the runtime command name:
calamum --version
calamum -h
Important naming note:
- PyPI package / dependency name: calamum-test
- import package and runtime command: calamum
The current implementation is no longer just a seed scaffold. It now includes:
- a shared .calamum/project.json descriptor model
- machine-local overlay and active-project state support
- a stable importable Python facade in calamum.api
- richer catalog metadata for profiles, tags, policy flags, and evidence requirements
- retained run manifests and checksums
- regenerative aggregate reports for job, project, and domain scopes
- detached-signature support for privileged or publishable aggregate artifacts
Command surface
Top-level families
- calamum test: validation definition discovery and execution
- calamum project: project registration, active-project state, and context readback
- calamum monitor: current monitor-shell scaffolding and capability readback
Test execution
- calamum test list
- calamum test show <definition_id>
- calamum test run <definition_id>
- calamum test runs list
- calamum test runs show <run_id>
A definition_id is the exact id of a test definition in the catalog. Use
calamum test list to discover the available ids, then pass one of those ids to
calamum test show or calamum test run. Example: seed-cli-smoke.
Progress visibility: calamum test run emits a heartbeat line to stderr every
20 seconds while a long-running step subprocess is active. This indicates the
orchestrator is alive and the step is progressing, not stalled. The retained-evidence
artifacts (report.json, report.md, manifest.json, checksums.json) are
written only after the step completes; heartbeat lines are transient and not stored.
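The heartbeat behavior described above can be pictured with a small stand-alone sketch. This is not Calamum's implementation: `run_with_heartbeat` is a hypothetical helper and the message text is made up. It only illustrates the general pattern of waiting on a child process with a timeout and writing a transient line to stderr each time the timeout expires.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=20.0):
    """Run cmd and print a heartbeat to stderr every `interval` seconds.

    Hypothetical helper for illustration only. Returns a tuple of
    (returncode, heartbeat_count).
    """
    started = time.monotonic()
    heartbeats = 0
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # wait() returns as soon as the child exits, or raises on timeout.
            return proc.wait(timeout=interval), heartbeats
        except subprocess.TimeoutExpired:
            heartbeats += 1
            elapsed = time.monotonic() - started
            # Heartbeats go to stderr only; nothing is written to disk.
            print(
                f"[heartbeat] step still running after {elapsed:.1f}s",
                file=sys.stderr,
                flush=True,
            )
```

For example, `run_with_heartbeat([sys.executable, "-c", "import time; time.sleep(60)"])` would print a line at roughly 20 and 40 seconds and then return once the child exits.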
Project management
- calamum project register
- calamum project set <project>
- calamum project current
- calamum project validate [<project>]
- calamum project list
- calamum project show <project>
Compatibility note:
- calamum test project ... remains available as a compatibility alias during the current route migration.
Monitor scaffolding
calamum monitor capability list
Current note:
- the top-level monitor family is now present as native Calamum scaffolding for future monitor adapters and readiness surfaces;
- broader monitor execution families are still a follow-on implementation lane, so current help/runtime output should be treated as the truthful monitor-shell baseline rather than a claim of full capture parity.
Aggregate reporting
- calamum test reports list
- calamum test reports show <report_ref>
- calamum test reports generate --scope job --job <job_id>
- calamum test reports generate --scope project [--project <project>]
- calamum test reports generate --scope domain --domain <domain>
Adding tests to the library
In Calamum, the public "test library" is the tracked catalog at
catalog/test_definitions.json.
Current authoring workflow:
- add a new definition object to the catalog definitions list
- give it a stable id, title, summary, status, and category
- classify it with:
  - category: primary test class (for example bootstrap, regression, security, adversarial, performance)
  - profiles: reusable bundles such as smoke, release, or nightly
  - tags: cross-cutting labels for search and future selectors
  - policy_flags: execution or governance rules
  - evidence_requirements: retained outputs the definition must produce
- declare step arrays under the canonical lanes:
  - pytest
  - sandbox_test
  - empirical_test
- validate the new entry by running:
  - calamum test list
  - calamum test show <definition_id>
  - calamum test run <definition_id> --dry-run
Minimal definition shape:
{
  "id": "adversarial-auth-smoke",
  "title": "Adversarial auth smoke",
  "summary": "Challenge the authentication path with hostile-input and retained-evidence checks.",
  "status": "active",
  "category": "adversarial",
  "profiles": ["smoke", "release"],
  "tags": ["adversarial", "auth", "api", "signing"],
  "policy_flags": ["containment", "json-first", "project-aware", "release-gate"],
  "evidence_requirements": ["report_json", "report_md", "manifest_json", "checksums_json"],
  "default_lanes": ["pytest", "sandbox_test"],
  "lanes": {
    "pytest": [],
    "sandbox_test": [],
    "empirical_test": []
  }
}
Current limitation: authoring is still manual. Calamum does not yet ship a
dedicated calamum test catalog scaffold|validate management surface, so the
catalog file remains the authoritative place to add new tests today.
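Because authoring is manual, a small script can take some of the risk out of hand-editing the catalog. The sketch below is not part of Calamum: `add_definition` is a hypothetical helper, and it assumes the catalog file is a JSON object with a top-level "definitions" list, which is how the workflow above describes it.

```python
import json
from pathlib import Path


def add_definition(catalog_path, definition):
    """Append `definition` to the catalog's "definitions" list.

    Hypothetical helper. Assumes the catalog is a JSON object with a
    top-level "definitions" list. Returns the new number of definitions.
    """
    path = Path(catalog_path)
    catalog = json.loads(path.read_text(encoding="utf-8"))
    definitions = catalog.setdefault("definitions", [])
    # Ids must be stable and unique, so refuse to add a duplicate.
    if any(existing.get("id") == definition["id"] for existing in definitions):
        raise ValueError(f"duplicate definition id: {definition['id']}")
    definitions.append(definition)
    path.write_text(json.dumps(catalog, indent=2) + "\n", encoding="utf-8")
    return len(definitions)
```

After adding an entry this way, the validation commands listed in the authoring workflow (calamum test list, calamum test show, and a --dry-run) remain the way to confirm that Calamum itself accepts it.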
Plain-language meaning of the fields
- category = what kind of test this is
- profiles = when or why you run it
- tags = what area it touches
- policy_flags = special execution or governance rules
- evidence_requirements = which retained outputs must exist after the run
- pytest / sandbox_test / empirical_test = the three execution lane classes inside one definition
The important division is this:
- one definition = one named test in the library
- one definition can use one, two, or all three lane classes
- the lane classes are not separate library entries; they are the three ways Calamum can gather evidence for the same test
Controlled library vocabulary (v1)
Calamum now treats the following values as the contracted v1 vocabulary.
Status values
- seed: scaffold or early placeholder definition
- active: supported definition for normal use
- experimental: usable but still being evaluated
- deprecated: still readable/runnable for transition purposes but being retired
- disabled: present in the catalog but not intended for ordinary execution
Category values
- adversarial: deliberate hostile-input, penetration-style, abuse-case, or attack-path validation
- general: mixed or uncategorized definition; use sparingly
- bootstrap: proves basic setup, installation, or command-surface readiness
- regression: protects a known workflow or behavior from breaking
- security: validates defensive trust, signing, access, or safety posture without making hostile challenge the primary identity
- performance: validates speed, scale, or resource posture
- integration: validates interaction across modules, services, or host applications
- compliance: validates policy, contract, or governance conformance
Profile values
- default: ordinary day-to-day execution set
- smoke: fast confidence check
- fast: low-cost local developer check
- release: required before publishing or promotion
- nightly: broader scheduled validation pack
Tag values
- adversarial: hostile-input / penetration-testing facet on a definition whose primary category may or may not already be adversarial
- aggregate: aggregate/report generation surface
- api: Python or service API surface
- auth: authentication / authorization surface
- catalog: definition-library / schema surface
- cli: command-line surface
- filesystem: path, layout, or artifact-root surface
- project: project registration / context resolution surface
- reporting: rendered reports or report-regeneration surface
- retained-evidence: manifests, checksums, receipts, or persisted review evidence
- sandbox: isolated or simulated runtime surface
- signing: signatures, receipts, or verification surface
- smoke: broad confidence check spanning multiple surfaces
Policy flag values
- json-first: JSON is the primary machine contract
- project-aware: requires resolved project context or tokens
- containment: paths and execution roots must stay inside declared boundaries
- local-only: intentionally local-only workflow or artifact posture
- signed-output: output must be signed and verifiable
- privileged-operation: delegated or privileged control path
- release-gate: failing result blocks release or promotion
- deterministic-output: output is expected to be stable and reproducible
Evidence requirement values
- report_json
- report_md
- manifest_json
- checksums_json
- stdout_capture
- stderr_capture
- receipt_json
- report_signature
- manifest_signature
Lane classes
- pytest: automated code-level assertions
- sandbox_test: controlled scripted or simulated execution
- empirical_test: real observed/manual/live verification
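Since there is no dedicated catalog validate command yet, the v1 vocabulary above can be checked with a few lines of Python before a definition is committed. The sketch below is an independent illustration rather than Calamum's own validator: `vocabulary_errors` is a hypothetical name, and the value sets are copied from the lists in this section.

```python
STATUSES = {"seed", "active", "experimental", "deprecated", "disabled"}
CATEGORIES = {
    "adversarial", "general", "bootstrap", "regression",
    "security", "performance", "integration", "compliance",
}
LANES = {"pytest", "sandbox_test", "empirical_test"}
# List-valued fields and the values each one may contain.
LIST_FIELDS = {
    "profiles": {"default", "smoke", "fast", "release", "nightly"},
    "tags": {
        "adversarial", "aggregate", "api", "auth", "catalog", "cli",
        "filesystem", "project", "reporting", "retained-evidence",
        "sandbox", "signing", "smoke",
    },
    "policy_flags": {
        "json-first", "project-aware", "containment", "local-only",
        "signed-output", "privileged-operation", "release-gate",
        "deterministic-output",
    },
    "evidence_requirements": {
        "report_json", "report_md", "manifest_json", "checksums_json",
        "stdout_capture", "stderr_capture", "receipt_json",
        "report_signature", "manifest_signature",
    },
    "default_lanes": LANES,
}


def vocabulary_errors(definition):
    """Return a list of human-readable vocabulary problems (empty if none)."""
    errors = []
    if definition.get("status") not in STATUSES:
        errors.append(f"status: unknown value {definition.get('status')!r}")
    if definition.get("category") not in CATEGORIES:
        errors.append(f"category: unknown value {definition.get('category')!r}")
    for field, allowed in LIST_FIELDS.items():
        for value in definition.get(field, []):
            if value not in allowed:
                errors.append(f"{field}: unknown value {value!r}")
    for lane in definition.get("lanes", {}):
        if lane not in LANES:
            errors.append(f"lanes: unknown lane {lane!r}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets an author fix every vocabulary issue in one pass.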
How adversarial testing is represented
- use category: adversarial when hostile challenge or penetration-style probing is the primary identity of the definition
- use tag adversarial when a definition is primarily something else (for example security or regression) but still contains adversarial coverage
- keep adversarial out of the lane field: it is a test type, not an execution medium
- execute adversarial definitions through one or more of the normal lanes (pytest, sandbox_test, empirical_test)
- when adversarial coverage is claimed, sandbox coverage should normally be present because that is the safest place to exercise hostile-path probes first
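The rules above lend themselves to a quick lint over a definition. The following is a sketch under stated assumptions, not a Calamum feature: `adversarial_warnings` is a hypothetical helper, and because this README does not define the shape of a step object, it only checks whether the sandbox_test step array is non-empty.

```python
LANES = ("pytest", "sandbox_test", "empirical_test")


def adversarial_warnings(definition):
    """Return warnings about how a definition represents adversarial testing."""
    warnings = []
    lanes = definition.get("lanes", {})
    # "adversarial" is a test type, so it must never appear as a lane key.
    unknown = [name for name in lanes if name not in LANES]
    if unknown:
        warnings.append(f"unknown lane keys: {unknown}")
    claims_adversarial = (
        definition.get("category") == "adversarial"
        or "adversarial" in definition.get("tags", [])
    )
    # Sandbox coverage should normally accompany any adversarial claim.
    if claims_adversarial and not lanes.get("sandbox_test"):
        warnings.append("adversarial coverage claimed but sandbox_test has no steps")
    return warnings
```

The sandbox check is reported as a warning rather than an error because the guidance says sandbox coverage should "normally" be present, which leaves room for exceptions.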
Plain-language workflow across all three test classes
Here is the simplest way to think about it.
Use one definition when you are testing one real thing.
Example definition:
- id: adversarial-auth-smoke
- category: adversarial
- profiles: smoke, release
- tags: adversarial, auth, api, signing
Then split the same test across the three lane classes like this:
- pytest lane
  - prove the code-level rules work
  - example: token validation, permission checks, malformed-input rejection, JSON packet shape
- sandbox_test lane
  - prove the command or workflow works in a controlled runtime
  - example: run the CLI against hostile fixtures in a safe local sandbox and confirm the right files are written without escaping containment
- empirical_test lane
  - prove the result also holds in a real observed workflow
  - example: operator checks the real auth flow or a live delegated request/receipt path under adversarial review conditions
Plain English summary:
- pytest asks: does the code logic pass?
- sandbox_test asks: does the workflow run correctly in a safe controlled environment?
- empirical_test asks: does it still look correct when a human or real-world run observes it?
Typical full workflow:
- add the definition to catalog/test_definitions.json
- run calamum test show adversarial-auth-smoke
- run calamum test run adversarial-auth-smoke --dry-run
- run calamum test run adversarial-auth-smoke
- inspect one combined retained evidence pack under:
  - .calamum/generated/runs/<run_id>/
- if the run belongs to a job or release lane, generate aggregates under:
  - .calamum/generated/reports/generated/
That is the core model: one named adversarial or non-adversarial test definition, three possible lane classes, one retained evidence pack.
Retained evidence contract
Every calamum test run retains:
- report.json
- report.md
- checksums.json
- manifest.json
- per-step stdout/stderr captures
- append-only .calamum/generated/runs/run_index.jsonl
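The run index is a JSON Lines file, so it can be read with the standard library alone. The sketch below makes no assumption about which fields each record carries, because the record schema is not documented here; `read_jsonl` is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `read_jsonl(".calamum/generated/runs/run_index.jsonl")` after a run and printing the keys of the last record is a quick way to see what the index actually stores on your machine.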
Aggregate report generation retains:
- report.json
- report.md
- manifest.json
- receipt.json
- checksum sidecars
- optional detached signatures for JSON artifacts
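Checksum sidecars make it possible to re-verify an artifact independently of the tool that wrote it. The sketch below assumes a sidecar whose first whitespace-separated token is the hex SHA-256 digest, as in sha256sum output. That layout is an assumption; the actual sidecar format written by Calamum is not specified in this README, so inspect a real sidecar before relying on it. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(artifact, sidecar):
    """Compare an artifact's digest with the first token of its sidecar.

    Assumes (unverified) that the sidecar starts with the hex digest.
    """
    tokens = Path(sidecar).read_text(encoding="utf-8").split()
    return bool(tokens) and sha256_of(artifact) == tokens[0].lower()
```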
Filesystem layout and default output contract
Calamum uses a small split between tracked inputs and local-only generated outputs.
Tracked by default:
- .calamum/project.json: shared project descriptor
- catalog/test_definitions.json: tracked definition catalog
Local-only by default:
- .calamum/generated/runs/: retained run evidence
- .calamum/generated/reports/: materialized aggregate reports
- .calamum/generated/.gitignore: local-only guard so generated output stays untracked
Default tree:
project-root/
├─ .calamum/
│  ├─ project.json
│  └─ generated/
│     ├─ .gitignore
│     ├─ runs/
│     │  ├─ run_index.jsonl
│     │  └─ <run_id>/
│     │     ├─ report.json
│     │     ├─ report.md
│     │     ├─ checksums.json
│     │     ├─ manifest.json
│     │     └─ <lane>/
│     │        ├─ <step>.stdout.txt
│     │        └─ <step>.stderr.txt
│     └─ reports/
│        └─ generated/
│           ├─ report_index.jsonl
│           └─ <scope>/
│              └─ <target>/
│                 ├─ report.json
│                 ├─ report.md
│                 ├─ manifest.json
│                 ├─ receipt.json
│                 ├─ *.sha256
│                 ├─ *.sig   # when signing is enabled
│                 └─ history/<timestamp>/
└─ catalog/
   └─ test_definitions.json
This is the default contract unless the operator overrides one or more roots during project registration or later through local overlay settings / explicit CLI flags.
Workflow notes
- Child-repo / self-hosted workflow: the checked-in projects/calamum/.calamum/project.json points generated outputs into .calamum/generated/.
- Adopt an existing repo: calamum test project register now bootstraps the minimal local scaffold by creating:
  - catalog/test_definitions.json if it does not exist yet
  - .calamum/generated/.gitignore
  - the standard runs/ and reports/ directories under .calamum/generated/
- Application profile note: --application <id> is currently stored on the project record and exposed to tokens/readback, but it does not yet auto-expand a data-driven profile with implied markers, path aliases, or report defaults. For CodeSentinel, pass the explicit registration arguments you want on the first local exercise.
- Override workflow: --runs-root, --reports-root, --catalog-root, and the machine-local overlay still win when the operator intentionally wants a different layout.
If you do not override anything, retained run evidence (including each run's report.json and report.md) goes to .calamum/generated/runs/ and
aggregate reports go to .calamum/generated/reports/generated/.
Concrete local-first CodeSentinel workflow notes now live in:
planning/CALAMUM_CODESENTINEL_LOCAL_ADOPTION_SCRATCHPAD_20260423.md
Project resolution order
Calamum resolves project context in the following order:
- explicit --project
- nearest ancestor .calamum/project.json
- CALAMUM_PROJECT
- active project stored in local state
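The four-step order above can be sketched as a simple first-match chain. This is an illustration of the documented precedence, not the code Calamum runs: `nearest_descriptor` and `resolve_project` are hypothetical names, the "active" value stands in for whatever the local state store returns, and CALAMUM_PROJECT is treated here as an environment variable, which is an assumption.

```python
import os
from pathlib import Path


def nearest_descriptor(start):
    """Walk from `start` up to the filesystem root looking for a descriptor."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".calamum" / "project.json"
        if candidate.is_file():
            return candidate
    return None


def resolve_project(explicit=None, cwd=".", env=None, active=None):
    """Return (source, value) for the first matching rule, or None."""
    env = os.environ if env is None else env
    if explicit:  # 1. explicit --project
        return ("flag", explicit)
    descriptor = nearest_descriptor(cwd)
    if descriptor is not None:  # 2. nearest ancestor descriptor
        return ("descriptor", str(descriptor))
    if env.get("CALAMUM_PROJECT"):  # 3. CALAMUM_PROJECT (assumed env var)
        return ("env", env["CALAMUM_PROJECT"])
    if active:  # 4. active project from local state
        return ("active", active)
    return None
```

Returning the source alongside the value makes it easy to explain why a particular project was picked, which is useful when a stray ancestor descriptor shadows the variable or the active project.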
Within a resolved project, path/runtime resolution follows:
- explicit command flags
- machine-local overlay
- shared descriptor
- built-in defaults
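Layered precedence of this kind maps naturally onto `collections.ChainMap`, which returns the value from the first mapping that defines a key. The sketch below is a hedged illustration: the key names (`runs_root`, `reports_root`, `catalog_root`) are hypothetical and merely echo the --runs-root, --reports-root, and --catalog-root flags, and the default values are illustrative stand-ins based on the default tree, not values read from Calamum.

```python
from collections import ChainMap

# Illustrative defaults only; key names are hypothetical.
BUILT_IN_DEFAULTS = {
    "runs_root": ".calamum/generated/runs",
    "reports_root": ".calamum/generated/reports",
    "catalog_root": "catalog",
}


def effective_settings(flags=None, overlay=None, descriptor=None):
    """Merge layers so that flags beat overlay, overlay beats descriptor,
    and descriptor beats the built-in defaults."""
    layers = [
        # Drop unset (None) values so they do not mask lower layers.
        {key: value for key, value in (layer or {}).items() if value is not None}
        for layer in (flags, overlay, descriptor)
    ]
    return dict(ChainMap(*layers, BUILT_IN_DEFAULTS))
```

Filtering out `None` matters because an argument parser typically reports an omitted flag as `None`, and that placeholder should not override a value set in the overlay or descriptor.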
Quick start
From projects/calamum/:
- install in editable mode
- validate the default shared descriptor
- run the seed smoke definition
- generate a project aggregate from retained evidence
Example flow:
python -m pip install -e .[dev]
calamum --version
calamum project current --json
calamum test list --json
calamum test run seed-cli-smoke --job local-smoke --json
calamum test reports generate --scope project --project calamum-test --json
After the sample run, inspect:
- .calamum/generated/runs/run_index.jsonl
- .calamum/generated/reports/generated/report_index.jsonl
If you want to avoid installing the console script during early development, run python -m calamum ... from the project root after setting PYTHONPATH=src for the session.
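The --json commands in the example flow can be driven from a script by running the CLI as a subprocess and parsing its standard output. The helper below is a generic sketch: `run_json` is a hypothetical name, and it assumes that a command invoked with --json prints exactly one JSON document on stdout, which the flag name suggests but this README does not spell out.

```python
import json
import subprocess


def run_json(cmd):
    """Run `cmd`, raise CalledProcessError on a non-zero exit,
    and parse its stdout as a single JSON document."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

With the console script installed, `run_json(["calamum", "test", "list", "--json"])` would return the parsed catalog listing; the shape of that result is whatever the CLI emits and is not assumed here.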
Signing and privileged flows
Privileged aggregate generation can verify detached requests and emit signed receipts and report artifacts.
Relevant local environment variables:
- CALAMUM_ED25519_PRIVATE_KEY
- CALAMUM_ED25519_PUBLIC_KEY
- CALAMUM_POLICY_SIGNING_KEY
- CALAMUM_CONFIG_ROOT
For local development, a fallback HMAC or SHA lane is supported. For publishable or cross-application flows, prefer Ed25519.
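To make the idea of a detached HMAC signature concrete, here is a minimal standard-library sketch. It shows the general technique (HMAC-SHA256 over the artifact bytes, stored separately from the artifact) and is not Calamum's signature format. The function names are hypothetical, and which of the environment variables above supplies the key for the fallback lane is not stated in this README, so the key is passed in explicitly.

```python
import hashlib
import hmac


def sign_detached(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over `payload`."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_detached(payload: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_detached(payload, key)
    return hmac.compare_digest(expected, signature)
```

An HMAC proves only that the signer held the same shared secret as the verifier. That is why a shared-key lane suits local development, while an asymmetric scheme such as Ed25519 is the better fit when a third party must verify artifacts without being able to forge them.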
Python facade
The package exports a stable surface for host applications via calamum.api, including helpers to:
- resolve or require project context
- register and validate projects
- list/show/run definitions
- list/show retained runs
- generate/list/show aggregate reports
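This README lists what the facade can do but not the function names, so the safest way to find them is to inspect the installed module instead of guessing. The sketch below uses only `importlib`; `public_names` is a hypothetical helper that works for any importable module.

```python
import importlib


def public_names(module_name):
    """List a module's public names, preferring __all__ when it is defined."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return sorted(exported)
    return sorted(name for name in dir(module) if not name.startswith("_"))
```

With the package installed, `public_names("calamum.api")` prints the names actually exported by your version, which can then be explored further with `help()`.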
Why this repo exists
This child project adapts the strongest patterns from the earlier Calamum/observer testing surfaces into one reusable testing substrate with a cleaner release boundary.
The design goals are:
- deterministic project-aware execution
- retained evidence instead of terminal-history dependence
- JSON-first machine readability with Markdown companions
- regenerable report surfaces
- credible privileged/publication security hooks without hardcoding secrets