Deterministic data-movement audit, validation and optimization reports
Copy-Space Guard
Copy-Space Guard is a metadata-only CLI for deterministic data-movement audits and CI regression gates. Current release: v0.2.3 on PyPI. Status: production-oriented pilot.
It takes a transfer demand matrix (src_slot,dst_slot,bits_total), validates schedules under a declared resource model, compares a baseline or customer schedule against a deterministic greedy candidate, and produces sales/engineering reports with lower-bound gap, utilization and estimated savings.
This package is intentionally small and pilot-friendly:
- no external Python dependencies;
- no payload data required;
- deterministic output for CI and regression tracking;
- machine-readable JSON plus human-readable Markdown/HTML reports.
Product promise
Give us one transfer trace or demand matrix. In a few days we show whether your data-movement plan is conflict-free, how far it is from a deterministic lower bound, and what CI gate can prevent future regressions.
Quickstart
Install from PyPI:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install copyspace-guard
copyspace-guard --version
Run the bundled example from a repository checkout:
copyspace-guard analyze \
--csv examples/ring15.csv \
--bw 256 \
--id ai-staging-ring15 \
--roi examples/roi.yml \
--outdir artifacts/demo
Open:
- artifacts/demo/report.html
- artifacts/demo/report.md
- artifacts/demo/summary.json
Expected terminal shape:
baseline: status=PASS ticks=768 lb=549 gap=0.398907 util=0.7143
greedy: status=PASS ticks=549 lb=549 gap=0.000000 util=0.9992
saved_ticks=219 estimated_savings=9.73
The exact numbers depend on the input CSV and bandwidth value.
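The gap values in the sample output are consistent with a relative distance from the lower bound, (ticks - lb) / lb. The sketch below reproduces them under that assumption; the function name is hypothetical, and docs/BOUNDS.md remains the authoritative definition.

```python
def gap_to_lower_bound(ticks: int, lower_bound: int) -> float:
    """Relative excess of a schedule's length over the lower bound.

    ASSUMPTION: gap = (ticks - lb) / lb, inferred from the sample output.
    """
    if lower_bound <= 0:
        raise ValueError("lower_bound must be positive")
    return (ticks - lower_bound) / lower_bound


baseline_gap = gap_to_lower_bound(768, 549)
greedy_gap = gap_to_lower_bound(549, 549)
saved_ticks = 768 - 549

print(f"baseline gap={baseline_gap:.6f}")  # 0.398907, as in the sample output
print(f"greedy gap={greedy_gap:.6f}")      # 0.000000
print(f"saved_ticks={saved_ticks}")        # 219
```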
For local development from this repository, install editable mode with development tooling:
python -m pip install -e ".[dev]"
make test
make security
Input format
CSV with header:
src_slot,dst_slot,bits_total
0,1,65536
1,2,65536
Meaning:
- src_slot: source endpoint ID;
- dst_slot: destination endpoint ID;
- bits_total: transfer volume from source to destination.
Duplicate pairs are automatically merged.
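As an illustration of the input contract, the following sketch parses a demand CSV and merges repeated (src_slot, dst_slot) pairs. It assumes that merging means summing bits_total, which the text above does not state explicitly; `load_demands` is a hypothetical helper, not part of the CLI.

```python
import csv
import io
from collections import defaultdict


def load_demands(text: str) -> dict[tuple[int, int], int]:
    """Parse a demand CSV and combine repeated (src, dst) pairs."""
    merged: dict[tuple[int, int], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        src, dst = int(row["src_slot"]), int(row["dst_slot"])
        bits = int(row["bits_total"])
        if bits < 0:
            raise ValueError(f"negative bits_total for pair {(src, dst)}")
        # ASSUMPTION: duplicate pairs are merged by adding their volumes.
        merged[(src, dst)] += bits
    return dict(merged)


sample = "src_slot,dst_slot,bits_total\n0,1,65536\n1,2,65536\n0,1,1024\n"
demands = load_demands(sample)
print(demands)
```

Here the pair (0, 1) appears twice, so its two volumes end up in a single entry.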
Models v0: STRICT1 and READ1_WRITE1
STRICT1: within one tick, each slot can participate in at most one transfer, either as source or destination.
READ1_WRITE1: within one tick, each slot may send at most once and receive at most once.
Both models are a useful baseline for:
- endpoint-limited transfer systems;
- shuffle/staging/replication analysis;
- CI regression gates;
- comparing scheduler strategies;
- first customer audits where full topology is not yet modeled.
It is not a universal network model. For real pilots, confirm which of the two models fits the client's endpoints and whether extensions such as broadcast, topology-aware bandwidth, asymmetric links or tier-aware storage constraints are needed.
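The difference between STRICT1 and READ1_WRITE1 can be sketched as a per-tick conflict check. This is an illustration of the two rules as worded above, not the package's validator; `tick_conflicts` and its input shape are assumptions.

```python
from collections import Counter


def tick_conflicts(transfers: list[tuple[int, int]], model: str = "STRICT1") -> list[int]:
    """Return the slots that break the model within one tick.

    transfers: (src_slot, dst_slot) pairs scheduled in the same tick.
    """
    sends = Counter(src for src, _ in transfers)
    recvs = Counter(dst for _, dst in transfers)
    if model == "STRICT1":
        # A slot may take part in at most one transfer, in either role.
        # (A self-transfer counts twice in this sketch.)
        usage = sends + recvs
        return sorted(slot for slot, n in usage.items() if n > 1)
    if model == "READ1_WRITE1":
        # A slot may send at most once and receive at most once.
        bad = {slot for slot, n in sends.items() if n > 1}
        bad |= {slot for slot, n in recvs.items() if n > 1}
        return sorted(bad)
    raise ValueError(f"unknown model: {model}")


ring_step = [(0, 1), (1, 2)]  # slot 1 receives and sends in the same tick
print(tick_conflicts(ring_step, "STRICT1"))       # [1]
print(tick_conflicts(ring_step, "READ1_WRITE1"))  # []
```

The same tick is a conflict under STRICT1 and valid under READ1_WRITE1, which is why the model choice affects both validation results and the achievable tick count.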
Commands
Check local pilot readiness
copyspace-guard --version
copyspace-guard doctor --root .
copyspace-guard doctor --root . --json
Analyze CSV and generate reports
copyspace-guard analyze --csv INPUT.csv --bw 256 --outdir artifacts/run
Optional:
--slots N
--id workload-name
--notes "free text"
--cost-per-tick 0.02
--model STRICT1 # or READ1_WRITE1
--bounds-subset-limit 20
--max-errors 100
--max-demands 100000
--max-slots 10000
--max-output-ticks 1000000
--bounds-subset-limit controls exhaustive STRICT1 subset-density enumeration and is protected by a hard cap to avoid accidental exponential runs.
Validate a schedule
copyspace-guard validate artifacts/run/instance.json artifacts/run/schedule_greedy.json --report artifacts/run/validation.json
Regenerate Markdown/HTML reports
copyspace-guard report artifacts/run/summary.json --outdir artifacts/report
Validate generated artifact contracts
copyspace-guard validate-artifact --kind summary artifacts/run/summary.json
Run production-oriented checks
make test
make security
make production-check
make test runs ruff, mypy, compileall, unit/property/CLI tests, coverage and a CI gate smoke. make security runs Bandit over src/tools and pip-audit over the Python environment. make production-check runs release checks plus a small synthetic performance suite. The suite can also be run directly:
copyspace-guard bench-suite --outdir artifacts/bench-suite --max-total-seconds 30
Customer/current schedule input
For a real pilot, the customer may already have an actual schedule. Use CSV:
tick,src_slot,dst_slot,len_bits
0,0,1,256
0,2,3,256
1,1,2,256
Then run:
copyspace-guard analyze \
--csv examples/ring15.csv \
--bw 256 \
--current-schedule-csv customer_schedule.csv \
--outdir artifacts/customer-run
You can also convert a schedule CSV to JSON:
copyspace-guard schedule-csv-to-json --csv customer_schedule.csv --out schedule.json
CI gate command
After analyze, fail/pass thresholds can be checked locally or in CI:
copyspace-guard gate artifacts/demo/summary.json \
--report greedy \
--max-gap 0.15 \
--min-utilization 0.85
Exit code 0 means pass, exit code 2 means fail.
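The decision the gate makes can be pictured with a small sketch. It assumes both thresholds are inclusive and that any violated threshold fails the gate; the function is hypothetical and does not read summary.json.

```python
def gate_exit_code(gap: float, utilization: float,
                   max_gap: float, min_utilization: float) -> int:
    """Map metrics and thresholds to the documented exit codes (0 pass, 2 fail)."""
    # ASSUMPTION: comparisons are inclusive on both thresholds.
    passed = gap <= max_gap and utilization >= min_utilization
    return 0 if passed else 2


# Metrics taken from the sample output in the Quickstart.
print(gate_exit_code(0.000000, 0.9992, max_gap=0.15, min_utilization=0.85))  # 0
print(gate_exit_code(0.398907, 0.7143, max_gap=0.15, min_utilization=0.85))  # 2
```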
Files generated by analyze
- instance.json: normalized workload contract.
- schedule_baseline.json or schedule_customer_current.json: current schedule artifact, unless --summary-only is used.
- schedule_greedy.json: deterministic candidate schedule, unless --summary-only is used.
- schedule_baseline.csv or schedule_customer_current.csv: CSV schedule artifact, unless --summary-only is used.
- schedule_greedy.csv: deterministic candidate schedule CSV, unless --summary-only is used.
- report_baseline.json or report_customer_current.json: validation metrics for the current schedule.
- report_greedy.json: validation metrics for the candidate.
- summary.json: machine-readable comparison summary.
- report.md: human-readable audit report.
- report.html: shareable report for demos and sales calls.
v0.2.3 boundaries
Included:
- volume-based demand modeling;
- deterministic baseline and greedy schedules;
- STRICT1 and READ1_WRITE1 validators;
- lower-bound gap and utilization metrics;
- ROI estimates via roi.yml or a simple $ per tick assumption;
- sales-ready report artifacts;
- PyPI publishing through GitHub Actions Trusted Publishing;
- matrix CI for Python 3.10, 3.11 and 3.12;
- required CI checks for tests, build, Docker smoke and security scans;
- release version guard for tag/version synchronization;
- Dependabot automation for GitHub Actions updates.
Not included yet:
- topology/path selection;
- real transfer execution;
- cloud adapter importers;
- address-level offset validation;
- VCopySpace receipt ledger integration;
- topology/path-specific CSV importers beyond the current demand and schedule formats.
Known operational caveats:
- Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
- Full artifact mode can produce large schedule JSON/CSV files; use --summary-only for large pilots and CI.
- For large STRICT1 slot counts, subset-density lower bounds may be partial; check bounds_complete in reports.
- The greedy schedule is deterministic and useful for comparison, but it is not a proof of global optimality.
- Demand and schedule core fields are parsed as integers. Pass-through text columns in anonymized CSV outputs are prefixed with a single quote when they begin with spreadsheet formula trigger characters (=, +, -, @, tab or carriage return).
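The formula-trigger rule in the last caveat can be sketched in a few lines. This is an illustration of the described behavior, not the package's code; `neutralize_cell` is a hypothetical name.

```python
# Characters that make a spreadsheet treat a cell as a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value: str) -> str:
    """Prefix a single quote when the text starts with a trigger character."""
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value


print(neutralize_cell("=SUM(A1:A9)"))  # '=SUM(A1:A9)
print(neutralize_cell("rack-7"))       # rack-7 (unchanged: '-' is not the first character)
```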
How this maps to the larger project set
- copy-space → scheduler, validator, lower-bound gap, CI-gate idea.
- vcopyspace → future enterprise layer: receipt-based metering, ledger, trace/replay, cost model.
- DDAS → long-term deterministic state-transition foundation.
ROI mode
Turn saved ticks into business impact:
copyspace-guard analyze \
--csv examples/ring15.csv \
--bw 256 \
--roi examples/roi.yml \
--outdir artifacts/demo
Example examples/roi.yml:
roi:
tick_seconds: 1
gpu_count_blocked: 64
gpu_hour_cost_usd: 2.50
runs_per_day: 12
days_per_month: 30
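One plausible reading of these fields is that each saved tick frees the blocked GPUs for tick_seconds. The sketch below follows that reading; the formula is an assumption, not taken from the package source. With these values, 219 saved ticks give about 9.73 USD per run, which agrees with the estimated_savings figure in the Quickstart sample output. The monthly projection is a further assumption about how runs_per_day and days_per_month are applied.

```python
def roi_estimate(saved_ticks: int, roi: dict) -> tuple[float, float]:
    """Return (savings per run, savings per month) in USD.

    ASSUMPTION: savings = blocked GPU-hours avoided * GPU hourly cost.
    """
    saved_seconds = saved_ticks * roi["tick_seconds"]
    blocked_gpu_hours = saved_seconds * roi["gpu_count_blocked"] / 3600
    per_run = blocked_gpu_hours * roi["gpu_hour_cost_usd"]
    per_month = per_run * roi["runs_per_day"] * roi["days_per_month"]
    return per_run, per_month


roi = {
    "tick_seconds": 1,
    "gpu_count_blocked": 64,
    "gpu_hour_cost_usd": 2.50,
    "runs_per_day": 12,
    "days_per_month": 30,
}
per_run, per_month = roi_estimate(219, roi)
print(f"per run: {per_run:.2f} USD, per month: {per_month:.2f} USD")
```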
Gate config file
copyspace-guard gate artifacts/demo/summary.json \
--config examples/copyspace_guard.yml
Example config:
gates:
report: greedy
max_gap_to_lower_bound: 0.15
min_utilization: 0.85
Release automation
Tag releases are published to GitHub Releases and PyPI. Before a tag publishes, the release workflow verifies that the tag version matches both pyproject.toml and copyspace_guard.__version__.
Prepare a version bump locally:
VERSION=0.2.3 NOTE="Short release note" make bump-version
TAG=v0.2.3 make release-guard
GitHub release notes are autogenerated from merged pull requests. See docs/RELEASE_PROCESS.md for the full process and PyPI Trusted Publishing configuration.
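The tag/version synchronization check can be pictured as follows. This is a sketch of the rule as described, assuming tags carry a leading "v"; the real guard lives behind make release-guard and may differ.

```python
def tag_matches(tag: str, pyproject_version: str, module_version: str) -> bool:
    """True when the tag, pyproject.toml and __version__ all name the same release."""
    expected = tag.removeprefix("v")  # ASSUMPTION: tags look like v0.2.3
    return expected == pyproject_version == module_version


print(tag_matches("v0.2.3", "0.2.3", "0.2.3"))  # True
print(tag_matches("v0.2.4", "0.2.3", "0.2.3"))  # False
```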
Docker
docker build -t copyspace-guard .
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" copyspace-guard analyze \
--csv examples/ring15.csv \
--bw 256 \
--roi examples/roi.yml \
--outdir artifacts/docker-demo
Industry demos
copyspace-guard analyze --csv examples/ai_checkpoint.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/ai-checkpoint
copyspace-guard analyze --csv examples/db_shuffle.csv --bw 262144 --roi examples/roi.yml --outdir artifacts/db-shuffle
copyspace-guard analyze --csv examples/storage_replication.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/storage-replication
copyspace-guard analyze --csv examples/kv_cache_movement.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/kv-cache
Client package
See client-package/ for a minimal package that can be sent to a pilot customer:
- README_CLIENT.md
- sample_demands.csv
- sample_schedule.csv
- roi.yml
- copyspace_guard.yml
- run_local.sh
- intake.md
Anonymize demands or schedules
copyspace-guard anonymize \
--kind demands \
--csv raw_demands.csv \
--out anonymized_demands.csv \
--mapping slot_mapping.json
copyspace-guard anonymize \
--kind schedule \
--csv raw_schedule.csv \
--out anonymized_schedule.csv \
--mapping-in slot_mapping.json \
--mapping schedule_slot_mapping.json
Use --mapping-in when anonymizing demands and schedules that must share the same slot-ID mapping. Do not share the mapping files (slot_mapping.json and schedule_slot_mapping.json in the example above) unless you intend to reveal the original endpoint names.
Sales-oriented demos
Bad current schedule vs candidate:
copyspace-guard analyze --csv examples/demo_bad_current_demands.csv --bw 256 --current-schedule-csv examples/demo_bad_current_schedule.csv --roi examples/roi.yml --outdir artifacts/bad-current-demo
Conflict detection:
copyspace-guard analyze --csv examples/demo_conflict_demands.csv --bw 256 --current-schedule-csv examples/demo_conflict_schedule.csv --summary-only --outdir artifacts/conflict-demo
Large workloads can use --summary-only to avoid writing full schedule JSON/CSV artifacts. In this mode generated baseline/candidate schedules are streamed into the validator instead of materialized in memory. Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
Model and bound details
- Model limitations: docs/MODEL_LIMITATIONS.md
- Lower-bound definitions: docs/BOUNDS.md
- JSON schemas: docs/SCHEMAS.md
- Artifact contracts: docs/ARTIFACT_CONTRACTS.md
- Performance notes: docs/PERFORMANCE.md
- Pilot readiness: docs/PILOT_READINESS.md
- Production readiness: docs/PRODUCTION_READINESS.md
- Operations guide: docs/OPERATIONS.md
- Release process: docs/RELEASE_PROCESS.md
- Threat model: docs/THREAT_MODEL.md
- Data handling: docs/DATA_HANDLING.md
- Changelog: CHANGELOG.md
Benchmark
copyspace-guard bench --slots 64 --bits-per-edge 1048576 --bw 1048576 --outdir artifacts/bench