
Deterministic data-movement audit, validation and optimization reports


Copy-Space Guard


Copy-Space Guard is a metadata-only CLI for deterministic data-movement audits and CI regression gates. Current release: v0.2.3 on PyPI. Status: production-oriented pilot.

It takes a transfer demand matrix (src_slot,dst_slot,bits_total), validates schedules under a declared resource model, compares a baseline or customer schedule against a deterministic greedy candidate, and produces sales/engineering reports with lower-bound gap, utilization and estimated savings.

This package is intentionally small and pilot-friendly:

  • no external Python dependencies;
  • no payload data required;
  • deterministic output for CI and regression tracking;
  • machine-readable JSON plus human-readable Markdown/HTML reports.


Product promise

Give us one transfer trace or demand matrix. In a few days we show whether your data-movement plan is conflict-free, how far it is from a deterministic lower bound, and what CI gate can prevent future regressions.

Quickstart

Install from PyPI:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install copyspace-guard
copyspace-guard --version

Run the bundled example from a repository checkout:

copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --id ai-staging-ring15 \
  --roi examples/roi.yml \
  --outdir artifacts/demo

Open:

  • artifacts/demo/report.html
  • artifacts/demo/report.md
  • artifacts/demo/summary.json

Expected terminal shape:

baseline: status=PASS ticks=768 lb=549 gap=0.398907 util=0.7143
greedy:   status=PASS ticks=549 lb=549 gap=0.000000 util=0.9992
saved_ticks=219 estimated_savings=9.73

The exact numbers depend on the input CSV and bandwidth value.

For local development from this repository, install in editable mode with development tooling:

python -m pip install -e ".[dev]"
make test
make security

Input format

CSV with header:

src_slot,dst_slot,bits_total
0,1,65536
1,2,65536

Meaning:

  • src_slot — source endpoint ID;
  • dst_slot — destination endpoint ID;
  • bits_total — transfer volume from source to destination.

Duplicate pairs are automatically merged.
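As an illustration of the merge step, here is a minimal Python sketch. It assumes that "merged" means summing bits_total for repeated (src_slot, dst_slot) pairs; that reading, and the helper name load_demands, are assumptions for illustration and not the package's API.

```python
import csv
import io
from collections import defaultdict


def load_demands(text: str) -> dict[tuple[int, int], int]:
    """Parse demand CSV text; sum bits_total over duplicate (src, dst) pairs.

    Summation is an assumed merge rule, not confirmed by the project docs.
    """
    merged: dict[tuple[int, int], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["src_slot"]), int(row["dst_slot"]))
        merged[key] += int(row["bits_total"])
    return dict(merged)


sample = "src_slot,dst_slot,bits_total\n0,1,65536\n1,2,65536\n0,1,1024\n"
demands = load_demands(sample)
print(demands)  # {(0, 1): 66560, (1, 2): 65536}
```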

Models v0: STRICT1 and READ1_WRITE1

STRICT1: within one tick, each slot can participate in at most one transfer, either as source or destination.

READ1_WRITE1: within one tick, each slot may send at most once and receive at most once.
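Both models reduce to per-tick counting rules. The following Python sketch shows one way to check them; it is not the package's validator, and the function name find_conflicts and the (tick, src_slot, dst_slot) tuple layout are assumptions for illustration.

```python
from collections import defaultdict

Transfer = tuple[int, int, int]  # (tick, src_slot, dst_slot), hypothetical layout


def find_conflicts(schedule: list[Transfer], model: str = "STRICT1") -> list[str]:
    """Return one message per (tick, slot) that violates the chosen model."""
    sends: dict[tuple[int, int], int] = defaultdict(int)
    recvs: dict[tuple[int, int], int] = defaultdict(int)
    for tick, src, dst in schedule:
        sends[(tick, src)] += 1
        recvs[(tick, dst)] += 1

    conflicts = []
    for key in sorted(set(sends) | set(recvs)):
        s, r = sends.get(key, 0), recvs.get(key, 0)
        if model == "STRICT1":
            bad = s + r > 1          # at most one transfer in any role
        elif model == "READ1_WRITE1":
            bad = s > 1 or r > 1     # at most one send and one receive
        else:
            raise ValueError(f"unknown model: {model}")
        if bad:
            conflicts.append(f"tick {key[0]}: slot {key[1]} sends {s}, receives {r}")
    return conflicts


# Slot 1 receives from 0 and sends to 2 within the same tick.
chain = [(0, 0, 1), (0, 1, 2)]
print(find_conflicts(chain, "STRICT1"))       # one conflict, on slot 1
print(find_conflicts(chain, "READ1_WRITE1"))  # []
```

The chain example shows the practical difference: forwarding through a slot within one tick is rejected by STRICT1 and accepted by READ1_WRITE1.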

This is a useful baseline for:

  • endpoint-limited transfer systems;
  • shuffle/staging/replication analysis;
  • CI regression gates;
  • comparing scheduler strategies;
  • first customer audits where full topology is not yet modeled.

Neither model is a universal network model. For real pilots, confirm which model applies (STRICT1 or READ1_WRITE1) and whether the client needs extensions such as broadcast, topology-aware bandwidth, asymmetric links or tier-aware storage constraints.

Commands

Check local pilot readiness

copyspace-guard --version
copyspace-guard doctor --root .
copyspace-guard doctor --root . --json

Analyze CSV and generate reports

copyspace-guard analyze --csv INPUT.csv --bw 256 --outdir artifacts/run

Optional:

--slots N
--id workload-name
--notes "free text"
--cost-per-tick 0.02
--model STRICT1  # or READ1_WRITE1
--bounds-subset-limit 20
--max-errors 100
--max-demands 100000
--max-slots 10000
--max-output-ticks 1000000

--bounds-subset-limit controls exhaustive STRICT1 subset-density enumeration and is protected by a hard cap to avoid accidental exponential runs.
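To show why subset enumeration needs a cap, here is a hedged Python sketch of one standard form of subset-density bound under STRICT1: within any subset S of slots, at most floor(|S|/2) transfers can run per tick, so the schedule needs at least ceil(work(S) / floor(|S|/2)) ticks. The authoritative definitions are in docs/BOUNDS.md; the function name, the return shape and the ring workload below (15 slots, 65536 bits per edge) are assumptions, chosen because they reproduce lb=549 from the Quickstart output.

```python
from itertools import combinations


def strict1_lower_bound(demands, bw, subset_limit=20):
    """Return (lower_bound_ticks, complete) for a STRICT1-style model (sketch)."""
    # Ticks each pair needs at bw bits per tick (integer ceiling division).
    work = {pair: -(-bits // bw) for pair, bits in demands.items()}
    slots = sorted({slot for pair in work for slot in pair})

    # Degree bound: a slot takes part in at most one transfer per tick.
    load = dict.fromkeys(slots, 0)
    for (src, dst), ticks in work.items():
        load[src] += ticks
        load[dst] += ticks
    bound = max(load.values(), default=0)

    if len(slots) > subset_limit:
        return bound, False  # enumeration skipped, bound may be partial

    # Only odd-sized subsets can beat the degree bound, so skip even sizes.
    for size in range(3, len(slots) + 1, 2):
        for subset in combinations(slots, size):
            members = set(subset)
            inside = sum(t for (s, d), t in work.items()
                         if s in members and d in members)
            bound = max(bound, -(-inside // (size // 2)))
    return bound, True


ring = {(i, (i + 1) % 15): 65536 for i in range(15)}
print(strict1_lower_bound(ring, 256))                   # (549, True)
print(strict1_lower_bound(ring, 256, subset_limit=10))  # (512, False)
```

The number of subsets doubles with every added slot, which is why an unbounded enumeration is unsafe and why a partial result should be flagged rather than silently reported as complete.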

Validate a schedule

copyspace-guard validate artifacts/run/instance.json artifacts/run/schedule_greedy.json --report artifacts/run/validation.json

Regenerate Markdown/HTML reports

copyspace-guard report artifacts/run/summary.json --outdir artifacts/report

Validate generated artifact contracts

copyspace-guard validate-artifact --kind summary artifacts/run/summary.json

Run production-oriented checks

make test
make security
make production-check

  • make test runs ruff, mypy, compileall, unit/property/CLI tests, coverage and a CI gate smoke test.
  • make security runs Bandit over src/tools and pip-audit over the Python environment.
  • make production-check runs release checks plus a small synthetic performance suite.

The performance suite can also be run directly:

copyspace-guard bench-suite --outdir artifacts/bench-suite --max-total-seconds 30

Customer/current schedule input

For a real pilot, the customer may already have an actual schedule. Supply it as a CSV file in this format:

tick,src_slot,dst_slot,len_bits
0,0,1,256
0,2,3,256
1,1,2,256

Then run:

copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --current-schedule-csv customer_schedule.csv \
  --outdir artifacts/customer-run

You can also convert a schedule CSV to JSON:

copyspace-guard schedule-csv-to-json --csv customer_schedule.csv --out schedule.json

CI gate command

After analyze, pass/fail thresholds can be checked locally or in CI:

copyspace-guard gate artifacts/demo/summary.json \
  --report greedy \
  --max-gap 0.15 \
  --min-utilization 0.85

Exit code 0 means pass, exit code 2 means fail.
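The gate decision can be sketched in a few lines of Python. This is an illustration, not the tool's implementation: the gap formula (ticks - lb) / lb is inferred from the Quickstart output, and the function name and the inclusive comparisons are assumptions.

```python
def gate_exit_code(ticks: int, lower_bound: int, utilization: float,
                   max_gap: float, min_utilization: float) -> int:
    """Return 0 when both thresholds hold, 2 otherwise (sketch)."""
    gap = (ticks - lower_bound) / lower_bound
    passed = gap <= max_gap and utilization >= min_utilization
    return 0 if passed else 2


# Values taken from the Quickstart terminal output.
print(round((768 - 549) / 549, 6))                     # 0.398907
print(gate_exit_code(768, 549, 0.7143, 0.15, 0.85))    # 2 (baseline fails)
print(gate_exit_code(549, 549, 0.9992, 0.15, 0.85))    # 0 (greedy passes)
```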

Files generated by analyze

  • instance.json — normalized workload contract.
  • schedule_baseline.json or schedule_customer_current.json — current schedule artifact, unless --summary-only is used.
  • schedule_greedy.json — deterministic candidate schedule, unless --summary-only is used.
  • schedule_baseline.csv or schedule_customer_current.csv — CSV schedule artifact, unless --summary-only is used.
  • schedule_greedy.csv — deterministic candidate schedule CSV, unless --summary-only is used.
  • report_baseline.json or report_customer_current.json — validation metrics for the current schedule.
  • report_greedy.json — validation metrics for candidate.
  • summary.json — machine-readable comparison summary.
  • report.md — human-readable audit report.
  • report.html — shareable report for demos and sales calls.

v0.2.3 boundaries

Included:

  • volume-based demand modeling;
  • deterministic baseline and greedy schedules;
  • STRICT1 and READ1_WRITE1 validators;
  • lower-bound gap and utilization metrics;
  • ROI estimates via roi.yml or a simple $ per tick assumption;
  • sales-ready report artifacts;
  • PyPI publishing through GitHub Actions Trusted Publishing;
  • matrix CI for Python 3.10, 3.11 and 3.12;
  • required CI checks for tests, build, Docker smoke and security scans;
  • release version guard for tag/version synchronization;
  • Dependabot automation for GitHub Actions updates.

Not included yet:

  • topology/path selection;
  • real transfer execution;
  • cloud adapter importers;
  • address-level offset validation;
  • VCopySpace receipt ledger integration;
  • topology/path-specific CSV importers beyond the current demand and schedule formats.

Known operational caveats:

  • Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
  • Full artifact mode can produce large schedule JSON/CSV files; use --summary-only for large pilots and CI.
  • For large STRICT1 slot counts, subset-density lower bounds may be partial; check bounds_complete in reports.
  • The greedy schedule is deterministic and useful for comparison, but it is not a proof of global optimality.
  • Demand and schedule core fields are parsed as integers. Pass-through text columns in anonymized CSV outputs are prefixed with a single quote when they begin with spreadsheet formula trigger characters (=, +, -, @, tab or carriage return).
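The formula-neutralization rule in the last caveat can be illustrated with a short Python sketch. The helper name neutralize_cell is hypothetical; only the trigger characters and the single-quote prefix come from the text above.

```python
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(cell: str) -> str:
    """Prefix a single quote when a text cell starts with a trigger character."""
    return "'" + cell if cell.startswith(FORMULA_TRIGGERS) else cell


print(neutralize_cell("=SUM(A1:A9)"))  # '=SUM(A1:A9)
print(neutralize_cell("node-7"))       # node-7 (a hyphen inside the text is fine)
```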

How this maps to the larger project set

  • copy-space → scheduler, validator, lower-bound gap, CI-gate idea.
  • vcopyspace → future enterprise layer: receipt-based metering, ledger, trace/replay, cost model.
  • DDAS → long-term deterministic state-transition foundation.

ROI mode

Turn saved ticks into business impact:

copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/demo

Example examples/roi.yml:

roi:
  tick_seconds: 1
  gpu_count_blocked: 64
  gpu_hour_cost_usd: 2.50
  runs_per_day: 12
  days_per_month: 30
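As a worked example of how these fields could combine, the Python sketch below converts saved ticks into blocked GPU-hours and then into dollars. The formula is an assumption, not taken from the package; it is shown because it reproduces estimated_savings=9.73 from the Quickstart output for saved_ticks=219. The monthly extrapolation is likewise hypothetical.

```python
def savings_per_run(saved_ticks: int, tick_seconds: float,
                    gpu_count_blocked: int, gpu_hour_cost_usd: float) -> float:
    """Dollar value of GPU time no longer blocked by data movement (sketch)."""
    blocked_gpu_hours = saved_ticks * tick_seconds * gpu_count_blocked / 3600
    return blocked_gpu_hours * gpu_hour_cost_usd


per_run = savings_per_run(219, 1, 64, 2.50)
per_month = per_run * 12 * 30  # runs_per_day * days_per_month
print(round(per_run, 2))    # 9.73
print(round(per_month, 2))  # 3504.0
```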

Gate config file

copyspace-guard gate artifacts/demo/summary.json \
  --config examples/copyspace_guard.yml

Example config:

gates:
  report: greedy
  max_gap_to_lower_bound: 0.15
  min_utilization: 0.85

Release automation

Tag releases are published to GitHub Releases and PyPI. Before a tag publishes, the release workflow verifies that the tag version matches both pyproject.toml and copyspace_guard.__version__.
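The version guard amounts to a three-way string comparison. A minimal Python sketch follows, assuming tags carry a leading "v" as in v0.2.3; the function name is hypothetical, and how the real workflow reads pyproject.toml is not shown here.

```python
def release_versions_match(tag: str, pyproject_version: str,
                           module_version: str) -> bool:
    """True when the tag (minus a leading 'v') equals both version strings."""
    expected = tag.removeprefix("v")
    return expected == pyproject_version == module_version


print(release_versions_match("v0.2.3", "0.2.3", "0.2.3"))  # True
print(release_versions_match("v0.2.4", "0.2.3", "0.2.3"))  # False
```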

Prepare a version bump locally:

VERSION=0.2.3 NOTE="Short release note" make bump-version
TAG=v0.2.3 make release-guard

GitHub release notes are autogenerated from merged pull requests. See docs/RELEASE_PROCESS.md for the full process and PyPI Trusted Publishing configuration.

Docker

docker build -t copyspace-guard .
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/docker-demo

Industry demos

copyspace-guard analyze --csv examples/ai_checkpoint.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/ai-checkpoint
copyspace-guard analyze --csv examples/db_shuffle.csv --bw 262144 --roi examples/roi.yml --outdir artifacts/db-shuffle
copyspace-guard analyze --csv examples/storage_replication.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/storage-replication
copyspace-guard analyze --csv examples/kv_cache_movement.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/kv-cache

Client package

See client-package/ for a minimal package that can be sent to a pilot customer:

  • README_CLIENT.md
  • sample_demands.csv
  • sample_schedule.csv
  • roi.yml
  • copyspace_guard.yml
  • run_local.sh
  • intake.md

Anonymize demands or schedules

copyspace-guard anonymize \
  --kind demands \
  --csv raw_demands.csv \
  --out anonymized_demands.csv \
  --mapping slot_mapping.json

copyspace-guard anonymize \
  --kind schedule \
  --csv raw_schedule.csv \
  --out anonymized_schedule.csv \
  --mapping-in slot_mapping.json \
  --mapping schedule_slot_mapping.json

Use --mapping-in when anonymizing demands and schedules that must share the same slot-ID mapping. Do not share the mapping files (slot_mapping.json and schedule_slot_mapping.json in the commands above) unless you intend to reveal the original endpoint names.

Sales-oriented demos

Bad current schedule vs candidate:

copyspace-guard analyze \
  --csv examples/demo_bad_current_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_bad_current_schedule.csv \
  --roi examples/roi.yml \
  --outdir artifacts/bad-current-demo

Conflict detection:

copyspace-guard analyze \
  --csv examples/demo_conflict_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_conflict_schedule.csv \
  --summary-only \
  --outdir artifacts/conflict-demo

Large workloads can use --summary-only to avoid writing full schedule JSON/CSV artifacts. In this mode generated baseline/candidate schedules are streamed into the validator instead of materialized in memory. Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
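Before handing a customer schedule to streaming mode, the ordering requirement can be checked in a single pass. The Python sketch below is a hypothetical pre-check, not a command the package provides; it relies only on the tick column of the schedule CSV format.

```python
import csv
import io
from typing import Iterable, Optional


def first_out_of_order_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based file line where tick first decreases, or None."""
    previous = None
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(csv.DictReader(lines), start=2):
        tick = int(row["tick"])
        if previous is not None and tick < previous:
            return line_number
        previous = tick
    return None


header = "tick,src_slot,dst_slot,len_bits\n"
good = io.StringIO(header + "0,0,1,256\n0,2,3,256\n1,1,2,256\n")
bad = io.StringIO(header + "1,1,2,256\n0,0,1,256\n")
print(first_out_of_order_line(good))  # None
print(first_out_of_order_line(bad))   # 3
```

Because the check keeps only the previous tick, it works on files of any size, and an open file object can be passed in place of the StringIO used here.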

Model and bound details

  • Model limitations: docs/MODEL_LIMITATIONS.md
  • Lower-bound definitions: docs/BOUNDS.md
  • JSON schemas: docs/SCHEMAS.md
  • Artifact contracts: docs/ARTIFACT_CONTRACTS.md
  • Performance notes: docs/PERFORMANCE.md
  • Pilot readiness: docs/PILOT_READINESS.md
  • Production readiness: docs/PRODUCTION_READINESS.md
  • Operations guide: docs/OPERATIONS.md
  • Release process: docs/RELEASE_PROCESS.md
  • Threat model: docs/THREAT_MODEL.md
  • Data handling: docs/DATA_HANDLING.md
  • Changelog: CHANGELOG.md

Benchmark

copyspace-guard bench \
  --slots 64 \
  --bits-per-edge 1048576 \
  --bw 1048576 \
  --outdir artifacts/bench

Download files


Source Distribution

copyspace_guard-0.2.3.tar.gz (96.0 kB view details)

Uploaded Source

Built Distribution


copyspace_guard-0.2.3-py3-none-any.whl (35.6 kB view details)

Uploaded Python 3

File details

Details for the file copyspace_guard-0.2.3.tar.gz.

File metadata

  • Download URL: copyspace_guard-0.2.3.tar.gz
  • Upload date:
  • Size: 96.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for copyspace_guard-0.2.3.tar.gz
Algorithm    Hash digest
SHA256       3eaa25f96b5e4b5c0075c09e48c96836d0462677190b3f199822c5aa1618157c
MD5          cbd2eee268d67bab631e21ab7afdbe72
BLAKE2b-256  610f3cbbfae51ec04cfca38564c003582466685e9262d060367407aff54b96fd


Provenance

The following attestation bundles were made for copyspace_guard-0.2.3.tar.gz:

Publisher: release.yml on bortoq/copyspace-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file copyspace_guard-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: copyspace_guard-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 35.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for copyspace_guard-0.2.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       bf9a89a3d62bb502e4104b350011e43225c0eefb13b735e410c805b05dba744e
MD5          747f6df5d382c938069cc01cf5ee3025
BLAKE2b-256  98eb87660509a0834e318590e55f0b63ebfa612625908db6be0d916b71e941ff


Provenance

The following attestation bundles were made for copyspace_guard-0.2.3-py3-none-any.whl:

Publisher: release.yml on bortoq/copyspace-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
