
Deterministic data-movement audit, validation and optimization reports


Copy-Space Guard


Copy-Space Guard is a metadata-only CLI for deterministic data-movement audits and CI regression gates. Status: production-oriented pilot / v0.2.

It takes a transfer demand matrix (src_slot,dst_slot,bits_total), validates schedules under a declared resource model, compares a baseline or customer schedule against a deterministic greedy candidate, and produces sales/engineering reports with lower-bound gap, utilization and estimated savings.

This package is intentionally small and pilot-friendly:

  • no external Python dependencies;
  • no payload data required;
  • deterministic output for CI and regression tracking;
  • machine-readable JSON plus human-readable Markdown/HTML reports.


Product promise

Give us one transfer trace or demand matrix. In a few days we show whether your data-movement plan is conflict-free, how far it is from a deterministic lower bound, and what CI gate can prevent future regressions.

Quickstart

From this directory:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e .
copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --id ai-staging-ring15 \
  --roi examples/roi.yml \
  --outdir artifacts/demo

Open:

  • artifacts/demo/report.html
  • artifacts/demo/report.md
  • artifacts/demo/summary.json

Expected terminal shape:

baseline: status=PASS ticks=768 lb=549 gap=0.398907 util=0.7143
greedy:   status=PASS ticks=549 lb=549 gap=0.000000 util=0.9992
saved_ticks=219 estimated_savings=9.73

The exact numbers depend on the input CSV and bandwidth value.
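The sample figures above are consistent with a gap defined as (ticks - lb) / lb and saved_ticks defined as baseline ticks minus greedy ticks. The following is a minimal sketch under that assumption; the function names are hypothetical and docs/BOUNDS.md remains the authoritative definition.

```python
def gap_to_lower_bound(ticks: int, lower_bound: int) -> float:
    """Relative distance of a schedule from the lower bound (assumed definition)."""
    if lower_bound <= 0:
        raise ValueError("lower_bound must be positive")
    return (ticks - lower_bound) / lower_bound


def saved_ticks(baseline_ticks: int, candidate_ticks: int) -> int:
    """Ticks saved by the candidate relative to the baseline (assumed definition)."""
    return baseline_ticks - candidate_ticks


print(f"{gap_to_lower_bound(768, 549):.6f}")  # 0.398907
print(saved_ticks(768, 549))                  # 219
```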

Input format

CSV with header:

src_slot,dst_slot,bits_total
0,1,65536
1,2,65536

Meaning:

  • src_slot — source endpoint ID;
  • dst_slot — destination endpoint ID;
  • bits_total — transfer volume from source to destination.

Duplicate pairs are automatically merged.
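As an illustration of the input contract, here is a small sketch that reads a demand CSV and merges repeated pairs. It assumes that "merged" means the bits_total values of a repeated (src_slot, dst_slot) pair are summed; this README does not spell that out, and load_demands is a hypothetical helper, not part of the CLI.

```python
import csv
import io
from collections import defaultdict


def load_demands(text: str) -> dict[tuple[int, int], int]:
    """Parse demand CSV text into {(src_slot, dst_slot): bits_total}."""
    reader = csv.DictReader(io.StringIO(text))
    merged: dict[tuple[int, int], int] = defaultdict(int)
    for row in reader:
        pair = (int(row["src_slot"]), int(row["dst_slot"]))
        # ASSUMPTION: duplicate pairs are merged by summing their volumes.
        merged[pair] += int(row["bits_total"])
    return dict(merged)


sample = "src_slot,dst_slot,bits_total\n0,1,65536\n1,2,65536\n0,1,1024\n"
print(load_demands(sample))  # {(0, 1): 66560, (1, 2): 65536}
```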

Models v0: STRICT1 and READ1_WRITE1

STRICT1: within one tick, each slot can participate in at most one transfer, either as source or destination.

READ1_WRITE1: within one tick, each slot may send at most once and receive at most once.

These endpoint-level models are a useful baseline for:

  • endpoint-limited transfer systems;
  • shuffle/staging/replication analysis;
  • CI regression gates;
  • comparing scheduler strategies;
  • first customer audits where full topology is not yet modeled.

Neither model is a universal network model. For real pilots, confirm whether the client needs extensions such as broadcast, topology-aware bandwidth, asymmetric links or tier-aware storage constraints.
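The two rules can be made concrete with a per-tick conflict check. This is a minimal sketch written from the one-sentence definitions of STRICT1 and READ1_WRITE1 given here, not the package's validator; tick_conflicts is a hypothetical name.

```python
from collections import Counter


def tick_conflicts(transfers, model="STRICT1"):
    """Return the sorted slots that break the model within one tick.

    transfers is a list of (src_slot, dst_slot) pairs active in the same tick.
    """
    if model == "STRICT1":
        # Each slot may appear at most once, in either role.
        # In this sketch a self-transfer (src == dst) counts twice.
        use = Counter()
        for src, dst in transfers:
            use[src] += 1
            use[dst] += 1
        return sorted(slot for slot, n in use.items() if n > 1)
    if model == "READ1_WRITE1":
        # Each slot may send at most once and receive at most once.
        sends = Counter(src for src, _ in transfers)
        recvs = Counter(dst for _, dst in transfers)
        bad = {s for s, n in sends.items() if n > 1}
        bad |= {s for s, n in recvs.items() if n > 1}
        return sorted(bad)
    raise ValueError(f"unknown model: {model}")


tick = [(0, 1), (1, 2)]  # slot 1 receives and sends in the same tick
print(tick_conflicts(tick, "STRICT1"))       # [1]
print(tick_conflicts(tick, "READ1_WRITE1"))  # []
```

The same tick is a conflict under STRICT1 and legal under READ1_WRITE1, which is why the model has to be declared before a schedule can be judged.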

Commands

Check local pilot readiness

copyspace-guard --version
copyspace-guard doctor --root .
copyspace-guard doctor --root . --json

Analyze CSV and generate reports

copyspace-guard analyze --csv INPUT.csv --bw 256 --outdir artifacts/run

Optional:

--slots N
--id workload-name
--notes "free text"
--cost-per-tick 0.02
--model STRICT1  # or READ1_WRITE1
--bounds-subset-limit 20
--max-errors 100
--max-demands 100000
--max-slots 10000
--max-output-ticks 1000000

--bounds-subset-limit controls exhaustive STRICT1 subset-density enumeration and is protected by a hard cap to avoid accidental exponential runs.
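For intuition about why the enumeration is exponential, here is a sketch of one common form of subset-density bound. Under STRICT1 the transfers active in a tick form a matching, so at most floor(|S|/2) transfers can run inside any slot subset S, and the work inside S needs at least ceil(work / floor(|S|/2)) ticks. The sketch assumes each demand needs ceil(bits_total / bw) ticks; the function name and the max_slots guard are hypothetical, and the tool's actual definitions live in docs/BOUNDS.md.

```python
from itertools import combinations


def subset_density_bound(demands, bw, max_slots=20):
    """Exhaustive STRICT1 subset-density lower bound (illustrative sketch).

    demands maps (src_slot, dst_slot) to bits_total.
    """
    slots = sorted({slot for pair in demands for slot in pair})
    if len(slots) > max_slots:
        # 2**len(slots) subsets: refuse rather than run for hours.
        raise ValueError("too many slots for exhaustive enumeration")
    # ASSUMPTION: a demand occupies ceil(bits / bw) ticks.
    work = {pair: -(-bits // bw) for pair, bits in demands.items()}
    best = 0
    for size in range(2, len(slots) + 1):
        capacity = size // 2  # max simultaneous transfers inside the subset
        for subset in combinations(slots, size):
            chosen = set(subset)
            inside = sum(t for (s, d), t in work.items()
                         if s in chosen and d in chosen)
            best = max(best, -(-inside // capacity))
    return best


# Hypothetical 5-slot ring, 65536 bits per edge, bw=256 (256 ticks per edge).
ring5 = {(i, (i + 1) % 5): 65536 for i in range(5)}
print(subset_density_bound(ring5, bw=256))  # 640
```

In the ring example the full five-slot subset is the binding one: 5 edges of 256 ticks share at most 2 parallel transfers, giving ceil(1280 / 2) = 640, which is tighter than the 512 ticks implied by any single slot's load.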

Validate a schedule

copyspace-guard validate artifacts/run/instance.json artifacts/run/schedule_greedy.json --report artifacts/run/validation.json

Regenerate Markdown/HTML reports

copyspace-guard report artifacts/run/summary.json --outdir artifacts/report

Validate generated artifact contracts

copyspace-guard validate-artifact --kind summary artifacts/run/summary.json

Run production-oriented checks

make production-check

This runs release checks plus a small synthetic performance suite. The suite can also be run directly:

copyspace-guard bench-suite --outdir artifacts/bench-suite --max-total-seconds 30

Customer/current schedule input

For a real pilot, the customer may already have an actual schedule. Supply it as a CSV with this header:

tick,src_slot,dst_slot,len_bits
0,0,1,256
0,2,3,256
1,1,2,256

Then run:

copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --current-schedule-csv customer_schedule.csv \
  --outdir artifacts/customer-run

You can also convert a schedule CSV to JSON:

copyspace-guard schedule-csv-to-json --csv customer_schedule.csv --out schedule.json

CI gate command

After analyze, pass/fail thresholds can be checked locally or in CI:

copyspace-guard gate artifacts/demo/summary.json \
  --report greedy \
  --max-gap 0.15 \
  --min-utilization 0.85

Exit code 0 means pass, exit code 2 means fail.
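When the gate is driven from a Python-based CI step, the exit codes can be mapped explicitly. The sketch below uses only the flags shown above; run_gate and interpret_gate_exit are hypothetical helpers, and treating any code other than 0 or 2 as a tool or usage error is an assumption.

```python
import subprocess


def interpret_gate_exit(code: int) -> str:
    """Map the gate's exit code to a label."""
    if code == 0:
        return "pass"
    if code == 2:
        return "fail"
    # ASSUMPTION: anything else means the gate itself could not run.
    return "error"


def run_gate(summary_path, max_gap, min_utilization, report="greedy"):
    """Invoke `copyspace-guard gate` and return 'pass', 'fail' or 'error'."""
    cmd = [
        "copyspace-guard", "gate", summary_path,
        "--report", report,
        "--max-gap", str(max_gap),
        "--min-utilization", str(min_utilization),
    ]
    # check=False: a non-zero exit is an expected outcome, not an exception.
    return interpret_gate_exit(subprocess.run(cmd, check=False).returncode)


print(interpret_gate_exit(2))  # fail
```

Keeping "fail" distinct from "error" lets a pipeline separate a genuine regression from a broken setup.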

Files generated by analyze

  • instance.json — normalized workload contract.
  • schedule_baseline.json or schedule_customer_current.json — current schedule artifact, unless --summary-only is used.
  • schedule_greedy.json — deterministic candidate schedule, unless --summary-only is used.
  • schedule_baseline.csv or schedule_customer_current.csv — CSV schedule artifact, unless --summary-only is used.
  • schedule_greedy.csv — deterministic candidate schedule CSV, unless --summary-only is used.
  • report_baseline.json or report_customer_current.json — validation metrics for the current schedule.
  • report_greedy.json — validation metrics for candidate.
  • summary.json — machine-readable comparison summary.
  • report.md — human-readable audit report.
  • report.html — shareable report for demos and sales calls.

v0.2 boundaries

Included:

  • volume-based demand modeling;
  • deterministic baseline and greedy schedules;
  • STRICT1 and READ1_WRITE1 validators;
  • lower-bound gap and utilization metrics;
  • ROI estimates via roi.yml or a simple $ per tick assumption;
  • sales-ready report artifacts.

Not included yet:

  • production security hardening;
  • topology/path selection;
  • real transfer execution;
  • cloud adapter importers;
  • address-level offset validation;
  • VCopySpace receipt ledger integration.

Known operational caveats:

  • Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
  • Full artifact mode can produce large schedule JSON/CSV files; use --summary-only for large pilots and CI.
  • For large STRICT1 slot counts, subset-density lower bounds may be partial; check bounds_complete in reports.
  • The greedy schedule is deterministic and useful for comparison, but it is not a proof of global optimality.
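To illustrate what "deterministic greedy" can mean under STRICT1, here is one possible scheme: in every tick, visit the unfinished demands in a fixed order and take each one whose slots are still free. This is a sketch for intuition only; the package's greedy algorithm is not described in this README and may order or select demands differently. greedy_strict1 is a hypothetical name.

```python
def greedy_strict1(work):
    """Build a STRICT1 schedule from {(src_slot, dst_slot): ticks_needed}.

    Returns a list of ticks; each tick is a list of (src_slot, dst_slot).
    """
    remaining = {pair: t for pair, t in work.items() if t > 0}
    schedule = []
    while remaining:
        busy = set()
        tick = []
        # Fixed order: most remaining work first, ties broken by (src, dst).
        for pair in sorted(remaining, key=lambda p: (-remaining[p], p)):
            src, dst = pair
            if src in busy or dst in busy:
                continue  # slot already used in this tick
            busy.update(pair)
            tick.append(pair)
        for pair in tick:
            remaining[pair] -= 1
            if remaining[pair] == 0:
                del remaining[pair]
        schedule.append(tick)
    return schedule


work = {(0, 1): 2, (1, 2): 2, (2, 3): 1}
print(len(greedy_strict1(work)))  # 4
```

Because the ordering key has no random component, the same input always yields the same schedule, which is the property a CI regression gate relies on. It does not make the result optimal.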

How this maps to the larger project set

  • copy-space → scheduler, validator, lower-bound gap, CI-gate idea.
  • vcopyspace → future enterprise layer: receipt-based metering, ledger, trace/replay, cost model.
  • DDAS → long-term deterministic state-transition foundation.

ROI mode

Turn saved ticks into business impact:

copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/demo

Example examples/roi.yml:

roi:
  tick_seconds: 1
  gpu_count_blocked: 64
  gpu_hour_cost_usd: 2.50
  runs_per_day: 12
  days_per_month: 30
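One plausible reading of these fields is that every saved tick frees gpu_count_blocked GPUs for tick_seconds. The sketch below encodes that reading; the formula is an assumption rather than the documented one, although for 219 saved ticks it agrees with the estimated_savings=9.73 shown in the Quickstart output. roi_estimate is a hypothetical helper.

```python
def roi_estimate(saved_ticks, tick_seconds, gpu_count_blocked,
                 gpu_hour_cost_usd, runs_per_day, days_per_month):
    """Return (USD saved per run, USD saved per month) under the assumed model."""
    hours_saved = saved_ticks * tick_seconds / 3600
    per_run = hours_saved * gpu_count_blocked * gpu_hour_cost_usd
    per_month = per_run * runs_per_day * days_per_month
    return per_run, per_month


per_run, per_month = roi_estimate(219, 1, 64, 2.50, 12, 30)
print(f"{per_run:.2f} {per_month:.2f}")  # 9.73 3504.00
```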

Gate config file

copyspace-guard gate artifacts/demo/summary.json \
  --config examples/copyspace_guard.yml

Example config:

gates:
  report: greedy
  max_gap_to_lower_bound: 0.15
  min_utilization: 0.85
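The two thresholds amount to a pair of comparisons. The following sketch shows the logic, assuming both limits are inclusive (a gap exactly at the maximum passes); check_gates is a hypothetical function, and the real gate reads its inputs from summary.json.

```python
def check_gates(gap, utilization, max_gap_to_lower_bound, min_utilization):
    """Return a list of human-readable threshold violations (empty means pass)."""
    failures = []
    if gap > max_gap_to_lower_bound:
        failures.append(f"gap {gap} exceeds {max_gap_to_lower_bound}")
    if utilization < min_utilization:
        failures.append(f"utilization {utilization} below {min_utilization}")
    return failures


# Values taken from the Quickstart sample output.
print(len(check_gates(0.0, 0.9992, 0.15, 0.85)))       # 0
print(len(check_gates(0.398907, 0.7143, 0.15, 0.85)))  # 2
```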

Docker

docker build -t copyspace-guard .
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/docker-demo

Industry demos

copyspace-guard analyze --csv examples/ai_checkpoint.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/ai-checkpoint
copyspace-guard analyze --csv examples/db_shuffle.csv --bw 262144 --roi examples/roi.yml --outdir artifacts/db-shuffle
copyspace-guard analyze --csv examples/storage_replication.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/storage-replication
copyspace-guard analyze --csv examples/kv_cache_movement.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/kv-cache

Client package

See client-package/ for a minimal package that can be sent to a pilot customer:

  • README_CLIENT.md
  • sample_demands.csv
  • sample_schedule.csv
  • roi.yml
  • copyspace_guard.yml
  • run_local.sh
  • intake.md

Anonymize demands or schedules

copyspace-guard anonymize \
  --kind demands \
  --csv raw_demands.csv \
  --out anonymized_demands.csv \
  --mapping slot_mapping.json

copyspace-guard anonymize \
  --kind schedule \
  --csv raw_schedule.csv \
  --out anonymized_schedule.csv \
  --mapping-in slot_mapping.json \
  --mapping schedule_slot_mapping.json

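Use --mapping-in when demands and schedules must share the same slot-ID mapping: anonymize the demands first, then pass the resulting mapping file to the schedule run. Do not share the mapping files (slot_mapping.json, schedule_slot_mapping.json in the example above) unless you intend to reveal the original endpoint names.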
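The reason a shared mapping matters can be shown with a toy version of the idea. The sketch assumes anonymous IDs are handed out in first-seen order; the tool's actual assignment scheme is not documented here, and anonymize_ids is a hypothetical helper.

```python
def anonymize_ids(ids, mapping=None):
    """Replace endpoint names with integers, reusing an existing mapping."""
    mapping = dict(mapping or {})
    next_id = max(mapping.values(), default=-1) + 1
    out = []
    for original in ids:
        if original not in mapping:
            # ASSUMPTION: new names get the next unused integer.
            mapping[original] = next_id
            next_id += 1
        out.append(mapping[original])
    return out, mapping


demand_ids, mapping = anonymize_ids(["gpu-a", "gpu-b", "gpu-a"])
schedule_ids, mapping2 = anonymize_ids(["gpu-b", "gpu-c"], mapping)
print(demand_ids, schedule_ids)  # [0, 1, 0] [1, 2]
```

Reusing the first mapping keeps gpu-b at ID 1 in both files. Anonymizing the schedule independently would assign it ID 0, and the schedule would no longer line up with the demands.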

Sales-oriented demos

Bad current schedule vs candidate:

copyspace-guard analyze \
  --csv examples/demo_bad_current_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_bad_current_schedule.csv \
  --roi examples/roi.yml \
  --outdir artifacts/bad-current-demo

Conflict detection:

copyspace-guard analyze \
  --csv examples/demo_conflict_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_conflict_schedule.csv \
  --summary-only \
  --outdir artifacts/conflict-demo

Large workloads can use --summary-only to avoid writing full schedule JSON/CSV artifacts. In this mode generated baseline/candidate schedules are streamed into the validator instead of materialized in memory. Customer schedule CSVs used in streaming mode must be sorted by non-decreasing tick.
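A customer schedule can be pre-checked for the ordering requirement before it is handed to the tool. This is a small sketch using the schedule CSV header shown earlier (tick,src_slot,dst_slot,len_bits); first_unsorted_row is a hypothetical helper, not a CLI feature.

```python
import csv
import io


def first_unsorted_row(text: str):
    """Return the 1-based file line of the first row whose tick decreases, or None."""
    reader = csv.DictReader(io.StringIO(text))
    previous = None
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        tick = int(row["tick"])
        if previous is not None and tick < previous:
            return line_no
        previous = tick
    return None


good = "tick,src_slot,dst_slot,len_bits\n0,0,1,256\n0,2,3,256\n1,1,2,256\n"
bad = "tick,src_slot,dst_slot,len_bits\n0,0,1,256\n2,2,3,256\n1,1,2,256\n"
print(first_unsorted_row(good))  # None
print(first_unsorted_row(bad))   # 4
```

The check reads one row at a time, so it works on files that are too large to load whole, which is the situation where --summary-only is most useful.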

Model and bound details

  • Model limitations: docs/MODEL_LIMITATIONS.md
  • Lower-bound definitions: docs/BOUNDS.md
  • JSON schemas: docs/SCHEMAS.md
  • Artifact contracts: docs/ARTIFACT_CONTRACTS.md
  • Performance notes: docs/PERFORMANCE.md
  • Pilot readiness: docs/PILOT_READINESS.md
  • Production readiness: docs/PRODUCTION_READINESS.md
  • Operations guide: docs/OPERATIONS.md
  • Release process: docs/RELEASE_PROCESS.md
  • Threat model: docs/THREAT_MODEL.md
  • Data handling: docs/DATA_HANDLING.md
  • Changelog: CHANGELOG.md

Benchmark

copyspace-guard bench \
  --slots 64 \
  --bits-per-edge 1048576 \
  --bw 1048576 \
  --outdir artifacts/bench

