Seedbraid reference-based reconstruction with CDC and IPFS seed transport
Seedbraid
Seedbraid provides reference-based reconstruction with deterministic content-defined chunking (CDC), a binary SBD1 seed format, and IPFS publish/fetch transport.
Beta Status (Read First)
- Seedbraid is currently in beta stage.
- Before production use, run strict validation in your own runtime/storage/network environment.
- Treat successful verify --strict and bit-perfect restore checks as release gates for your team.
Strict Validation Workflow (Required Before Production)
Run the following smoke workflow before relying on Seedbraid in CI/CD or production pipelines:
uv sync --no-editable --extra dev
workdir="$(mktemp -d)"
python3 - <<'PY' "$workdir/input.bin"
from pathlib import Path
import sys
out = Path(sys.argv[1])
payload = (b"seedbraid-beta-smoke" * 20000) + bytes(range(256)) * 200
out.write_bytes(payload)
print(f"wrote {out} bytes={len(payload)}")
PY
uv run --no-sync --no-editable seedbraid encode "$workdir/input.bin" \
--genome "$workdir/genome" \
--out "$workdir/seed.sbd" \
--chunker cdc_buzhash \
--avg 65536 --min 16384 --max 262144 \
--learn --portable --compression zlib
uv run --no-sync --no-editable seedbraid verify "$workdir/seed.sbd" \
--genome "$workdir/genome" \
--strict
uv run --no-sync --no-editable seedbraid decode "$workdir/seed.sbd" \
--genome "$workdir/genome" \
--out "$workdir/decoded.bin"
cmp -s "$workdir/input.bin" "$workdir/decoded.bin" \
&& echo "bit-perfect roundtrip: OK"
UV_CACHE_DIR=.uv-cache uv run --no-sync --no-editable ruff check .
PYTHONPATH=src UV_CACHE_DIR=.uv-cache uv run --no-sync --no-editable python -m pytest
Features
- Lossless encode/decode with SHA-256 verification.
- Chunkers: fixed, cdc_buzhash, cdc_rabin.
- Genome storage (SQLite) for deduplicated chunk reuse.
- SBD1 binary seed container (manifest + recipe + optional RAW + integrity).
- IPFS CLI integration (publish, fetch).
- Optional remote pin integration (pin remote-add, publish-time remote pin).
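The relationship between a genome (chunk store) and a seed recipe can be illustrated with a minimal sketch. This is not Seedbraid's implementation or schema: the table layout, the function names, and the tiny fixed-size chunks are all assumptions chosen to keep the example short. It only shows the general idea that repeated chunks are stored once and a recipe of digests is enough to reconstruct the input.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory chunk store keyed by SHA-256 digest (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunks (digest BLOB PRIMARY KEY, data BLOB NOT NULL)")
    return db


def encode(db: sqlite3.Connection, data: bytes, size: int = 4) -> list[bytes]:
    """Store unseen chunks and return the recipe (ordered list of chunk digests)."""
    recipe = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        digest = hashlib.sha256(chunk).digest()
        # Duplicate chunks hit the primary key and are skipped.
        db.execute("INSERT OR IGNORE INTO chunks VALUES (?, ?)", (digest, chunk))
        recipe.append(digest)
    return recipe


def decode(db: sqlite3.Connection, recipe: list[bytes]) -> bytes:
    """Rebuild the original bytes by looking up each digest in the store."""
    parts = []
    for digest in recipe:
        row = db.execute("SELECT data FROM chunks WHERE digest = ?", (digest,)).fetchone()
        if row is None:
            raise KeyError(f"missing required chunk {digest.hex()}")
        parts.append(row[0])
    return b"".join(parts)


db = open_store()
recipe = encode(db, b"abcdabcdabcdWXYZ")
stored = db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
print(f"recipe entries={len(recipe)} stored chunks={stored}")
# prints: recipe entries=4 stored chunks=2
```

The recipe has four entries but only two distinct chunks are stored, which is the reuse a shared genome is meant to provide across similar files.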
Why Seedbraid
- Seed-first architecture: reconstruction intent is shipped as a compact SBD1 seed (manifest + recipe) instead of shipping full blobs repeatedly.
- End-to-end integrity posture: strict verify mode, compatibility fixtures, and performance gates are built into the project workflow.
- Practical Web3 distribution: CID publish/fetch is part of the same CLI surface as encode/decode, reducing operational handoffs.
- Shift-resilient dedup by default: CDC is first-class and benchmarked against fixed chunking with reproducible scripts.
Best-Fit Use Cases
- Large binary versioning: datasets, ML models, media assets, and VM images.
- Distribution of many similar files: share a common genome and distribute compact seeds.
- IPFS-based distribution and retrieval: distribute by CID and verify reconstruction integrity.
- Shift-heavy changes (for example, single-byte insertion): CDC improves reuse over fixed chunking.
What It Takes for OSS Adoption
- A 5-minute onboarding path (installation + first encode/decode tutorial).
- Benchmark evidence that Seedbraid wins against alternatives on size, transfer time, and restore speed.
- Security and operations readiness: signing/encryption and operator tooling (doctor, snapshot, restore).
- Stable format governance and backward-compatibility policy for long-lived seed archives.
Installation
Note: PyPI publishing is currently on hold. pip install seedbraid is not yet available. Please install from source.
Quick Start
uv sync --no-editable --extra dev
Optional zstd support:
uv sync --no-editable --extra dev --extra zstd
Refresh lockfile after dependency changes:
uv lock
Generate Encryption Key
Generate a high-entropy key for SB_ENCRYPTION_KEY:
uv run --no-editable seedbraid gen-encryption-key
Print shell export format:
uv run --no-editable seedbraid gen-encryption-key --shell
Set current shell variable directly:
eval "$(uv run --no-editable seedbraid gen-encryption-key --shell)"
CLI
Encode
uv run --no-editable seedbraid encode input.bin --genome ./genome --out seed.sbd \
--chunker cdc_buzhash --avg 65536 --min 16384 --max 262144 \
--learn --no-portable --compression zlib
uv run --no-editable seedbraid encode input.bin --genome ./genome --out seed.private.sbd \
--manifest-private
export SB_ENCRYPTION_KEY='your-secret-passphrase'
uv run --no-editable seedbraid encode input.bin --genome ./genome --out seed.encrypted.sbd \
--encrypt --manifest-private
Decode
uv run --no-editable seedbraid decode seed.sbd --genome ./genome --out recovered.bin
uv run --no-editable seedbraid decode seed.encrypted.sbd --genome ./genome --out recovered.bin \
--encryption-key "$SB_ENCRYPTION_KEY"
Verify
uv run --no-editable seedbraid verify seed.sbd --genome ./genome
uv run --no-editable seedbraid verify seed.sbd --genome ./genome --strict
uv run --no-editable seedbraid verify seed.sbd --genome ./genome --require-signature --signature-key "$SB_SIGNING_KEY"
uv run --no-editable seedbraid verify seed.encrypted.sbd --genome ./genome --strict \
--encryption-key "$SB_ENCRYPTION_KEY"
verify supports two modes:
- Quick mode (default): checks seed integrity and required chunk availability.
- Strict mode (--strict): reconstructs all content and enforces source size and SHA-256 match.
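The kind of check that strict mode describes (size plus SHA-256 match on the reconstructed content) can be sketched independently of Seedbraid. The function name strict_match and its parameters are hypothetical, and the expected values are assumed to be supplied by the caller; this is not the CLI's internal code.

```python
import hashlib
from pathlib import Path


def strict_match(restored: Path, expected_size: int, expected_sha256: str) -> bool:
    """Stream a restored file and compare its size and SHA-256 to expected values."""
    size = 0
    digest = hashlib.sha256()
    with restored.open("rb") as fh:
        # Read in 1 MiB blocks so large files are not loaded into memory at once.
        for block in iter(lambda: fh.read(1 << 20), b""):
            size += len(block)
            digest.update(block)
    return size == expected_size and digest.hexdigest() == expected_sha256
```

A team could run a check like this on decoded output as an extra release gate alongside seedbraid verify --strict.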
Prime
uv run --no-editable seedbraid prime "./dataset/**/*" --genome ./genome --chunker cdc_buzhash
Genome Snapshot / Restore
uv run --no-editable seedbraid genome snapshot --genome ./genome --out genome.sgs
uv run --no-editable seedbraid genome restore genome.sgs --genome ./genome-dr --replace
Publish (IPFS)
uv run --no-editable seedbraid publish seed.sbd --no-pin
uv run --no-editable seedbraid publish seed.sbd --pin
uv run --no-editable seedbraid publish seed.sbd --remote-pin \
--remote-endpoint https://pin.example/api/v1 --remote-token "$SB_PINNING_TOKEN"
publish emits a warning when the seed is unencrypted. For sensitive data, prefer:
seedbraid encode --encrypt --manifest-private ... before publishing.
When --remote-pin is enabled, Seedbraid also registers the CID with the configured remote
pin provider (Pinning Services API-compatible).
Fetch (IPFS)
uv run --no-editable seedbraid fetch <cid> --out fetched.sbd
uv run --no-editable seedbraid fetch <cid> --out fetched.sbd --retries 5 --backoff-ms 300
uv run --no-editable seedbraid fetch <cid> --out fetched.sbd --gateway https://ipfs.io/ipfs
fetch retries ipfs cat with exponential backoff and can fall back to an HTTP gateway.
Pin Health (IPFS)
uv run --no-editable seedbraid pin-health <cid>
Remote Pin Add (IPFS)
export SB_PINNING_ENDPOINT='https://pin.example/api/v1'
export SB_PINNING_TOKEN='your-api-token'
uv run --no-editable seedbraid pin remote-add <cid>
Doctor
uv run --no-editable seedbraid doctor --genome ./genome
doctor checks:
- Python runtime compatibility (>=3.12)
- IPFS CLI availability/version
- IPFS_PATH state
- genome path writability
- compression support (zlib, optional zstd)
Sign Seed (optional)
export SB_SIGNING_KEY='your-shared-secret'
uv run --no-editable seedbraid sign seed.sbd --out seed.signed.sbd --key-env SB_SIGNING_KEY --key-id team-a
Export / Import Genes (optional)
uv run --no-editable seedbraid export-genes seed.sbd --genome ./genome --out genes.pack
uv run --no-editable seedbraid import-genes genes.pack --genome ./another-genome
IPFS Installation/Check
Check if IPFS CLI is available:
ipfs --version
If missing, install Kubo (IPFS CLI) and ensure ipfs is on your PATH.
Common Failures
- ipfs CLI not found:
  - Install IPFS and verify with ipfs --version.
- Missing required chunk on decode/verify:
  - Provide the correct --genome, or re-encode with --portable.
- zstd compression error:
  - Install optional dependency zstandard, or use --compression zlib.
Troubleshooting Matrix
| Symptom | Error Code | Next Action |
|---|---|---|
| Encryption requested but key missing | SB_E_ENCRYPTION_KEY_MISSING | Pass --encryption-key or set SB_ENCRYPTION_KEY. |
| Signing requested but key missing | SB_E_SIGNING_KEY_MISSING | Export signing key env var and retry seedbraid sign. |
| IPFS CLI missing | SB_E_IPFS_NOT_FOUND | Install Kubo and confirm ipfs --version. |
| IPFS fetch/publish failure | SB_E_IPFS_FETCH / SB_E_IPFS_PUBLISH | Check daemon/network, retry, use gateway fallback if needed. |
| Remote pin configuration missing | SB_E_REMOTE_PIN_CONFIG | Set endpoint/token env vars or pass options. |
| Remote pin auth failed | SB_E_REMOTE_PIN_AUTH | Verify provider token permissions and retry. |
| Remote pin request invalid | SB_E_REMOTE_PIN_REQUEST | Check CID/provider options and retry. |
| Remote pin timeout/failure | SB_E_REMOTE_PIN_TIMEOUT / SB_E_REMOTE_PIN | Increase retries/timeout or check provider health. |
| Seed parse/integrity failure | SB_E_SEED_FORMAT | Re-fetch/rebuild seed and verify source integrity. |
CI (SBD-ECO-001)
GitHub Actions workflows:
- .github/workflows/ci.yml
  - Lint: ruff check .
  - Test: python -m pytest
  - Compatibility fixtures: python -m pytest tests/test_compat_fixtures.py
  - Benchmark gate: python scripts/bench_gate.py ...
- .github/workflows/publish-seed.yml (manual only, dry_run=true default)
  - Generates seed from source_path via seedbraid encode
  - Runs strict integrity check via seedbraid verify --strict
  - Publishes to IPFS only when dry_run=false
  - Installs Kubo (ipfs CLI) on runner when dry_run=false (version configurable via kubo_version)
  - Verifies Kubo release tag signature status via GitHub API before install
  - Verifies downloaded Kubo archive checksum (sha512) before extraction
  - Supports pin, portable, manifest_private, and optional encrypt (SB_ENCRYPTION_KEY secret required when encrypt=true)
Local parity commands:
uv sync --no-editable --extra dev
uv run --no-sync --no-editable ruff check .
PYTHONPATH=src uv run --no-sync --no-editable python -m pytest
PYTHONPATH=src uv run --no-sync --no-editable python -m pytest tests/test_compat_fixtures.py
uv run --no-sync --no-editable python scripts/bench_gate.py \
--min-reuse-improvement-bps 1 \
--max-seed-size-ratio 1.20 \
--min-cdc-throughput-mib-s 0.10 \
--json-out .artifacts/bench-report.json
DVC Integration (SBD-ECO-003)
- Minimal DVC bridge lives in examples/dvc/.
- Pipeline stages are encode -> verify --strict -> fetch.
- The verify stage is strict and must fail pipeline reproduction on integrity mismatch.
- Integration recipe and artifact layout are documented in examples/dvc/README.md.
OCI Integration (SBD-ECO-004)
- ORAS bridge scripts and usage docs live in examples/oci/.
- Default OCI metadata convention:
  - artifact type: application/vnd.seedbraid.seed.v1
  - layer media type: application/vnd.seedbraid.seed.layer.v1+sbd
  - annotations: source SHA-256, chunker, manifest-private flag, seed title
- Push/pull scripts:
  - examples/oci/scripts/push_seed.sh <seed.sbd> <registry/repository:tag>
  - examples/oci/scripts/pull_seed.sh <registry/repository:tag> <out.sbd>
- After pull, run strict verification: seedbraid verify <out.sbd> --genome <genome-path> --strict
ML Tooling Hooks (SBD-ECO-005)
- Scripts for MLflow metadata logging and Hugging Face upload live in examples/ml/.
- MLflow hook logs seed metadata fields (seed digest, manifest provenance, optional transport refs).
- Hugging Face hook uploads seed.sbd + metadata sidecar with env-provided token credentials.
- Restore workflow from logged metadata is documented in examples/ml/README.md.
Tests and CI-Equivalent Local Commands
uv run --no-editable ruff check .
uv run --no-editable python -m pytest
uv run --no-editable python -m pytest tests/test_compat_fixtures.py
IPFS tests auto-skip when ipfs is not installed.
Compatibility fixtures are stored in tests/fixtures/compat/v1/ and are
validated by tests/test_compat_fixtures.py.
Regenerate intentionally with:
uv run --no-editable python scripts/gen_compat_fixtures.py
1-byte Insertion Dedup Benchmark
Run:
uv run --no-editable python scripts/bench_shifted_dedup.py
uv run --no-editable python scripts/bench_gate.py \
--min-reuse-improvement-bps 1 \
--max-seed-size-ratio 1.20 \
--min-cdc-throughput-mib-s 0.10 \
--json-out .artifacts/bench-report.json
Expected behavior:
- cdc_buzhash should show better reuse than fixed when a single-byte insertion shifts offsets.
- bench_gate.py exits non-zero when configured thresholds are violated.
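Why a one-byte insertion hurts fixed chunking but not content-defined chunking can be seen in a toy experiment. This sketch does not use Seedbraid's cdc_buzhash or cdc_rabin chunkers: toy_cdc_chunks is a hypothetical stand-in that cuts wherever the CRC32 of a trailing 16-byte window has its low bits clear, and the chunk sizes are far smaller than realistic settings. It is separate from scripts/bench_shifted_dedup.py and only demonstrates the general effect.

```python
import hashlib
import random
import zlib


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Split at fixed offsets; every boundary moves when a byte is inserted."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                   min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Cut where CRC32 of the trailing window has its low bits clear.

    Boundaries depend on local content only, so they follow the data when it
    shifts. Assumes min_size >= window so the window never crosses index 0.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < min_size:
            continue
        fingerprint = zlib.crc32(data[i - window + 1:i + 1])
        if (fingerprint & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def reuse_ratio(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of new chunks whose SHA-256 already exists among the old chunks."""
    known = {hashlib.sha256(c).digest() for c in old}
    hits = sum(hashlib.sha256(c).digest() in known for c in new)
    return hits / len(new)


original = random.Random(0).randbytes(20_000)
shifted = b"\x00" + original  # single-byte insertion at the front

fixed_reuse = reuse_ratio(fixed_chunks(original), fixed_chunks(shifted))
cdc_reuse = reuse_ratio(toy_cdc_chunks(original), toy_cdc_chunks(shifted))
print(f"fixed reuse={fixed_reuse:.2%}  toy CDC reuse={cdc_reuse:.2%}")
```

With fixed offsets, every chunk after the insertion contains different bytes, so almost nothing is reused. The content-defined cut points realign shortly after the inserted byte, so most chunks keep their digests.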
Project Documents
- Format spec: docs/FORMAT.md
- Design rationale: docs/DESIGN.md
- Threat model: docs/THREAT_MODEL.md
- Error codes: docs/ERROR_CODES.md
- Performance gates: docs/PERFORMANCE.md
- DVC workflow bridge example: examples/dvc/README.md
- OCI/ORAS distribution example: examples/oci/README.md
- ML tooling hooks example: examples/ml/README.md
Support Seedbraid
- Seedbraid is maintained as an open-source project.
- If Seedbraid helps your workflow, please consider donating via the repository Sponsor button.
- Donations directly support maintenance, documentation, and compatibility/performance validation.
Open Source Governance
- License: MIT (LICENSE)
- Security policy: SECURITY.md
- Contributing guide: CONTRIBUTING.md
- Code of Conduct: CODE_OF_CONDUCT.md