Zer0pa Synthetic Biology / Metabolic Pathway Engineering Pipeline (Pipeline 4 of 6). Research infrastructure: predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims.
Synthetic-Biology
Live window into the Zer0pa lab. Synthetic Biology / Metabolic Pathway Engineering — Pipeline 4 of 6.
What This Is
In silico metabolic-pathway-engineering pipeline (L1→L7 + L4.5 + L5_OED) producing predicted pathways, KPIs, and SBOL3-attested genetic-modification specs as research artifacts.
The pipeline trains a Zer0pa-owned Conditional Enzyme Kinetics Model (CEKM) on real BRENDA / EnzyExtract / GotEnzymes2 / ProteinGym corpora, runs the four-tool L4 kinetics ensemble (DLKcat / CatPred / TurNuP / CEKM) plus eQuilibrator MDF / COBRApy FBA / FluxGAT essentiality, scaffolds the L4.5 unknown-enzyme path with RFdiffusion2 + MACE-OFF + ESMFold + ProDy + Genie-CAT, and emits SBOL3-attested genetic-modification specifications via L6 host engineering. Every adapter emits a `UniversalLayerEnvelope` whose 23-falsifier registry, boundary-block sha256, and license-class enforcement are first-class audit invariants.
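To make the boundary-block sha256 and the hash-chained envelopes concrete, here is a minimal sketch using only the standard library. It is not the repository's `UniversalLayerEnvelope`: the field names (`layer`, `body`, `boundary_sha256`, `parent_sha256`, `envelope_sha256`) and the canonicalisation rule (sorted keys, compact separators) are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import List, Optional


def canonical_sha256(payload: dict) -> str:
    """Hash a dict through one fixed JSON form (sorted keys, compact separators).

    Assumption: the real pipeline may use a different canonical-JSON scheme.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_envelope(layer: str, body: dict, boundary: dict, parent_sha256: Optional[str]) -> dict:
    """Build a hypothetical envelope that pins the boundary block and links to its parent."""
    envelope = {
        "layer": layer,
        "body": body,
        "boundary_sha256": canonical_sha256(boundary),
        "parent_sha256": parent_sha256,
    }
    # The envelope's own hash covers every field above, including the parent link.
    envelope["envelope_sha256"] = canonical_sha256(envelope)
    return envelope


def verify_chain(chain: List[dict], boundary: dict) -> bool:
    """Check the boundary hash, the parent links, and each envelope's own hash."""
    expected_boundary = canonical_sha256(boundary)
    parent = None
    for env in chain:
        unsigned = {k: v for k, v in env.items() if k != "envelope_sha256"}
        if env["boundary_sha256"] != expected_boundary:
            return False
        if env["parent_sha256"] != parent:
            return False
        if env["envelope_sha256"] != canonical_sha256(unsigned):
            return False
        parent = env["envelope_sha256"]
    return True
```

Because each envelope hash covers the parent hash, editing any earlier envelope invalidates every later link, which is the property an audit verifier can rely on.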
The Human Milk Oligosaccharide (HMO) seed triple — 2'-fucosyllactose,
3'-sialyllactose, and disialyllacto-N-tetraose in E. coli iML1515 —
is the validation triple. Pre-registered acceptance thresholds
(validation/hmo-seed-evidence/<seed>/acceptance.yaml) are the
binding numerical gates; structural envelope-chain conformance passes
3/3 today.
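A pre-registered numerical gate can be pictured as a comparison fixed before the run and evaluated mechanically afterwards. The sketch below assumes a hypothetical gate layout (`op` and `threshold` per metric); the actual schema of `acceptance.yaml` is not shown in this README, and the metric names are placeholders.

```python
import operator

# Comparison operators a gate may name; this set is an assumption for the sketch.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def evaluate_gates(observed: dict, gates: dict) -> dict:
    """Return {metric: passed} for every pre-registered gate.

    A metric that was never observed fails its gate instead of being skipped.
    """
    results = {}
    for metric, gate in gates.items():
        value = observed.get(metric)
        compare = _OPS[gate["op"]]
        results[metric] = value is not None and compare(value, gate["threshold"])
    return results


def seed_passes(observed: dict, gates: dict) -> bool:
    """A seed passes only if every gate passes."""
    return all(evaluate_gates(observed, gates).values())
```

Treating a missing metric as a failure keeps the gate binding: a run cannot pass by omitting a measurement.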
Pipeline Mechanics
| Field | Value |
|---|---|
| Architecture | METABOLIC_PATHWAY_PIPELINE (L1 ZPE → L2 LIRC → L3 retrosynthesis → L3.5 ranking → L4 deep-eval → L4.5 unknown-enzyme → L5 MFMO → L5_OED → L6 host engineering → L6_BUILD cell-free TX-TL → L7 dossier) |
| Substrate | UniversalLayerEnvelope (Pydantic v2 + canonical-JSON sha256), SBOL3-attested L6 spec, PROV-O JSON-LD chain, DuckDB audit + GraphML/Cypher/RDF KG export |
| Execution | Mac CPU + H100 SXM 80 GB (autonomous orchestrator on Runpod, 10-phase chain with resume sentinels) |
| Toolchain | torch 2.2/cu130 + transformers (ESM-2 650M / ESMFold) + RFdiffusion2 (BSD-3) + MACE-OFF (medium) + equilibrator-pathway (MDF LP) + COBRApy + ripser/persim (TDA) + BoTorch (Hamming kernel + qLogNEHVI) + selfies + RDKit |
| Discipline | 23 falsifiers across 3 tiers (Tier-A fast / Tier-B medium / Tier-C heavy) + cross-model disagreement first-class + GPL-subprocess isolation (Salis RBS Calculator v1.0) + RESISTANCE.md anti-corruption protocol |
| Compute Status | v0.1 H100 chain end-to-end complete on Pod 1hx4ctwg1mpmxr 2026-05-03 (CEKM 20,000 fp32 steps; loss 6.93 → ~3.0, best 2.73 at step-19850; HMO triple + L4.5 inference + 19.2 GB CEKM push to HF emitted in same chain) |
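The "10-phase chain with resume sentinels" in the Execution row can be sketched as follows. This is an illustration of the general pattern, not the orchestrator in `scripts/runpod/`: the phase identifiers are derived from the phase names listed in this README, and the `.done` sentinel naming is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Phase order as listed in the README; the snake_case identifiers are hypothetical.
PHASES = [
    "preflight", "install", "stage", "cekm_train", "eval",
    "hf_push", "l45_inference", "hmo_triple", "audit_verify", "finalize",
]


def run_chain(sentinel_dir: Path, actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run phases in order, skipping any phase whose sentinel file already exists.

    A phase that raises leaves no sentinel, so the next invocation resumes there.
    """
    sentinel_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for name in PHASES:
        sentinel = sentinel_dir / f"{name}.done"
        if sentinel.exists():
            continue  # completed in an earlier invocation
        actions[name]()   # an exception propagates before the sentinel is written
        sentinel.touch()  # mark the phase complete only after it returns
        executed.append(name)
    return executed
```

Writing the sentinel only after the phase returns means a crash mid-phase causes that phase to be rerun, so each phase should be safe to repeat.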
Key Metrics
| Metric | Value | Baseline |
|---|---|---|
| CEKM_REAL_CORPUS_LOSS | 6.93 → ~3.0 (steps 0 → 20000; best 2.73 at step-19850) | total: 33,851 in-corpus rows + 5,961 held-out + 101,553 adversarial Tier α/β/γ negatives |
| AUTONOMOUS_CHAIN_PHASES | 10 / 10 complete | preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize (Pod 1hx4ctwg1mpmxr 2026-05-03 → 3b9744e) |
| HMO_TRIPLE_AUDIT_VERIFY | 3/3 PASS | conformance verifier per docs/synbio-audit-trail-v0.1-spec.md §10; DSLNT round-0 dossier envelope_count=11 from 2026-05-03 chain |
| CPU_PIPELINE_TESTS | 256 passing, 59 GPU-skipped | 0 regressions across CPU continuation A-H |
Source: PRD.md, FINAL-REPORT.md, FINAL-REPORT-RUNPOD.md, FINAL-REPORT-RUNPOD-AUTONOMOUS.md, validation/hmo-seed-evidence/, audit/runtime/runpod/.
Repo Identity
| Field | Value |
|---|---|
| Identifier | Synthetic-Biology |
| Repository | https://github.com/Zer0pa/Synthetic-Biology |
| Portfolio | Bio-Engineering |
| Visibility | INTERNAL |
| Default Branch | main |
| Authority Source | PRD.md (locked v1.0 decisions) |
| License | repository license file |
Readiness
| Field | Value |
|---|---|
| Evidence posture | v0.1 first full-budget H100 chain end-to-end complete; not a productized service |
| Checks | 256 passing tests + 23 falsifiers + 3/3 HMO seed audit-verify PASS |
| Custody boundary | 3 CEKM ckpts (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL) on HF Architect-Prime/synbio-cekm-v0.1; envelope chains + dossiers + L4.5 ESMFold PDBs + MACE-OFF binding ΔG JSONs in git |
| Confidence | scoped by Tier-A/B/C falsifier hierarchy; PathGym DBTL-holdout calibration deferred; CEKM calibration gate non-blocking by design (no BRENDA holdout in v0.1 corpus) |
| Authority | PRD.md (locked decisions); FINAL-REPORT-RUNPOD-AUTONOMOUS.md (chain receipts at 3b9744e); HANDOFF-CPU-CONTINUATION.md (CPU phase A-H record) |
Honest Blocker
CEKM v0.1 reached its 20,000-step target with checkpoints at step 1500 / 18000 / 19000 pushed to HF; this is a v0.1 research checkpoint, not a calibrated affinity predictor. Wet-lab Phase 2 dispatch is triple-gated and never on the cutover path. PathGym DBTL holdout calibration of TDA warning_score thresholds and L5 surrogate calibration scores is deferred to held-out post-experiment data. Real RFdiffusion2 motif-conditional designs require curated TS-mimetic geometry, downstream of v0.1; the v0.1 RFD2 wrapper additionally errored on run_inference.py not found (upstream layout drift across the 3 candidate paths the wrapper probes — non-blocking since ESMFold + MACE-OFF outputs landed for all 3 HMO seeds). BRENDA bulk download requires registration; v0.1 trains on EnzyExtract dark-matter + GotEnzymes2 + ProteinGym subsets, not full BRENDA core. CEKM Phase 40 calibration gate is non-blocking by design (sentinel-touched after eval ran cleanly against step-19000 ckpt; tier α/β/γ AUCs return None because no BRENDA holdout exists in this corpus).
What We Prove
- Real CEKM training on real corpus runs the full v0.1 budget end-to-end on H100 SXM (EnzyExtract 60K + GotEnzymes2 17K → 33K in-corpus + 6K held-out + 100K adversarial Tier α/β/γ negatives; loss curve 6.93 → ~3.0 over 20,000 fp32 steps, best 2.73 at step-19850; sustained 1.39 steps/s post-recovery; atomic-save + defensive `_latest_checkpoint` patches survived ~6 mfs-quota-induced partial-write events without losing checkpoint integrity).
- Autonomous H100 chain runs all 10 phases (preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize) end-to-end on Pod 1hx4ctwg1mpmxr 2026-05-03; phases 50–90 took 6m 32s wallclock after Phase 30's 3h training; emits real ESMFold PDBs for 7 enzymes across 3 HMO seeds + MACE-OFF binding ΔG JSONs + DSLNT round-0 dossier (envelope_count=11) + 19.2 GB CEKM checkpoint push to Hugging Face in 48s.
- HMO scientific-validation triple emits structurally complete L1→L7 envelope chains for 2'-fucosyllactose / 3'-sialyllactose / disialyllacto-N-tetraose; `synbio audit verify` passes 3/3 under the conformance verifier (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present on every L6 envelope, Class C/D/E license-grants enforced, cross-model disagreement records emitted, falsifier registry loaded).
- L4B real eQuilibrator MDF on HMO precursor pathway: 2'-FL MDF=+6.78, 3'-SL +11.84, DSLNT +11.41 kJ/mol via `equilibrator_pathway.ThermodynamicModel.mdf_analysis()` with per-compound optimal concentrations in the 1 μM – 10 mM physiological window.
- L5 real BoTorch surrogate: GP per objective with custom Hamming-distance kernel + `qLogNoisyExpectedHypervolumeImprovement` + ASR-thermostable warm-starts (split-venv subprocess pattern; weights stay float32, autocast handles per-op casting; plug-replaceability invariant preserved across real-vs-stub paths).
- TDA real fermentation simulator: 5-state Monod ODE via `scipy.integrate.solve_ivp(LSODA)` covering all five PRD §5.3 failure modes (oxygen-transfer collapse / byproduct buildup / growth stall / toxicity threshold / nutrient depletion) with multi-channel ripser bottleneck + late-vs-early rate-of-change hybrid early-warning.
- Synbio Audit-Trail Specification v0.1 (CC BY 4.0, Zer0pa-published): SBOL3 + PROV-O extension + canonical-JSON sha256 hash chain + Class A/B/C/D/E license-class enforcement + GPL-subprocess-isolation pattern (Salis RBS Calculator v1.0 binary wrapper, no Python `import` of GPL modules).
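The Hamming-distance kernel named in the L5 item can be illustrated without BoTorch. The sketch below is one common form, an exponential of the normalised Hamming distance; the exact kernel used in the repository is not given in this README, so the functional form, `lengthscale`, and `variance` are assumptions.

```python
import math
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_kernel(a: str, b: str, lengthscale: float = 1.0, variance: float = 1.0) -> float:
    """k(a, b) = variance * exp(-(d_H(a, b) / L) / lengthscale), with L the sequence length.

    This factorises into a product of per-position kernels, each positive
    semi-definite, so the Gram matrix is a valid GP covariance.
    """
    if not a:
        raise ValueError("sequences must be non-empty")
    normalised = hamming_distance(a, b) / len(a)
    return variance * math.exp(-normalised / lengthscale)


def gram_matrix(seqs: Sequence[str], lengthscale: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values for a list of candidate sequences."""
    return [[hamming_kernel(s, t, lengthscale) for t in seqs] for s in seqs]
```

Identical sequences get the maximum covariance, and each additional substitution shrinks it by a constant factor, which lets a GP share information between nearby variants in discrete sequence space.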
What We Don't Claim
- This is not a clinical or human-subject pipeline. No diagnostic, therapeutic, or device claims.
- This is not a deployed industrial production system. No commercial titer guarantees.
- The CEKM v0.1 checkpoint is not a calibrated affinity predictor; it is a v0.1 research checkpoint trained for the full 20,000-step budget with bounded loss-decline evidence on a held-out partition. Tier α/β/γ AUCs are None because v0.1 has no BRENDA holdout.
- HMO predictions are advisory research artifacts, not regulatory submissions or product specifications. Wet-lab validation is operator-gated and never on the cutover path.
- The L4.5 unknown-enzyme path emits Tier-1 / Tier-2 / Tier-3 advisories per PRD §6.6; these are research suggestions, not enzyme designs warranting downstream synthesis without independent verification.
- No environmental release of GMOs. No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.
Verification Status
| Surface | Status | Evidence |
|---|---|---|
| Test suite | 256 passing, 59 GPU-skipped | pytest tests/ clean on Python 3.13 / macOS x86_64; CPU continuation A-H 0 regressions |
| Falsifier registry | 23 falsifiers across Tiers A/B/C, registry loads at module import | audit/falsifiers.yaml + src/zer0pa_synbio/falsifiers/checks.py (one CPU implementation per registry entry; deliberate-trigger test per falsifier) |
| HMO triple conformance | 3/3 PASS under `synbio audit verify` | validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md + envelope chains 21/24/24 envelopes per seed |
| CEKM checkpoint custody | 3 ckpts on HF (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL + meta sha256-recorded) | https://huggingface.co/Architect-Prime/synbio-cekm-v0.1 (push 2026-05-03T03:46Z, 48s upload @ 3.43 GB/s) |
| Cutover invariance | 38 plug-replaceability / cutover-invariance tests | httpx.MockTransport golden-fixture suite forked from sibling-workstream Energy Wave 4 |
| Boundary discipline | Boundary block sha256-checked on every envelope; falsifier f000_boundary_violation enforces | src/zer0pa_synbio/boundary.py + BOUNDARY.md |
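A falsifier registry with one check per entry and a deliberate-trigger test can be sketched as below. The entry fields follow the `id`, `tier`, `severity`, `gate_action` names used for `audit/falsifiers.yaml`, and `f000_boundary_violation` is the falsifier named in the table; the tier assignment, the `severity` and `gate_action` values, and the envelope keys are assumptions, not the contents of `checks.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Falsifier:
    id: str
    tier: str          # "A", "B" or "C"
    severity: str      # value set is hypothetical
    gate_action: str   # value set is hypothetical
    check: Callable[[dict], bool]  # returns True when the falsifier fires


def _boundary_violation(envelope: dict) -> bool:
    """Fire when the boundary hash is missing or differs from the expected one."""
    actual = envelope.get("boundary_sha256")
    return actual is None or actual != envelope.get("expected_boundary_sha256")


# One registry entry for illustration; the real registry is described as having 23.
REGISTRY: Dict[str, Falsifier] = {
    f.id: f
    for f in [
        Falsifier("f000_boundary_violation", "A", "critical", "halt", _boundary_violation),
    ]
}


def fired_falsifiers(envelope: dict, tier: Optional[str] = None) -> List[str]:
    """Return the ids of all falsifiers (optionally in one tier) that fire on an envelope."""
    return [
        f.id
        for f in REGISTRY.values()
        if (tier is None or f.tier == tier) and f.check(envelope)
    ]
```

A deliberate-trigger test then feeds each falsifier an input built to violate it and asserts that its id appears in the result, for example `fired_falsifiers({"boundary_sha256": "a", "expected_boundary_sha256": "b"})`.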
Proof Anchors
- PRD.md: locked v1.0 spec; controlling decisions, layer contracts, falsifier registry, license discipline.
- audit/falsifiers.yaml: 23-falsifier registry with `id`, `tier`, `severity`, `gate_action`.
- validation/hmo-seed-evidence/: pre-registered acceptance thresholds + envelope chains + dossiers + audit-verify reports for the 2'-FL / 3'-SL / DSLNT validation triple.
- docs/synbio-audit-trail-v0.1-spec.md: Zer0pa-published Synbio Audit-Trail Spec v0.1 (CC BY 4.0): SBOL3 + PROV-O + sha256 hash chain + license-class enforcement + GPL subprocess isolation.
- src/zer0pa_synbio/cekm/train.py: CEKM training entrypoint (real corpus path, adversarial-negatives sampler, atomic-save checkpoint, defensive resume that skips zero-byte/truncated meta).
- FINAL-REPORT-RUNPOD-AUTONOMOUS.md: chain receipts at commit `3b9744e`: per-phase START/RETRY/DONE events, all 10 sentinels, HF push verification, L4.5 inference outputs.
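The GPL subprocess isolation pattern keeps the GPL-licensed tool behind a process boundary: the caller exchanges data over stdin/stdout and never imports the tool's modules. The sketch below shows only that mechanical pattern. The command line is a stand-in, since this README does not document the Salis RBS Calculator's invocation, and the JSON-over-stdio protocol is an assumption.

```python
import json
import subprocess
from typing import List


def run_isolated(argv: List[str], payload: dict, timeout_s: float = 60.0) -> dict:
    """Send a JSON payload to an external program and parse its JSON reply.

    The external tool runs in its own process; nothing from it is imported here.
    """
    proc = subprocess.run(
        argv,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return json.loads(proc.stdout)
```

As a usage sketch, a stand-in child such as `[sys.executable, "-c", "import sys, json; d = json.load(sys.stdin); print(json.dumps({'n': len(d['seq'])}))"]` called with `{"seq": "ATGC"}` returns `{"n": 4}`.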
Repo Shape
- `src/zer0pa_synbio/`: adapters L1-L7, envelope, falsifiers, CEKM model + train + loaders, KG writer, audit writer, TDA simulator, runpod_inference, CLI
- `audit/`: falsifiers.yaml, source_manifests/, license_grants/, runtime/ (gitignored except runpod state surface)
- `validation/hmo-seed-evidence/`: 2'-FL / 3'-SL / DSLNT triple with acceptance.yaml + dossier.json + envelope_chain.json + RESULT.md per seed
- `kg/`: schema.cypher + nodes.csv + edges.csv (Neo4j-shaped + GraphML/Cypher/RDF/Turtle export)
- `tests/`: 256 passing tests across contract / integration / falsification waves / cutover invariance
- `docs/`: Synbio Audit-Trail Spec v0.1 (CC BY 4.0)
- `scripts/runpod/`: autonomous H100 SXM chain (bootstrap, orchestrator, heartbeat, watchdog, 10 phase scripts) + Mac-side wake-up watcher + corpus stager
- `configs/`: wave4 real-corpus CEKM training + runpod orchestrator phase config
- `fixtures/`: LIRC slice + CEKM mini-fixtures + per-source manifests
Boundary
Research infrastructure for in silico synthetic biology / metabolic pathway engineering. Outputs are research artifacts — predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims. No clinical or human-subject use. No environmental release of GMOs. No biocontainment-level claims (the pipeline does not commission BSL-2/3 work). No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.
Read Order (for next agents)
- BOUNDARY.md — the binding boundary block.
- PRD.md — the controlling spec (orchestrator's locked v1.0 decisions).
- RESISTANCE.md — anti-corruption discipline; binding meta-protocol.
- HANDOFF-CPU-CONTINUATION.md — what the CPU-continuation phase did (items A-H).
- FINAL-REPORT-RUNPOD-AUTONOMOUS.md — what the autonomous H100 chain produced.
- RUNPOD-AUTONOMOUS-RUNBOOK.md — operator runbook for the autonomous chain.
- NEXT-WAVE-PLAN.md — open work, ordered by priority.
- docs/synbio-audit-trail-v0.1-spec.md — the published Zer0pa standard.
- MODUS-OPERANDI.md — the multi-agent role chain.
Cross-workstream principle
This workstream runs in parallel with Zer0pa/Health, Zer0pa/Materials, and Zer0pa/Energy. Each workstream is built end-to-end as an independent pipeline. No substrate is shared at runtime. Fork-and-own is required: copy the pattern, reimplement inside Synthetic Biology. The research agent's three cross-workstream substrate-sharing recommendations (Shared Infrastructure Layer, Cross-Pipeline Gym Flywheel, single SE(3) MACE service) are captured-and-overridden per operator policy.
Provenance
- Initial commit: 2026-05-01.
- CPU continuation phase (items A-H): 2026-05-01 (commits `52b8ad2` through `3d8317f`).
- Autonomous H100 SXM chain bootstrap + 10-phase orchestrator: 2026-05-01 (`29dc4f2`).
- Real MACE-OFF binding ΔG + RFdiffusion2 inference modules: 2026-05-02 (`a5fc98e`).
- Pod 1hx4ctwg1mpmxr autonomous run: 2026-05-02.
- Defensive `_latest_checkpoint` (skip zero-byte/truncated ckpts on resume): 2026-05-03 (`a08ee50`).
- Atomic checkpoint save (tmp+rename, prevents 0-byte meta/truncated .pt at source): 2026-05-03 (`0aeafb3`).
- Pod 1hx4ctwg1mpmxr autonomous run COMPLETE, all 10 phases sentinel-marked: 2026-05-03 (`3b9744e`).
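The two checkpoint fixes in the list above (tmp+rename atomic save, and a resume helper that skips zero-byte files) follow a well-known pattern that can be sketched with the standard library. This is not the code in `train.py`: the `step-N.pt` naming and the size-only validity check are assumptions, and a size check cannot detect every truncated file.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    Readers see either the old file or the complete new one, never a partial write.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def latest_checkpoint(ckpt_dir: Path, min_bytes: int = 1) -> Optional[Path]:
    """Return the highest-step checkpoint that is at least min_bytes long, else None."""
    candidates = []
    for p in ckpt_dir.glob("step-*.pt"):
        try:
            step = int(p.stem.split("-", 1)[1])
        except ValueError:
            continue  # ignore files that do not follow the assumed naming
        if p.stat().st_size >= min_bytes:
            candidates.append((step, p))
    return max(candidates, key=lambda item: item[0])[1] if candidates else None
```

The two pieces are complementary: the atomic save prevents new partial files at the source, while the defensive lookup tolerates any that already exist on disk.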
File details
Details for the file zer0pa_synbio-0.1.0.tar.gz.
File metadata
- Download URL: zer0pa_synbio-0.1.0.tar.gz
- Upload date:
- Size: 162.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98f890c575c31d5c30ae9ca62bfe71c87c5e036692ccc9bfe8645bec7813887c |
| MD5 | 51f242fdcd4ee3d547ac7955c97396de |
| BLAKE2b-256 | 29bc9ad71fef118105d59a198cd4f200705a4f9d49b25ad46c18b5c1930fb9e5 |
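A downloaded sdist can be checked against the published SHA256 digest before installation. The sketch below uses only `hashlib`; the local file path passed in is up to the caller, and the constant is the SHA256 value listed for zer0pa_synbio-0.1.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for zer0pa_synbio-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "98f890c575c31d5c30ae9ca62bfe71c87c5e036692ccc9bfe8645bec7813887c"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare the file's digest with the published one, ignoring hex letter case."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, `matches_published_digest(Path("zer0pa_synbio-0.1.0.tar.gz"))` should return `True` for an unmodified download of the sdist.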
Provenance
The following attestation bundles were made for zer0pa_synbio-0.1.0.tar.gz:
Publisher: publish.yml on Zer0pa/Synthetic-Biology
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zer0pa_synbio-0.1.0.tar.gz
- Subject digest: 98f890c575c31d5c30ae9ca62bfe71c87c5e036692ccc9bfe8645bec7813887c
- Sigstore transparency entry: 1436191970
- Sigstore integration time:
- Permalink: Zer0pa/Synthetic-Biology@a6dff18d5ac70018d44695074ba86827fbe631f0
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Zer0pa
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a6dff18d5ac70018d44695074ba86827fbe631f0
- Trigger Event: push
File details
Details for the file zer0pa_synbio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: zer0pa_synbio-0.1.0-py3-none-any.whl
- Upload date:
- Size: 184.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ca2dc66bdff157e4d464fecb37efa524e8f53fd545aa541fc4fc618580135e2 |
| MD5 | 290d15ac67cb53a1bcb4f254c53e94a1 |
| BLAKE2b-256 | a7d9beebd4ec816e235ccb3dae98d6d2c81453df9667af28a271c8d63855452f |
Provenance
The following attestation bundles were made for zer0pa_synbio-0.1.0-py3-none-any.whl:
Publisher: publish.yml on Zer0pa/Synthetic-Biology
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zer0pa_synbio-0.1.0-py3-none-any.whl
- Subject digest: 9ca2dc66bdff157e4d464fecb37efa524e8f53fd545aa541fc4fc618580135e2
- Sigstore transparency entry: 1436191972
- Sigstore integration time:
- Permalink: Zer0pa/Synthetic-Biology@a6dff18d5ac70018d44695074ba86827fbe631f0
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Zer0pa
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a6dff18d5ac70018d44695074ba86827fbe631f0
- Trigger Event: push