
Zer0pa Synthetic Biology / Metabolic Pathway Engineering Pipeline (Pipeline 4 of 6). Research infrastructure: predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims.

Project description

Synthetic-Biology

Live window into the Zer0pa lab. Synthetic Biology / Metabolic Pathway Engineering — Pipeline 4 of 6.

What This Is

In silico metabolic-pathway-engineering pipeline (L1→L7 + L4.5 + L5_OED) producing predicted pathways, KPIs, and SBOL3-attested genetic-modification specs as research artifacts.

The pipeline trains a Zer0pa-owned Conditional Enzyme Kinetics Model (CEKM) on real enzyme-kinetics corpora (BRENDA / EnzyExtract / GotEnzymes2 / ProteinGym; v0.1 uses EnzyExtract, GotEnzymes2, and ProteinGym subsets rather than the full BRENDA core). It runs the four-tool L4 kinetics ensemble (DLKcat / CatPred / TurNuP / CEKM) plus eQuilibrator MDF / COBRApy FBA / FluxGAT essentiality, scaffolds the L4.5 unknown-enzyme path with RFdiffusion2 + MACE-OFF + ESMFold + ProDy + Genie-CAT, and emits SBOL3-attested genetic-modification specifications via L6 host engineering. Every adapter emits a UniversalLayerEnvelope whose 23-falsifier registry, boundary-block sha256, and license-class enforcement are first-class audit invariants.
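The envelope invariants above can be illustrated with a small sketch. It assumes a simplified envelope: the field names `layer`, `payload`, `boundary`, `license_class`, and `boundary_sha256` are hypothetical, a plain dataclass stands in for the Pydantic v2 `UniversalLayerEnvelope`, and "canonical JSON" is approximated as sorted keys with compact separators (not a full RFC 8785 encoder). It shows how a boundary-block digest turns any later edit to the boundary into a detectable mismatch, and how an unknown license class can be refused at seal time.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any

# The five license classes named in the audit-trail spec; the check itself is simplified.
ALLOWED_LICENSE_CLASSES = {"A", "B", "C", "D", "E"}


def canonical_json(obj: Any) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8 (an approximation, not full RFC 8785)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sha256_hex(obj: Any) -> str:
    return hashlib.sha256(canonical_json(obj)).hexdigest()


@dataclass
class EnvelopeSketch:
    """Hypothetical stand-in for UniversalLayerEnvelope; field names are illustrative."""

    layer: str
    payload: dict
    boundary: dict
    license_class: str
    boundary_sha256: str = field(default="")

    def seal(self) -> "EnvelopeSketch":
        # Refuse unknown license classes before recording the boundary digest.
        if self.license_class not in ALLOWED_LICENSE_CLASSES:
            raise ValueError(f"unknown license class: {self.license_class!r}")
        self.boundary_sha256 = sha256_hex(self.boundary)
        return self

    def boundary_intact(self) -> bool:
        # Recompute the digest; any edit to the boundary block changes it.
        return self.boundary_sha256 == sha256_hex(self.boundary)
```

For example, sealing an envelope with `license_class="C"` records a 64-character hex digest; flipping any boundary flag afterwards makes `boundary_intact()` return `False`. Key order does not matter, because `sort_keys=True` makes `{"b": 1, "a": 2}` and `{"a": 2, "b": 1}` hash identically.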

The Human Milk Oligosaccharide (HMO) seed triple — 2'-fucosyllactose, 3'-sialyllactose, and disialyllacto-N-tetraose in E. coli iML1515 — is the validation triple. Pre-registered acceptance thresholds (validation/hmo-seed-evidence/<seed>/acceptance.yaml) are the binding numerical gates; structural envelope-chain conformance passes 3/3 today.
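A pre-registered acceptance gate can be pictured with the sketch below. The threshold shape (`{metric: {"min": ..., "max": ...}}`) and the metric name `mdf_kj_per_mol` are assumptions for illustration, not the actual contents of any `acceptance.yaml`, and a plain dict stands in for the parsed YAML. The design choice to note is that a missing prediction fails its gate instead of being silently skipped.

```python
from typing import Mapping, Optional


def check_acceptance(
    predicted: Mapping[str, float],
    thresholds: Mapping[str, Mapping[str, float]],
) -> dict:
    """Return {metric: passed} for each pre-registered threshold.

    `thresholds` mirrors a hypothetical acceptance.yaml shape: {metric: {"min": x, "max": y}}.
    A metric with no prediction fails its gate rather than being skipped.
    """
    results = {}
    for metric, bounds in thresholds.items():
        value: Optional[float] = predicted.get(metric)
        if value is None:
            results[metric] = False
            continue
        above_min = value >= bounds.get("min", float("-inf"))
        below_max = value <= bounds.get("max", float("inf"))
        results[metric] = above_min and below_max
    return results


def gate_passes(results: Mapping[str, bool]) -> bool:
    # Binding gate: there must be at least one metric, and every metric must pass.
    return bool(results) and all(results.values())
```

For instance, `check_acceptance({"mdf_kj_per_mol": 6.78}, {"mdf_kj_per_mol": {"min": 0.0}})` returns `{"mdf_kj_per_mol": True}`; the 0.0 floor is illustrative only.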

Pipeline Mechanics

| Field | Value |
|---|---|
| Architecture | METABOLIC_PATHWAY_PIPELINE (L1 ZPE → L2 LIRC → L3 retrosynthesis → L3.5 ranking → L4 deep-eval → L4.5 unknown-enzyme → L5 MFMO → L5_OED → L6 host engineering → L6_BUILD cell-free TX-TL → L7 dossier) |
| Substrate | UniversalLayerEnvelope (Pydantic v2 + canonical-JSON sha256), SBOL3-attested L6 spec, PROV-O JSON-LD chain, DuckDB audit + GraphML/Cypher/RDF KG export |
| Execution | Mac CPU + H100 SXM 80 GB (autonomous orchestrator on Runpod, 10-phase chain with resume sentinels) |
| Toolchain | torch 2.2/cu130 + transformers (ESM-2 650M / ESMFold) + RFdiffusion2 (BSD-3) + MACE-OFF (medium) + equilibrator-pathway (MDF LP) + COBRApy + ripser/persim (TDA) + BoTorch (Hamming kernel + qLogNEHVI) + selfies + RDKit |
| Discipline | 23 falsifiers across 3 tiers (Tier-A fast / Tier-B medium / Tier-C heavy) + cross-model disagreement first-class + GPL-subprocess isolation (Salis RBS Calculator v1.0) + RESISTANCE.md anti-corruption protocol |
| Compute Status | v0.1 H100 chain complete end-to-end on Pod 1hx4ctwg1mpmxr, 2026-05-03 (CEKM 20,000 fp32 steps; loss 6.93 → ~3.0, best 2.73 at step-19850; HMO triple + L4.5 inference + 19.2 GB CEKM push to HF emitted in the same chain) |
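The toolchain row lists a BoTorch GP with a Hamming kernel. The plain-Python sketch below shows only the kernel math, assuming candidates are aligned, equal-length sequences. It is not the repository's kernel class; in a real BoTorch model this would be wrapped as a GPyTorch kernel over encoded tensors. Because exp(-c * d_H) factors into a product of per-position kernels, the resulting Gram matrix stays positive semi-definite.

```python
import math
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of aligned positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_kernel(a: str, b: str, lengthscale: float = 1.0, outputscale: float = 1.0) -> float:
    """k(a, b) = outputscale * exp(-d_H(a, b) / (len(a) * lengthscale)).

    The exponent is a sum of per-position mismatch indicators, so the kernel is a
    product of per-position kernels and therefore positive semi-definite.
    """
    if not a and not b:
        return outputscale
    d = hamming_distance(a, b)
    return outputscale * math.exp(-d / (len(a) * lengthscale))


def gram_matrix(seqs: Sequence[str], **kwargs: float) -> List[List[float]]:
    # Pairwise kernel values; symmetric by construction.
    return [[hamming_kernel(a, b, **kwargs) for b in seqs] for a in seqs]
```

Normalising by sequence length keeps the lengthscale comparable across design spaces of different sequence lengths: one mismatch in a 4-residue sequence gives exp(-0.25) with the default lengthscale.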

Key Metrics

| Metric | Value | Context |
|---|---|---|
| CEKM_REAL_CORPUS_LOSS | 6.93 → ~3.0 (steps 0 → 20,000; best 2.73 at step-19850) | Corpus total: 33,851 in-corpus rows + 5,961 held-out + 101,553 adversarial Tier α/β/γ negatives |
| AUTONOMOUS_CHAIN_PHASES | 10 / 10 complete | preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize (Pod 1hx4ctwg1mpmxr, 2026-05-03; receipts at commit 3b9744e) |
| HMO_TRIPLE_AUDIT_VERIFY | 3/3 PASS | Conformance verifier per docs/synbio-audit-trail-v0.1-spec.md §10; DSLNT round-0 dossier envelope_count=11 from the 2026-05-03 chain |
| CPU_PIPELINE_TESTS | 256 passing, 59 GPU-skipped | 0 regressions across CPU continuation A-H |

Source: PRD.md, FINAL-REPORT.md, FINAL-REPORT-RUNPOD.md, FINAL-REPORT-RUNPOD-AUTONOMOUS.md, validation/hmo-seed-evidence/, audit/runtime/runpod/.

Repo Identity

| Field | Value |
|---|---|
| Identifier | Synthetic-Biology |
| Repository | https://github.com/Zer0pa/Synthetic-Biology |
| Portfolio | Bio-Engineering |
| Visibility | INTERNAL |
| Default Branch | main |
| Authority Source | PRD.md (locked v1.0 decisions) |
| License | repository license file |

Readiness

| Field | Value |
|---|---|
| Evidence posture | v0.1 first full-budget H100 chain complete end-to-end; not a productized service |
| Checks | 256 passing tests + 23 falsifiers + 3/3 HMO seed audit-verify PASS |
| Custody boundary | 3 CEKM ckpts (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL) on HF Architect-Prime/synbio-cekm-v0.1; envelope chains + dossiers + L4.5 ESMFold PDBs + MACE-OFF binding ΔG JSONs in git |
| Confidence | Scoped by the Tier-A/B/C falsifier hierarchy; PathGym DBTL-holdout calibration deferred; CEKM calibration gate non-blocking by design (no BRENDA holdout in v0.1 corpus) |
| Authority | PRD.md (locked decisions); FINAL-REPORT-RUNPOD-AUTONOMOUS.md (chain receipts at 3b9744e); HANDOFF-CPU-CONTINUATION.md (CPU phase A-H record) |

Honest Blocker

CEKM v0.1 reached its 20,000-step target, with checkpoints at step 1500 / 18000 / 19000 pushed to HF. Remaining blockers and caveats:

  • The checkpoint is a v0.1 research checkpoint, not a calibrated affinity predictor.
  • Wet-lab Phase 2 dispatch is triple-gated and never on the cutover path.
  • PathGym DBTL-holdout calibration of the TDA warning_score thresholds and of the L5 surrogate calibration scores is deferred until held-out post-experiment data exist.
  • Real RFdiffusion2 motif-conditional designs require curated TS-mimetic geometry, which is downstream of v0.1. The v0.1 RFD2 wrapper also failed with "run_inference.py not found" because of upstream layout drift across the 3 candidate paths it probes; this is non-blocking because ESMFold + MACE-OFF outputs landed for all 3 HMO seeds.
  • BRENDA bulk download requires registration, so v0.1 trains on EnzyExtract dark-matter + GotEnzymes2 + ProteinGym subsets rather than the full BRENDA core.
  • The CEKM Phase 40 calibration gate is non-blocking by design: its sentinel was touched after eval ran cleanly against the step-19000 checkpoint, and the Tier α/β/γ AUCs return None because this corpus has no BRENDA holdout.
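The non-blocking RFdiffusion2 failure follows a common wrapper pattern: probe a few candidate script locations and, if none exists, return a structured skip record instead of aborting the chain. A minimal sketch; the three relative paths are hypothetical placeholders (the wrapper's actual probe paths are not listed in this document), and the actual launch step is omitted.

```python
from pathlib import Path
from typing import Optional, Sequence, Union

# Hypothetical probe locations; placeholders, not RFdiffusion2's documented layout.
CANDIDATE_RELATIVE_PATHS = (
    "run_inference.py",
    "scripts/run_inference.py",
    "src/run_inference.py",
)


def find_run_inference(
    repo_root: Union[str, Path], candidates: Sequence[str] = CANDIDATE_RELATIVE_PATHS
) -> Optional[Path]:
    """Return the first candidate that exists as a file, else None."""
    for rel in candidates:
        path = Path(repo_root) / rel
        if path.is_file():
            return path
    return None


def run_rfd2_stage(repo_root: Union[str, Path]) -> dict:
    script = find_run_inference(repo_root)
    if script is None:
        # Non-blocking: record why the stage was skipped so downstream layers can proceed.
        return {
            "status": "skipped",
            "reason": "run_inference.py not found",
            "probed": list(CANDIDATE_RELATIVE_PATHS),
        }
    # A real wrapper would now launch the script (e.g. via subprocess); omitted here.
    return {"status": "ready", "script": str(script)}
```

Returning a record rather than raising keeps the failure visible in the audit trail while letting ESMFold and MACE-OFF outputs carry the L4.5 advisory.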

What We Prove

  • Real CEKM training on the real corpus runs the full v0.1 budget end-to-end on H100 SXM (EnzyExtract 60K + GotEnzymes2 17K → 33K in-corpus + 6K held-out + 100K adversarial Tier α/β/γ negatives; loss curve 6.93 → ~3.0 over 20,000 fp32 steps, best 2.73 at step-19850; sustained 1.39 steps/s post-recovery; atomic-save + defensive _latest_checkpoint patches survived ~6 mfs-quota-induced partial-write events without losing checkpoint integrity).
  • Autonomous H100 chain runs all 10 phases (preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize) end-to-end on Pod 1hx4ctwg1mpmxr 2026-05-03; phases 50–90 took 6m 32s wallclock after Phase 30's 3h training; emits real ESMFold PDBs for 7 enzymes across 3 HMO seeds + MACE-OFF binding ΔG JSONs + DSLNT round-0 dossier (envelope_count=11) + 19.2 GB CEKM checkpoint push to Hugging Face in 48s.
  • HMO scientific-validation triple emits structurally complete L1→L7 envelope chains for 2'-fucosyllactose / 3'-sialyllactose / disialyllacto-N-tetraose; synbio audit verify passes 3/3 under the conformance verifier (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present on every L6 envelope, Class C/D/E license-grants enforced, cross-model disagreement records emitted, falsifier registry loaded).
  • L4B real eQuilibrator MDF on the HMO precursor pathways: MDF = +6.78 kJ/mol (2'-FL), +11.84 kJ/mol (3'-SL), and +11.41 kJ/mol (DSLNT), computed via equilibrator_pathway.ThermodynamicModel.mdf_analysis() with per-compound optimal concentrations in the 1 μM to 10 mM physiological window.
  • L5 real BoTorch surrogate: GP per objective with custom Hamming-distance kernel + qLogNoisyExpectedHypervolumeImprovement + ASR-thermostable warm-starts (split-venv subprocess pattern; weights stay float32, autocast handles per-op casting; plug-replaceability invariant preserved across real-vs-stub paths).
  • TDA real fermentation simulator: 5-state Monod ODE via scipy.integrate.solve_ivp(LSODA) covering all five PRD §5.3 failure modes (oxygen-transfer collapse / byproduct buildup / growth stall / toxicity threshold / nutrient depletion) with multi-channel ripser bottleneck + late-vs-early rate-of-change hybrid early-warning.
  • Synbio Audit-Trail Specification v0.1 (CC BY 4.0, Zer0pa-published): SBOL3 + PROV-O extension + canonical-JSON sha256 hash chain + Class A/B/C/D/E license-class enforcement + GPL-subprocess-isolation pattern (Salis RBS Calculator v1.0 binary wrapper, no Python import of GPL modules).
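The fermentation-simulator item can be sketched as a 5-state Monod model. This is a stdlib-only illustration, not the repository's simulator: it swaps `scipy.integrate.solve_ivp(method="LSODA")` for a fixed-step RK4 integrator, uses illustrative (uncalibrated) parameters, and replaces the ripser bottleneck channel with only the late-vs-early slope comparison. As a result, it covers just a growth-stall style warning.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]  # (biomass, substrate, product, byproduct, dissolved O2)


@dataclass(frozen=True)
class MonodParams:
    # Illustrative values only; not calibrated to E. coli iML1515 or any HMO strain.
    mu_max: float = 0.6        # max specific growth rate, 1/h
    ks: float = 0.5            # substrate half-saturation, g/L
    ko: float = 0.01           # oxygen half-saturation, arbitrary units
    ki_byproduct: float = 5.0  # byproduct inhibition constant, g/L
    y_xs: float = 0.5          # biomass yield on substrate
    y_xo: float = 1.0          # biomass yield on oxygen
    alpha: float = 0.1         # growth-associated product coefficient
    beta: float = 0.05         # growth-associated byproduct coefficient
    kla: float = 50.0          # oxygen transfer coefficient, 1/h
    o_sat: float = 0.2         # oxygen saturation level


def rhs(state: State, p: MonodParams) -> State:
    x, s, _prod, byp, o = state
    s, o = max(s, 0.0), max(o, 0.0)
    # Double-Monod growth (substrate x oxygen) with non-competitive byproduct inhibition.
    mu = p.mu_max * (s / (p.ks + s)) * (o / (p.ko + o)) / (1.0 + byp / p.ki_byproduct)
    growth = mu * x
    return (
        growth,
        -growth / p.y_xs,
        p.alpha * growth,
        p.beta * growth,
        p.kla * (p.o_sat - o) - growth / p.y_xo,
    )


def rk4_step(state: State, dt: float, p: MonodParams) -> State:
    """One classical Runge-Kutta step; states are clamped at zero afterwards."""
    def shifted(k: State, h: float) -> State:
        return tuple(v + h * dv for v, dv in zip(state, k))

    k1 = rhs(state, p)
    k2 = rhs(shifted(k1, dt / 2), p)
    k3 = rhs(shifted(k2, dt / 2), p)
    k4 = rhs(shifted(k3, dt), p)
    return tuple(
        max(0.0, v + dt / 6 * (a + 2 * b + 2 * c + d))
        for v, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0: State, t_end: float, dt: float, p: MonodParams) -> List[State]:
    traj = [x0]
    for _ in range(int(round(t_end / dt))):
        traj.append(rk4_step(traj[-1], dt, p))
    return traj


def growth_stall_warning(biomass: List[float], window: int, ratio: float = 0.1) -> bool:
    """Flag a stall when the late-window biomass slope drops below `ratio` x the early-window slope."""
    early = biomass[window] - biomass[0]
    late = biomass[-1] - biomass[-1 - window]
    return early > 0 and late < ratio * early
```

Because dX = -Y_xs dS in this model, X + Y_xs·S is conserved. Starting from X = 0.1 and S = 10 g/L, biomass therefore plateaus near 5.1 once substrate is exhausted, and the late-vs-early check flags that plateau as a stall over a 24 h run but not over a 3 h run still in exponential growth.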

What We Don't Claim

  • This is not a clinical or human-subject pipeline. No diagnostic, therapeutic, or device claims.
  • This is not a deployed industrial production system. No commercial titer guarantees.
  • The CEKM v0.1 checkpoint is not a calibrated affinity predictor; it is a v0.1 research checkpoint trained for the full 20,000-step budget with bounded loss-decline evidence on a held-out partition. Tier α/β/γ AUCs are None because v0.1 has no BRENDA holdout.
  • HMO predictions are advisory research artifacts, not regulatory submissions or product specifications. Wet-lab validation is operator-gated and never on the cutover path.
  • The L4.5 unknown-enzyme path emits Tier-1 / Tier-2 / Tier-3 advisories per PRD §6.6; these are research suggestions, not enzyme designs warranting downstream synthesis without independent verification.
  • No environmental release of GMOs. No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.

Verification Status

| Surface | Status | Evidence |
|---|---|---|
| Test suite | 256 passing, 59 GPU-skipped | pytest tests/ clean on Python 3.13 / macOS x86_64; CPU continuation A-H: 0 regressions |
| Falsifier registry | 23 falsifiers across Tiers A/B/C; registry loads at module import | audit/falsifiers.yaml + src/zer0pa_synbio/falsifiers/checks.py (one CPU implementation per registry entry; deliberate-trigger test per falsifier) |
| HMO triple conformance | 3/3 PASS under synbio audit verify | validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md + envelope chains (21/24/24 envelopes per seed) |
| CEKM checkpoint custody | 3 ckpts on HF (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL + meta sha256-recorded) | https://huggingface.co/Architect-Prime/synbio-cekm-v0.1 (push 2026-05-03T03:46Z, 48 s upload at 3.43 Gbit/s) |
| Cutover invariance | 38 plug-replaceability / cutover-invariance tests | httpx.MockTransport golden-fixture suite forked from sibling-workstream Energy Wave 4 |
| Boundary discipline | Boundary block sha256-checked on every envelope | Falsifier f000_boundary_violation enforces src/zer0pa_synbio/boundary.py + BOUNDARY.md |

Proof Anchors

  • PRD.md — locked v1.0 spec; controlling decisions, layer contracts, falsifier registry, license discipline.
  • audit/falsifiers.yaml — 23-falsifier registry with id, tier, severity, gate_action.
  • validation/hmo-seed-evidence/ — pre-registered acceptance thresholds + envelope chains + dossiers + audit-verify reports for the 2'-FL / 3'-SL / DSLNT validation triple.
  • docs/synbio-audit-trail-v0.1-spec.md — Zer0pa-published Synbio Audit-Trail Spec v0.1 (CC BY 4.0): SBOL3 + PROV-O + sha256 hash chain + license-class enforcement + GPL subprocess isolation.
  • src/zer0pa_synbio/cekm/train.py — CEKM training entrypoint (real corpus path, adversarial-negatives sampler, atomic-save checkpoint, defensive resume that skips zero-byte/truncated meta).
  • FINAL-REPORT-RUNPOD-AUTONOMOUS.md — chain receipts at commit 3b9744e: per-phase START/RETRY/DONE events, all 10 sentinels, HF push verification, L4.5 inference outputs.
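The sha256 hash chain named in the audit-trail spec can be illustrated as a linked list of digests. This sketch assumes a genesis value of 64 zeros and a record layout of `prev` / `hash` / `envelope`; both are hypothetical. The canonical-JSON step is a sorted-keys approximation rather than a full RFC 8785 encoder.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # hypothetical "previous hash" for the first record


def canonical_json(obj: Any) -> bytes:
    # Sorted keys + compact separators: an approximation of canonical JSON, not full RFC 8785.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def link_hash(prev_hash: str, envelope: Dict[str, Any]) -> str:
    # Each link commits to the previous hash as well as to its own envelope.
    return hashlib.sha256(canonical_json({"prev": prev_hash, "envelope": envelope})).hexdigest()


def build_chain(envelopes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    chain, prev = [], GENESIS
    for env in envelopes:
        h = link_hash(prev, env)
        chain.append({"prev": prev, "hash": h, "envelope": env})
        prev = h
    return chain


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or link_hash(prev, record["envelope"]) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

Editing any stored envelope makes `verify_chain` return `False`. Recomputing that record's hash to hide the edit still fails at the next record, whose `prev` no longer matches.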

Repo Shape

  • src/zer0pa_synbio/ — adapters L1-L7, envelope, falsifiers, CEKM model + train + loaders, KG writer, audit writer, TDA simulator, runpod_inference, CLI
  • audit/ — falsifiers.yaml, source_manifests/, license_grants/, runtime/ (gitignored except runpod state surface)
  • validation/hmo-seed-evidence/ — 2'-FL / 3'-SL / DSLNT triple with acceptance.yaml + dossier.json + envelope_chain.json + RESULT.md per seed
  • kg/ — schema.cypher + nodes.csv + edges.csv (Neo4j-shaped + GraphML/Cypher/RDF/Turtle export)
  • tests/ — 256 passing tests across contract / integration / falsification waves / cutover invariance
  • docs/ — Synbio Audit-Trail Spec v0.1 (CC BY 4.0)
  • scripts/runpod/ — autonomous H100 SXM chain (bootstrap, orchestrator, heartbeat, watchdog, 10 phase scripts) + Mac-side wake-up watcher + corpus stager
  • configs/ — wave4 real-corpus CEKM training + runpod orchestrator phase config
  • fixtures/ — LIRC slice + CEKM mini-fixtures + per-source manifests
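The autonomous chain's resume sentinels can be sketched as one marker file per phase, written only after the phase succeeds. The phase identifiers and the `NN_phase.done` naming (00 through 90 in steps of 10, which lines up with the Phase 30 training and Phase 40 calibration gate mentioned above) are assumptions. The real scripts under scripts/runpod/ may differ.

```python
from pathlib import Path
from typing import Callable, List, Mapping

# Phase order from the autonomous chain; the identifier spellings are hypothetical.
PHASES = (
    "preflight", "install", "stage", "cekm_train", "eval",
    "hf_push", "l45_inference", "hmo_triple", "audit_verify", "finalize",
)


def run_chain(state_dir: str, actions: Mapping[str, Callable[[], None]], max_retries: int = 2) -> List[str]:
    """Run phases in order, skipping any whose sentinel exists; return the phases executed now."""
    state = Path(state_dir)
    state.mkdir(parents=True, exist_ok=True)
    executed: List[str] = []
    for index, phase in enumerate(PHASES):
        # Hypothetical sentinel naming: "00_preflight.done", "10_install.done", ...
        sentinel = state / f"{index * 10:02d}_{phase}.done"
        if sentinel.exists():
            continue  # resume: completed in an earlier run
        for attempt in range(max_retries + 1):
            try:
                actions[phase]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted; no sentinel, so the next run retries this phase
        sentinel.touch()
        executed.append(phase)
    return executed
```

Touching the sentinel only after success means a crash mid-phase leaves no marker, so rerunning the chain resumes at the first unfinished phase.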

Boundary

Research infrastructure for in silico synthetic biology / metabolic pathway engineering. Outputs are research artifacts — predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims. No clinical or human-subject use. No environmental release of GMOs. No biocontainment-level claims (the pipeline does not commission BSL-2/3 work). No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.

Read Order (for next agents)

  1. BOUNDARY.md — the binding boundary block.
  2. PRD.md — the controlling spec (orchestrator's locked v1.0 decisions).
  3. RESISTANCE.md — anti-corruption discipline; binding meta-protocol.
  4. HANDOFF-CPU-CONTINUATION.md — what the CPU-continuation phase did (items A-H).
  5. FINAL-REPORT-RUNPOD-AUTONOMOUS.md — what the autonomous H100 chain produced.
  6. RUNPOD-AUTONOMOUS-RUNBOOK.md — operator runbook for the autonomous chain.
  7. NEXT-WAVE-PLAN.md — open work, ordered by priority.
  8. docs/synbio-audit-trail-v0.1-spec.md — the published Zer0pa standard.
  9. MODUS-OPERANDI.md — the multi-agent role chain.

Cross-workstream principle

This workstream runs in parallel with Zer0pa/Health, Zer0pa/Materials, and Zer0pa/Energy. Each workstream is built end-to-end as an independent pipeline. No substrate is shared at runtime. Fork-and-own is required: copy the pattern, reimplement inside Synthetic Biology. The research agent's three cross-workstream substrate-sharing recommendations (Shared Infrastructure Layer, Cross-Pipeline Gym Flywheel, single SE(3) MACE service) were recorded and overridden per operator policy.

Provenance

  • Initial commit: 2026-05-01.
  • CPU continuation phase (items A-H): 2026-05-01 — see commits 52b8ad2 through 3d8317f.
  • Autonomous H100 SXM chain bootstrap + 10-phase orchestrator: 2026-05-01 — 29dc4f2.
  • Real MACE-OFF binding ΔG + RFdiffusion2 inference modules: 2026-05-02 — a5fc98e.
  • Pod 1hx4ctwg1mpmxr autonomous run: 2026-05-02.
  • Defensive _latest_checkpoint (skip zero-byte/truncated ckpts on resume): 2026-05-03 — a08ee50.
  • Atomic checkpoint save (tmp+rename, prevents 0-byte meta/truncated .pt at source): 2026-05-03 — 0aeafb3.
  • Pod 1hx4ctwg1mpmxr autonomous run COMPLETE — all 10 phases sentinel-marked: 2026-05-03 — 3b9744e.
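The two checkpoint fixes in the provenance list (a tmp+rename atomic save, and a resume scan that skips zero-byte or truncated files) can be sketched with the standard library alone. The file layout (`step-<N>.pt` plus a `step-<N>.meta.json` sidecar recording `size_bytes`) is hypothetical. The real code saves PyTorch checkpoints and records sha256 metadata, which would be a stronger check than size alone.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename when tmp and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def save_checkpoint(ckpt_dir: str, step: int, blob: bytes) -> Path:
    # Hypothetical layout: step-<N>.pt plus a step-<N>.meta.json sidecar written last.
    pt = Path(ckpt_dir) / f"step-{step}.pt"
    atomic_write_bytes(pt, blob)
    meta = {"step": step, "size_bytes": len(blob)}
    atomic_write_bytes(pt.parent / f"{pt.stem}.meta.json", json.dumps(meta).encode("utf-8"))
    return pt


def latest_valid_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Newest checkpoint whose meta parses and whose .pt size matches; skips zero-byte/truncated files."""
    best: Optional[Tuple[int, Path]] = None
    for pt in Path(ckpt_dir).glob("step-*.pt"):
        meta_path = pt.parent / f"{pt.stem}.meta.json"
        try:
            step = int(pt.stem.split("-", 1)[1])
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            size = pt.stat().st_size
        except (ValueError, OSError):
            # Unparsable step, or missing / zero-byte / garbled meta (JSONDecodeError is a ValueError).
            continue
        if not isinstance(meta, dict) or size == 0 or size != meta.get("size_bytes"):
            continue  # zero-byte or truncated .pt
        if best is None or step > best[0]:
            best = (step, pt)
    return None if best is None else best[1]
```

Writing the sidecar last means a present, parseable meta file implies the `.pt` rename already completed, so a partial write can only leave behind an orphaned temp file or a `.pt` without valid metadata, both of which the resume scan ignores.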



Download files


Source Distribution

zer0pa_synbio-0.1.0.tar.gz (162.4 kB)

Built Distribution


zer0pa_synbio-0.1.0-py3-none-any.whl (184.1 kB)

File details

Details for the file zer0pa_synbio-0.1.0.tar.gz.

File metadata

  • Download URL: zer0pa_synbio-0.1.0.tar.gz
  • Upload date:
  • Size: 162.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zer0pa_synbio-0.1.0.tar.gz
Algorithm Hash digest
SHA256 98f890c575c31d5c30ae9ca62bfe71c87c5e036692ccc9bfe8645bec7813887c
MD5 51f242fdcd4ee3d547ac7955c97396de
BLAKE2b-256 29bc9ad71fef118105d59a198cd4f200705a4f9d49b25ad46c18b5c1930fb9e5


Provenance

The following attestation bundles were made for zer0pa_synbio-0.1.0.tar.gz:

Publisher: publish.yml on Zer0pa/Synthetic-Biology

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file zer0pa_synbio-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: zer0pa_synbio-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 184.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zer0pa_synbio-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9ca2dc66bdff157e4d464fecb37efa524e8f53fd545aa541fc4fc618580135e2
MD5 290d15ac67cb53a1bcb4f254c53e94a1
BLAKE2b-256 a7d9beebd4ec816e235ccb3dae98d6d2c81453df9667af28a271c8d63855452f


Provenance

The following attestation bundles were made for zer0pa_synbio-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Zer0pa/Synthetic-Biology

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
