A legible reliability / QC / qualification + DBTL layer over commodity bio-AI tools.

Maintainers

Curtisflo

Project description

karyon

A legible reliability / QC / qualification layer over commodity bio-AI tools.

Modern bio-AI toolkits (structure prediction, docking, generative chemistry, genomics) are getting powerful and cheap — NVIDIA's BioNeMo Agent Toolkit, for example, packages a decade of them as ready-to-call agent skills. What they don't ship is a deterministic, independent gate that answers the question that comes right after a model returns an answer:

Is this output trustworthy? Is this docking pose physically valid? Is this benchmark number inflated by leakage? Is this "no-effect" screen result just under-powered? Is this generated sequence even synthesizable?

karyon is that gate. It is not a model. Every check is a legible, deterministic contract, and every rejection names its reason — the "unroutable net" report, ported from EDA/CAD design-rule checking to biology. It ships as a pip-installable Python library and as agent skills that compose alongside the generative tools (install a karyon skill next to a BioNeMo skill; the model proposes, karyon qualifies).
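
To make the "named-reason contract" idea concrete, here is a minimal sketch. It is not karyon's verdict engine: the `Reason` fields mirror the `contract` / `message` / `weight` keys of the JSON verdict in the Quickstart, while `Verdict`, `check_min_distance`, the `TOO_CLOSE` contract name, and the 1.0 threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import dist


@dataclass
class Reason:
    contract: str   # stable, machine-readable name of the violated rule
    message: str    # human-readable detail
    weight: float   # severity contribution


@dataclass
class Verdict:
    reasons: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A verdict passes only when no contract was violated.
        return not self.reasons


def check_min_distance(coords, min_dist=1.0):
    """Hypothetical contract: no two points may be closer than min_dist."""
    verdict = Verdict()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = dist(a, b)
        if d < min_dist:
            verdict.reasons.append(
                Reason("TOO_CLOSE", f"points {i} and {j} are {d:.2f} apart", 1.0)
            )
    return verdict


good = check_min_distance([(0, 0, 0), (1.5, 0, 0), (0, 1.5, 0)])
bad = check_min_distance([(0, 0, 0), (0.4, 0, 0), (0, 1.5, 0)])
print(good.ok, bad.ok, [r.contract for r in bad.reasons])
```

The check has no fitted parameters and no randomness, so the same input always yields the same verdict and the same named reason.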

What the checks show

karyon's checks run on public benchmarks. None of these problems are discovered here — each is a known reliability failure mode. karyon's contribution is to express each as a legible, named-reason contract, cross-validate it against the reference tool where one exists, and make it agent-callable — plus one check the incumbents skip. The headline numbers, with lineage:

70% of DiffDock's RMSD ≤ 2 Å "successes" are physically invalid. This reproduces PoseBusters (Buttenschoen et al., Chem. Sci. 2024): deep-learning docking scores well on RMSD yet emits physically invalid poses (77% of DiffDock poses fail an inter-molecular check vs just 1% for classical Vina docking). karyon re-derives the result as a deterministic geometric DRC (bond/angle/ring/clash/strain, zero fitted parameters) and agrees with the real PoseBusters package on 87% of poses (pre-registered threshold: ≥85%).
Retrosynthesis "accuracy" is largely template memorization — a known leakage concern in retrosynthesis benchmarking, quantified here on USPTO-50k: a faithful retrosim baseline scores top-1 37.9% on the standard split but 16.1% on a leakage-free partition (93.8% of the test set carries a near-duplicate or shared training template) — a measured +21.8-point inflation.
ADMET benchmark numbers inflate under random splits — the reason MoleculeNet (Wu et al., Chem. Sci. 2018) prescribes scaffold splits; karyon measures the gap directly: random-vs-scaffold lifts AUROC by +0.105 (classification) and ρ by +0.100 (regression).
PPI benchmarks leak protein identity. Sequence-based protein–protein interaction benchmarks report on random pair splits, where the same proteins straddle train and test (Park & Marcotte 2012). On Guo-yeast, a transparent node-degree-memorization baseline scores AUROC 0.77 on the reported split but 0.50 (exactly chance) on neither-seen pairs: a +0.27 node-identity inflation, with ~85% of the test set leaking (core install, no rdkit).
CRISPR screens hide under-powered non-hits (the new check) — incumbents (MAGeCK and kin) emit a gene-level hit/non-hit q-value and throw away the within-gene guide structure. karyon reads that structure back from counts alone, control-calibrated, and flags ~53% of gold-standard silent failures at a 3% false-flag rate — shown non-redundant with the FDR, not just a softer q-value. Full method + pre-registered evaluation: docs/screen-power.md.
Single-cell screens hide failed-knockdown nulls (in-domain — the sharpest cut) — a Perturb-seq screen calls each perturbation hit / no-phenotype, but a "no-phenotype" can simply mean the guide never knocked the target down. Perturb-seq measures that knockdown, so the silent-failure label is real: on Replogle's K562-essential screen karyon flags 34% of no-phenotype essential-gene calls as untrustworthy, non-redundant with the deposited significance (|ρ| = 0.003, vs ~0.29 for the bulk check) — and the same gate runs on your own screen, not just the reference. See docs/screen-power.md.
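
The "neither-seen pairs" partition behind the PPI number can be sketched in a few lines. This illustrates the general idea of classing test pairs by how many of their proteins also appear in training; it is not karyon's audit code, and `partition_test_pairs` and the toy protein names are hypothetical.

```python
def partition_test_pairs(train_pairs, test_pairs):
    """Class each test pair by how many of its proteins occur in training."""
    seen = {protein for pair in train_pairs for protein in pair}
    classes = {"both_seen": [], "one_seen": [], "neither_seen": []}
    for a, b in test_pairs:
        n_seen = (a in seen) + (b in seen)  # 0, 1 or 2
        key = ("neither_seen", "one_seen", "both_seen")[n_seen]
        classes[key].append((a, b))
    return classes


train = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
test = [("P1", "P3"), ("P1", "P9"), ("P8", "P9")]

parts = partition_test_pairs(train, test)
leaking = len(parts["both_seen"]) + len(parts["one_seen"])
print({name: len(pairs) for name, pairs in parts.items()})
print(f"leaking fraction: {leaking / len(test):.2f}")
```

A baseline that only memorizes protein identity has nothing to memorize on the `neither_seen` class, which is the intuition behind evaluating there.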

Reproduce these numbers

Every figure above is printed by a python -m karyon.<module> entrypoint that fetches a public benchmark and runs the audit. Nothing is hand-entered; the printed value is the source of truth. Reproduce them all:

pip install "karyon[chem]"          # screen-qc needs only the core install
python examples/reproduce/run.py    # claim ↔ command ↔ reproduced value   (or: --list)

Per-claim commands, datasets, runtimes, and the offline (KARYON_NO_NETWORK=1) path are documented in examples/reproduce/.

Install

pip install karyon                 # core (numpy, scikit-learn)
pip install "karyon[chem]"         # + rdkit, rdchiral  (pose validity, leakage audits)
pip install "karyon[seqdesign]"    # + dnachisel, ostir (sequence/expression predictors)
pip install "karyon[data]"         # + xlrd             (one Excel-backed dataset loader)

Installing karyon puts the karyon CLI on your PATH (karyon qualify …, karyon audit …, karyon list).

Datasets are fetched on demand from public sources and cached under ~/.cache/karyon (override with $KARYON_CACHE). See DATASETS.md.
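
If a script of your own needs to find the same cache location, the override rule above can be mirrored as follows. This is a sketch that assumes nothing about how karyon resolves the path internally; `resolve_cache_dir` is a hypothetical helper, not a karyon function, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return $KARYON_CACHE if set and non-empty, else ~/.cache/karyon."""
    env = os.environ if env is None else env
    override = env.get("KARYON_CACHE")
    if override:  # assumption: an empty value counts as unset
        return Path(override).expanduser()
    return Path.home() / ".cache" / "karyon"


print(resolve_cache_dir({}))
print(resolve_cache_dir({"KARYON_CACHE": "/tmp/karyon-cache"}))
```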

Quickstart

One surface — karyon.qualify(artifact, modality) — gates every modality and returns one stable result:

from karyon import qualify

# Is this docking pose physically valid?
r = qualify("pose_1.sdf", modality="pose")          # .sdf infers "pose" (modality optional here)
print(r.ok, r.items[0][1].messages)

# Is this generated DNA sequence synthesizable?
r = qualify("GACCTTTTGCA...", modality="dna")
print("synthesizable" if r.ok else r.items[0][1].messages)

Same thing on the command line — exit 0 = PASS, 1 = FAIL, so it gates a pipeline directly:

karyon qualify pose_1.sdf --modality pose --json
karyon qualify diffdock_out/ --modality pose          # a whole directory of poses
karyon audit screen --json                            # a dataset-level audit (bulk CRISPR screen power)
karyon audit screen --single-cell --input my.csv      # qualify your own Perturb-seq no-phenotype calls
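
Because the exit code carries the verdict, a Python pipeline step can gate on it with the standard library alone. A minimal sketch: `gate` is a hypothetical helper, and the commented-out call assumes the karyon CLI is installed and pose_1.sdf exists. The live demo uses stand-in commands so it runs anywhere.

```python
import subprocess
import sys


def gate(cmd):
    """Run a command and return True only if it exits 0 (PASS)."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


# Real use (assumes karyon is installed and pose_1.sdf exists):
# passed = gate(["karyon", "qualify", "pose_1.sdf", "--modality", "pose", "--json"])

# Stand-in commands that mimic exit 0 = PASS and exit 1 = FAIL:
fake_pass = [sys.executable, "-c", "raise SystemExit(0)"]
fake_fail = [sys.executable, "-c", "raise SystemExit(1)"]
print(gate(fake_pass), gate(fake_fail))
```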

Every verdict is JSON-serializable with named reasons (the stable schema — see docs/qualify.md):

{"modality": "pose", "ok": false,
 "items": [{"name": "pose_1.sdf", "ok": false, "score": 1.5,
            "reasons": [{"contract": "INTERNAL_STERIC_CLASH", "message": "…", "weight": 1.5}]}],
 "batch": null}

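Since the verdict is plain JSON, downstream code can pull out the named reasons with the standard library. This sketch parses the example verdict above; `failed_contracts` is a hypothetical helper, and it relies only on keys visible in that example.

```python
import json

verdict_json = """
{"modality": "pose", "ok": false,
 "items": [{"name": "pose_1.sdf", "ok": false, "score": 1.5,
            "reasons": [{"contract": "INTERNAL_STERIC_CLASH", "message": "…", "weight": 1.5}]}],
 "batch": null}
"""


def failed_contracts(verdict):
    """Map each failing item name to the list of contracts it violated."""
    return {
        item["name"]: [reason["contract"] for reason in item["reasons"]]
        for item in verdict["items"]
        if not item["ok"]
    }


verdict = json.loads(verdict_json)
print(verdict["ok"], failed_contracts(verdict))
```
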
Agent skills

v0.5 ships skills spanning the major modalities a generative toolkit touches — docking and structure prediction (poses, co-folding, complex interfaces), antibody/binder developability, generative chemistry and DNA, functional-genomics screens (bulk and single-cell Perturb-seq), benchmark-leakage audits (retro / ADMET / PPI), and sequence/regulatory design. It's a cross-section that proves the contract pattern generalizes, not exhaustive coverage; the library underneath carries more checks than the marquee skills, and the roadmap wraps more of them over time.

Each skill is a SKILL.md (YAML frontmatter + instructions) installable into Claude Code, Codex, and other harnesses via the skills CLI:

npx skills add Curtisflo/karyon --skill pose-validity --agent claude-code

Skill	What it qualifies	Composes with (BioNeMo)
`pose-validity`	physical validity of docking poses (single-molecule / intramolecular)	`diffdock-nim`, `boltz2-nim`, `openfold3-nim`
`cofold-qc`	physical validity of co-folding poses (protein↔ligand, intermolecular)	`boltz2-nim`, `diffdock-nim`, `openfold3-nim`
`complex-qc`	interface validity of protein complexes / designed binders	`rfdiffusion`, `proteinmpnn`, AlphaFold-Multimer
`antibody-qc`	developability / sequence liabilities of designed antibody Fv (VH/VL, VHH)	`rfdiffusion`, `proteinmpnn`, AlphaFold-Multimer
`mol-qc`	validity / synthesizability of generated molecules	`genmol-nim`, `molmim`
`gen-dna-qc`	synthesizability / manufacturability of generated DNA	`evo2-nim`
`benchmark-leakage`	train/test leakage in a model's benchmark	`kermt`, retrosynthesis models
`screen-qc`	under-powered non-hits in a (bulk) CRISPR screen	`parabricks` (downstream)
`single-cell-screen-qc`	failed-knockdown "no-phenotype" calls in a Perturb-seq screen	`parabricks` (downstream)
`promoter-design`	σ70 promoter architecture (−35/−10 boxes, spacer, GC), reference-calibrated	`evo2-nim`

Agent self-repair loop

Because every rejection names its reason, a named reason is a repair instruction — an agent can read it and make the corresponding edit, then re-check. A black-box pass/fail can't drive that loop; a legible one can. karyon.repair closes it: generate → qualify → fix-from-reasons → re-qualify → converge.

python examples/agent_loop/repair_dna.py     # watch the loop converge (pure stdlib, no API)
karyon repair my_draft.fasta -m dna --json   # repair your own draft via the CLI

repair loop · dna · CONVERGED in 3 edit(s)
  round 0: FAIL  [GC_OUT_OF_BAND, HOMOPOLYMER_RUN, RESTRICTION_SITE]  ↳ broke a 14-base homopolymer run at 46
  round 1: FAIL  [GC_OUT_OF_BAND, RESTRICTION_SITE]                   ↳ rebalanced GC 22%→32% into the band
  round 2: PASS  [RESTRICTION_SITE]                                  ↳ removed the EcoRI site at 80
  round 3: PASS  [clean]

The bundled DnaRepairAgent / MolRepairAgent / AntibodyRepairAgent make the loop runnable and CI-tested with no LLM (the antibody agent applies the textbook conservative liability fixes — Cys→Ser, Asn→Gln, Asp→Glu, break the sequon). In real use the agent is your harness — e.g. Claude Code in your terminal, no API key: it writes a candidate, runs karyon qualify, reads the named reasons, edits, re-runs until PASS. That's the whole thesis — legible QC is what makes agentic self-repair possible. See examples/agent_loop/ and docs/repair.md.
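
The shape of the loop can be shown with a toy that needs no karyon install. This is a sketch only, not `karyon.repair` or `DnaRepairAgent`: the two contract names are borrowed from the sample output above, while the thresholds (`MAX_RUN`, `GC_BAND`), the fix strategies, and all function names are hypothetical.

```python
import re

MAX_RUN = 5          # hypothetical: longest homopolymer run allowed
GC_BAND = (30, 70)   # hypothetical GC band, in percent
RUN_RE = re.compile(r"(.)\1{%d,}" % MAX_RUN)  # a base repeated more than MAX_RUN times


def gc_percent(seq):
    return 100 * sum(base in "GC" for base in seq) // len(seq)


def qualify_dna(seq):
    """Toy gate: return the named reasons; an empty list means PASS."""
    reasons = []
    if RUN_RE.search(seq):
        reasons.append("HOMOPOLYMER_RUN")
    if not GC_BAND[0] <= gc_percent(seq) <= GC_BAND[1]:
        reasons.append("GC_OUT_OF_BAND")
    return reasons


def fix_homopolymer(seq):
    # Break the first over-long run by changing the base just past the limit.
    pos = RUN_RE.search(seq).start() + MAX_RUN
    swap = {"A": "G", "C": "T", "G": "A", "T": "C"}[seq[pos]]
    return seq[:pos] + swap + seq[pos + 1:]


def fix_gc(seq):
    # Nudge GC toward the band by one base per edit.
    if gc_percent(seq) < GC_BAND[0]:
        return seq.replace("T", "C", 1)
    return seq.replace("G", "A", 1)


# Each named reason maps to exactly one repair action.
FIXES = {"HOMOPOLYMER_RUN": fix_homopolymer, "GC_OUT_OF_BAND": fix_gc}


def repair(seq, max_rounds=10):
    """qualify -> fix the first named reason -> re-qualify, until PASS."""
    for round_no in range(max_rounds):
        reasons = qualify_dna(seq)
        print(f"round {round_no}: {'PASS' if not reasons else 'FAIL'} {reasons}")
        if not reasons:
            return seq
        seq = FIXES[reasons[0]](seq)
    raise RuntimeError("did not converge")


fixed = repair("ATGCAAAAAAAATTTACGTA")
print(fixed, gc_percent(fixed))
```

The `FIXES` table is the point of the sketch: the loop can only be driven because the gate returns a reason name rather than a bare pass/fail.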

Does qualifying compound? (a DBTL-loop demonstration)

The self-repair loop above gates one artifact. The legible operator (dbtl_operator) gates a whole design-build-test-learn loop: it qualifies each measured readout before folding it into its surrogate, so a corrupt or under-powered measurement is excluded from the model update, with a reason. On a headroom substrate with a model-degrading (synthetic) assay, that protection compounds over recursive cycles. The gated arm's held-out-ρ advantage widens cycle-over-cycle: a quality edge (keeping bad labels out) runs away where merely spending less budget would saturate. This holds only once the tool is unreliable enough. Below a reliability crossover the gate is net-costly (it drops good data too); above it, the gate pays and compounds. Full method, the crossover table, and the negative controls: docs/compounding.md.

python -m karyon.operator_compound_honesty --seeds 8   # the pre-registered test + reliability-crossover sweep
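
The gating step itself, qualifying a readout before it reaches the surrogate, can be sketched independently of the operator. This is an illustration under stated assumptions, not `dbtl_operator`: the reason names `UNDER_REPLICATED` and `NOISY_READOUT`, the replicate and coefficient-of-variation thresholds, and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def gate_readout(replicates, min_reps=3, max_cv=0.25):
    """Return None to accept a readout, or a named reason to reject it."""
    if len(replicates) < min_reps:
        return "UNDER_REPLICATED"
    centre = mean(replicates)
    if centre == 0 or pstdev(replicates) / abs(centre) > max_cv:
        return "NOISY_READOUT"
    return None


def fold_in(training_set, measured):
    """Add only qualified readouts to the surrogate's training set."""
    rejected = {}
    for design, replicates in measured.items():
        reason = gate_readout(replicates)
        if reason is None:
            training_set[design] = mean(replicates)
        else:
            rejected[design] = reason  # excluded from the update, with a reason
    return rejected


training = {}
cycle = {
    "design_a": [1.0, 1.1, 0.9],   # tight replicates
    "design_b": [2.0],             # a single replicate
    "design_c": [0.2, 1.8, 1.0],   # widely scattered replicates
}
rejected = fold_in(training, cycle)
print(sorted(training), rejected)
```

Note the trade-off the paragraph above describes: a gate like this also discards some good measurements, so whether it pays depends on how unreliable the assay is.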

Library layout

src/karyon/
  spine.py            the qualify spine — qualify(artifact, modality) -> QualifyResult over every gate
  repair.py           the agent self-repair loop — generate -> qualify -> fix-from-reasons -> converge
  cli.py              the `karyon` command-line entry point (qualify / repair / audit / list)
  contracts.py        the legible verdict engine (named contracts -> Verdict with reasons)
  pose_validity.py    cofold_validity.py  protein_interface_validity.py   structural-validity DRCs (pose / co-fold / complex interface)
  mol_qc.py           gen_dna_validity.py   antibody_developability.py    generated-output DRCs (molecule / DNA / antibody Fv)
  retro_honesty.py    molnet_honesty.py   benchmark leakage audits
  screen_qc.py        perturbseq_qc.py    CRISPR screen QC — bulk dropout + single-cell Perturb-seq (crispr_qc.py: guide QC)
  loop.py             dbtl_operator.py    a legible design-build-test-learn loop + operator
  operator_compound.py  noisy_assay.py    does readout-qualification compound over recursive cycles?
  *_data.py           on-demand loaders for public benchmark datasets
skills/               the SKILL.md agent skills
tests/                the test suite

What this is not

karyon does not predict structures, dock ligands, or generate molecules — it qualifies the output of tools that do. Its value is legibility and trust, not accuracy. Pair it with a generative toolkit (e.g. BioNeMo) for the soft, quantitative axis; use karyon for the deterministic, auditable one.

License

Dual-licensed: code under Apache-2.0, skills/docs under CC-BY-4.0. See LICENSE.

Project details


Release history

0.5.0 (this version): Jun 30, 2026
0.4.0: Jun 29, 2026
0.3.0: Jun 29, 2026
0.1.0: Jun 27, 2026

Download files

Source Distribution

karyon-0.5.0.tar.gz (360.5 kB), uploaded Jun 30, 2026, Source

Built Distribution

karyon-0.5.0-py3-none-any.whl (332.0 kB), uploaded Jun 30, 2026, Python 3

File details

Details for the file karyon-0.5.0.tar.gz.

File metadata

Download URL: karyon-0.5.0.tar.gz
Upload date: Jun 30, 2026
Size: 360.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for karyon-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`ebd03a5c8b29e46036fbafb2dba02bd09b374a2821158d7f7c37eedfe83a9b80`
MD5	`de348992cc8d032ada7e1a535c65718f`
BLAKE2b-256	`54ca128ae8b2bd81197f60e84e9511c54b036468103c9a9e51f01edf68599505`


Provenance

The following attestation bundles were made for karyon-0.5.0.tar.gz:

Publisher: release.yml on Curtisflo/karyon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: karyon-0.5.0.tar.gz
- Subject digest: ebd03a5c8b29e46036fbafb2dba02bd09b374a2821158d7f7c37eedfe83a9b80
- Sigstore transparency entry: 2016416886
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: Curtisflo/karyon@5a10930c25d70c1f1c49268a58322c05051ab2d5
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/Curtisflo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5a10930c25d70c1f1c49268a58322c05051ab2d5
- Trigger Event: release

File details

Details for the file karyon-0.5.0-py3-none-any.whl.

File metadata

Download URL: karyon-0.5.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 332.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for karyon-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ea499b60e490788b7998c64b9b7157969229f0ad600ab72ce983de4433e471d0`
MD5	`e08eea0ac39b0eb37c6ffd0acd48bd8b`
BLAKE2b-256	`f3b15755d18006d0ad7f3b255bd306337c595470cf1ec685eab343aa6df5b0ef`


Provenance

The following attestation bundles were made for karyon-0.5.0-py3-none-any.whl:

Publisher: release.yml on Curtisflo/karyon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: karyon-0.5.0-py3-none-any.whl
- Subject digest: ea499b60e490788b7998c64b9b7157969229f0ad600ab72ce983de4433e471d0
- Sigstore transparency entry: 2016416981
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: Curtisflo/karyon@5a10930c25d70c1f1c49268a58322c05051ab2d5
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/Curtisflo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5a10930c25d70c1f1c49268a58322c05051ab2d5
- Trigger Event: release
