
Tiny protein-function classifier distilled from ProteinMPNN, for edge deployment.

Project description

capiti

Tiny protein-function classifier for edge deployment. Given a nucleotide sequence encoding a protein, capiti flags whether the encoded protein is expected to retain the enzymatic function of one of a small reference set.

The model weighs about 1 MB on disk and runs inference in tens of milliseconds on a Raspberry Pi. It is trained by distilling ProteinMPNN's function-preserving design prior into a small 1D CNN.

Overview (each ResidualDilatedBlock collapsed to one box):

[Figure: CapitiCNN overview]

Inside one ResidualDilatedBlock:

[Figure: ResidualDilatedBlock detail]

See docs/capiti.summary.txt for the full per-layer size / FLOP table.

Install

pip install capiti

Use

capiti ATGCGTAAAGTGGCC...           # prints TRUE or FALSE (default set ab9)
capiti ATGCGT...  --cutoff 0.8 -v   # TRUE  p_inset=0.995
capiti --fasta seqs.fa              # batch over a FASTA
echo ATGCGT... | capiti --stdin
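
The commands above take a nucleotide sequence that encodes a protein. As a minimal sketch of what that input represents, the snippet below translates the visible prefix of the example sequence with the standard genetic code. It is an illustration only: this page does not describe capiti's internal preprocessing, and the `translate` helper is hypothetical, not part of the capiti API.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in frame 0 (hypothetical helper).

    Stops at the first stop codon, ignores a trailing partial codon,
    and emits 'X' for codons containing ambiguous bases.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCGTAAAGTGGCC"))  # MRKVA
```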

Reference sets

capiti ships three bundled reference sets, selectable at invocation time via --set NAME (or CAPITI_SET).

set   targets   description
ab9   9         Beta-lactamases relevant to antibiotic resistance plus other soluble enzymes. Default.
E     59        Larger enzyme panel (54 PDB + 5 AlphaFold-only entries).
C     235       Broad enzyme panel sourced from PDB.

capiti ATGCGT... --set ab9
capiti --fasta seqs.fa --set C
CAPITI_SET=E capiti --stdin
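
A script that wraps capiti may want to decide on a set name before invoking it. The sketch below shows one way to do that. It assumes that an explicit `--set` value takes precedence over `CAPITI_SET`; this page does not state the precedence, so treat that ordering as an assumption. `resolve_set` and `BUNDLED_SETS` are hypothetical names for illustration.

```python
import os

# Set name -> number of targets, taken from the table above.
BUNDLED_SETS = {"ab9": 9, "E": 59, "C": 235}
DEFAULT_SET = "ab9"


def resolve_set(cli_value=None, env=None):
    """Pick a reference set name (hypothetical helper).

    Assumed order: explicit value, then CAPITI_SET, then the default.
    """
    env = os.environ if env is None else env
    name = cli_value or env.get("CAPITI_SET") or DEFAULT_SET
    if name not in BUNDLED_SETS:
        raise ValueError(f"unknown set {name!r}; expected one of {sorted(BUNDLED_SETS)}")
    return name


print(resolve_set(None, {}))                     # ab9
print(resolve_set(None, {"CAPITI_SET": "E"}))    # E
print(resolve_set("C", {"CAPITI_SET": "E"}))     # C
```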

Inference-time gate

By default, capiti pairs the CNN with a SIFTS-backed fixed-position gate: if the model picks a target Ti and the query has a mutated residue at any of Ti's catalytic or active-site positions, the in-set score is forced to 0. This catches single-residue active-site knockouts that the masked-mean CNN under-weights. Disable the gate with --no-gate.
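
The gating rule can be sketched in a few lines. This is not capiti's implementation: it assumes the query protein is already numbered like the predicted target (the real gate relies on SIFTS mappings, which are not reproduced here), and it assumes the decision is `score >= cutoff`. The function names and the toy catalytic site are hypothetical.

```python
def apply_gate(p_inset, query_protein, catalytic_residues):
    """Force the in-set score to 0 if any catalytic position is mutated.

    catalytic_residues maps a 1-based position in the predicted target
    to the residue expected there (hypothetical data layout).
    """
    for position, expected in catalytic_residues.items():
        if position > len(query_protein) or query_protein[position - 1] != expected:
            return 0.0
    return p_inset


def classify(p_inset, query_protein, catalytic_residues, cutoff, gate=True):
    """Return (in_set, score); gate=False mimics running with --no-gate."""
    score = apply_gate(p_inset, query_protein, catalytic_residues) if gate else p_inset
    return score >= cutoff, score


toy_site = {3: "K"}  # toy example: residue 3 must be lysine
print(classify(0.995, "MRKVA", toy_site, cutoff=0.8))              # (True, 0.995)
print(classify(0.995, "MRAVA", toy_site, cutoff=0.8))              # (False, 0.0)
print(classify(0.995, "MRAVA", toy_site, cutoff=0.8, gate=False))  # (True, 0.995)
```

The second call shows the case the gate is meant to catch: the CNN score is still high, but a single substitution at the catalytic position zeroes it.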

Exit code is 0 on TRUE, 1 on FALSE, suitable for shell pipelines:

capiti ATGCGT... && echo "in set" || echo "not in set"
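
The same exit-code convention can be consumed from Python. The sketch below pipes a sequence to `capiti --stdin` and maps exit code 0 to True and 1 to False, as documented above. Treating any other exit code as an error is a defensive assumption, and `is_in_set` is a hypothetical wrapper, not part of capiti.

```python
import subprocess


def is_in_set(sequence, command=("capiti", "--stdin")):
    """Run the CLI on one sequence and interpret its exit code.

    0 -> True, 1 -> False (documented above); anything else raises,
    which is an assumption made for this sketch.
    """
    result = subprocess.run(
        list(command),
        input=sequence + "\n",
        text=True,
        capture_output=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"unexpected exit code {result.returncode}: {result.stderr.strip()}"
    )


# Usage, assuming capiti is installed and on PATH:
#   if is_in_set(my_sequence):
#       print("in set")
```

The `command` parameter exists so that extra flags such as `--set C` can be appended by the caller.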

Benchmarks

On the held-out test split for each set (gate on, natural threshold):

set   targets   AUC     mpnn_pos   ala_scan
ab9   9         1.000   0.999      1.000
E     59        0.984   0.983      0.966
C     235       0.970   0.964      0.953

A side-by-side comparison with BLAST and k-mer baselines is in docs/benchmark/CE_summary.md. Per-set ROC, PR, and per-class plots are in docs/benchmark/v3/, docs/benchmark/E_v1/, and docs/benchmark/C_v1/.

Status

Research-grade. The CLI surface (flags, stdin/FASTA behaviour, exit codes) is stable; bundled models may be retrained and updated between 0.x releases. Not for operational use.

License

MIT.


Download files


Source Distribution

capiti-0.1.2.tar.gz (2.7 MB)

Uploaded Source

Built Distribution


capiti-0.1.2-py3-none-any.whl (2.7 MB)

Uploaded Python 3

File details

Details for the file capiti-0.1.2.tar.gz.

File metadata

  • Download URL: capiti-0.1.2.tar.gz
  • Upload date:
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for capiti-0.1.2.tar.gz
Algorithm Hash digest
SHA256 76706b9270a7783bd29efc3c6510f7cf6a4e2e4b7b5e690066d776e3058fc10a
MD5 6c5fdd082d6728cd88d1cf5aa7fd8bdc
BLAKE2b-256 8a470ac287831e15b1c9e67e07188de2e1bf682f9e23ced8f14234f6de61be02


File details

Details for the file capiti-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: capiti-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 2.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for capiti-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2b2e823983beaf4d799b953d908cf1bf2e1320060585f966f581aa9349cafcdd
MD5 c5a439b4031036ae482989e78b1e9436
BLAKE2b-256 3fb0f1836b0a72995137c0e5ce252cde5c2dc70a0f23e2291b4507505372b918
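
A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library; `sha256_of` and `matches_published` are hypothetical helper names, and the local path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digests copied from the hash tables above.
PUBLISHED_SHA256 = {
    "capiti-0.1.2.tar.gz":
        "76706b9270a7783bd29efc3c6510f7cf6a4e2e4b7b5e690066d776e3058fc10a",
    "capiti-0.1.2-py3-none-any.whl":
        "2b2e823983beaf4d799b953d908cf1bf2e1320060585f966f581aa9349cafcdd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """True if the file at `path` has the digest published for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]


# Usage, assuming the sdist was saved to the current directory:
#   matches_published("capiti-0.1.2.tar.gz", "capiti-0.1.2.tar.gz")
```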

