Tiny protein-function classifier distilled from ProteinMPNN, for edge deployment.

capiti

Tiny protein-function classifier for edge deployment. Given a nucleotide sequence encoding a protein, capiti flags whether the encoded protein is expected to retain the enzymatic function of one of the enzymes in a small reference set.

Weighs ~1 MB on disk, runs inference in tens of milliseconds on a Raspberry Pi. Trained by distilling ProteinMPNN's function-preserving design prior into a small 1D CNN.
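capiti takes nucleotide input but judges the encoded protein, so the sequence has to be translated at some point. The README does not say how capiti does this. The sketch below is a minimal illustration that assumes the standard genetic code (NCBI table 1), reading from the first base, an in-frame length, truncation at the first stop codon, and `X` for codons containing ambiguous bases. `translate` is a hypothetical helper, not part of capiti's API.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with the
# first base varying slowest over T, C, A, G, matching the order of _AMINO_ACIDS.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame coding sequence into a one-letter protein string.

    Hypothetical preprocessing helper (assumptions, not capiti's documented
    behaviour): reads from the first base, accepts RNA (U) or DNA (T),
    stops at the first stop codon, and maps codons with ambiguous bases to X.
    """
    seq = nucleotides.strip().upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCGTAAAGTGGCC"))  # MRKVA
```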

Overview (each ResidualDilatedBlock collapsed to one box):

[Figure: CapitiCNN overview]

Inside one ResidualDilatedBlock:

[Figure: ResidualDilatedBlock detail]

See docs/capiti.summary.txt for the full per-layer size / FLOP table.
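The README describes CapitiCNN as built from ResidualDilatedBlocks and, in the gate section, as a masked-mean CNN. As a rough, hypothetical illustration of those two ideas, the sketch below implements one dilated 1D convolution with "same" zero padding, a ReLU and a residual add, followed by masked-mean pooling over valid positions. The function names, the single-convolution block layout and the list-of-lists tensor format are assumptions; the real block (see docs/capiti.summary.txt) may use more convolutions, normalization or different widths.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # [channels][positions]


def dilated_conv1d(x: Matrix, weight: Sequence[Sequence[Sequence[float]]],
                   bias: Sequence[float], dilation: int) -> Matrix:
    """1D convolution with "same" zero padding; weight is [c_out][c_in][k], k odd."""
    c_in, length = len(x), len(x[0])
    k = len(weight[0][0])
    half = (k - 1) // 2
    out: Matrix = []
    for o in range(len(weight)):
        row = []
        for t in range(length):
            acc = bias[o]
            for i in range(c_in):
                for j in range(k):
                    # Kernel taps sit `dilation` positions apart, centred on t.
                    src = t + (j - half) * dilation
                    if 0 <= src < length:  # positions outside the sequence count as 0
                        acc += weight[o][i][j] * x[i][src]
            row.append(acc)
        out.append(row)
    return out


def residual_dilated_block(x: Matrix, weight, bias, dilation: int) -> Matrix:
    """Hypothetical block: x + ReLU(dilated_conv(x)). Needs c_out == c_in."""
    h = dilated_conv1d(x, weight, bias, dilation)
    return [[xv + max(0.0, hv) for xv, hv in zip(x_row, h_row)]
            for x_row, h_row in zip(x, h)]


def masked_mean(x: Matrix, mask: Sequence[int]) -> List[float]:
    """Per-channel mean over positions where mask is 1 (e.g. real residues, not padding)."""
    n = sum(mask)
    if n == 0:
        raise ValueError("mask selects no positions")
    return [sum(v for v, m in zip(row, mask) if m) / n for row in x]


x = [[1.0, 2.0, 3.0, 4.0]]                 # one channel, four positions
shift = [[[1.0, 0.0, 0.0]]]                # picks the tap `dilation` steps to the left
print(dilated_conv1d(x, shift, [0.0], dilation=2))   # [[0.0, 0.0, 1.0, 2.0]]
y = residual_dilated_block(x, [[[0.0, 1.0, 0.0]]], [0.0], dilation=2)
print(y, masked_mean(y, [1, 1, 0, 0]))     # [[2.0, 4.0, 6.0, 8.0]] [3.0]
```

Dilation spaces the kernel taps apart, so stacking blocks with increasing dilation widens the receptive field quickly with depth while each convolution keeps the same number of weights. Masked-mean pooling keeps padding from diluting the embedding; the flip side is that a change confined to a handful of positions is averaged together with every other valid position, which fits the README's remark that the CNN under-weights single-residue active-site knockouts.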

Install

pip install capiti

Use

capiti ATGCGTAAAGTGGCC...           # prints TRUE or FALSE (default set ab9)
capiti ATGCGT...  --cutoff 0.8 -v   # e.g. prints: TRUE  p_inset=0.995
capiti --fasta seqs.fa              # batch over every record in a FASTA file
echo ATGCGT... | capiti --stdin     # read the sequence from stdin
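When driving the CLI from Python, it can help to build the argument list in one place. The sketch below only uses the flags documented in this README (--fasta, --stdin, --cutoff, -v, --set, --no-gate). `build_capiti_argv` and `parse_verbose_line` are hypothetical names; treating `-v` as the switch that adds the `p_inset` value, and assuming single-sequence verbose output has exactly the shape in the example above (`TRUE  p_inset=0.995`), are assumptions. The batch output format for --fasta is not documented, so it is not parsed here.

```python
from typing import List, Optional, Tuple


def build_capiti_argv(sequence: Optional[str] = None, fasta: Optional[str] = None,
                      use_stdin: bool = False, cutoff: Optional[float] = None,
                      verbose: bool = False, ref_set: Optional[str] = None,
                      gate: bool = True) -> List[str]:
    """Assemble a capiti command line from the flags documented in the README."""
    sources = [sequence is not None, fasta is not None, use_stdin]
    if sum(sources) != 1:
        raise ValueError("give exactly one of sequence, fasta, use_stdin")
    argv = ["capiti"]
    if sequence is not None:
        argv.append(sequence)          # positional nucleotide sequence
    elif fasta is not None:
        argv += ["--fasta", fasta]     # batch over a FASTA file
    else:
        argv.append("--stdin")         # sequence is piped in on stdin
    if cutoff is not None:
        argv += ["--cutoff", str(cutoff)]
    if verbose:
        argv.append("-v")              # assumed to add the p_inset value to the output
    if ref_set is not None:
        argv += ["--set", ref_set]     # ab9 (default), E or C
    if not gate:
        argv.append("--no-gate")       # disable the catalytic-position gate
    return argv


def parse_verbose_line(line: str) -> Tuple[bool, Optional[float]]:
    """Parse a line like 'TRUE  p_inset=0.995' (format taken from the README example)."""
    tokens = line.split()
    if not tokens or tokens[0] not in ("TRUE", "FALSE"):
        raise ValueError(f"unexpected capiti output: {line!r}")
    p_inset = None
    for tok in tokens[1:]:
        if tok.startswith("p_inset="):
            p_inset = float(tok.split("=", 1)[1])
    return tokens[0] == "TRUE", p_inset


print(build_capiti_argv("ATGCGT", cutoff=0.8, verbose=True))
# ['capiti', 'ATGCGT', '--cutoff', '0.8', '-v']
print(parse_verbose_line("TRUE  p_inset=0.995"))  # (True, 0.995)
```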

Reference sets

capiti ships three bundled reference sets, selectable at invocation time via --set NAME (or the CAPITI_SET environment variable).

set   targets  description
ab9   9        Beta-lactamases relevant to antibiotic resistance plus other soluble enzymes. Default.
E     59       Larger enzyme panel (54 PDB + 5 AlphaFold-only entries).
C     235      Broad enzyme panel sourced from PDB.

capiti ATGCGT... --set ab9
capiti --fasta seqs.fa --set C
CAPITI_SET=E capiti --stdin
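A wrapper script may need to know which reference set a run will use. The README says the set comes from --set NAME or CAPITI_SET and that ab9 is the default, but not which wins when both are given. This sketch assumes the explicit flag takes precedence over the environment variable, as is conventional for CLIs; `resolve_set` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Bundled reference sets and their target counts, as listed in the table above.
BUNDLED_SETS = {"ab9": 9, "E": 59, "C": 235}
DEFAULT_SET = "ab9"


def resolve_set(cli_value: Optional[str] = None,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the reference set: --set, then CAPITI_SET, then the default.

    The precedence order (flag over environment) is an assumption.
    """
    if env is None:
        env = os.environ
    name = cli_value or env.get("CAPITI_SET") or DEFAULT_SET
    if name not in BUNDLED_SETS:
        raise ValueError(f"unknown set {name!r}; expected one of {sorted(BUNDLED_SETS)}")
    return name


print(resolve_set(None, {}))                   # ab9
print(resolve_set(None, {"CAPITI_SET": "E"}))  # E
print(resolve_set("C", {"CAPITI_SET": "E"}))   # C
```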

Inference-time gate

By default, capiti pairs the CNN with a SIFTS-backed fixed-position gate: if the model picks a target Ti and the query has a mutated residue at any of Ti's catalytic or active-site positions, the in-set score is forced to 0. This catches single-residue active-site knockouts, which the masked-mean CNN under-weights. Disable the gate with --no-gate.
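The gate rule can be sketched as a post-processing step on the CNN score. This is a minimal illustration, not capiti's implementation: the README describes the gate as SIFTS-backed but does not document how query residues are mapped onto the target's positions, so the sketch assumes that alignment has already happened and represents both sequences as dicts from position to one-letter residue. Treating a position missing from the query (for example a deletion) as mutated is also an assumption. `gated_score` and the example positions are hypothetical.

```python
from typing import Iterable, Mapping


def gated_score(score: float, query: Mapping[int, str],
                reference: Mapping[int, str],
                catalytic_positions: Iterable[int]) -> float:
    """Zero the in-set score if any catalytic position of the chosen target is mutated.

    `query` and `reference` map positions in the target's numbering to one-letter
    residues. A position absent from `query` (e.g. a deletion) counts as mutated;
    that choice, and the alignment that produces `query`, are assumptions.
    """
    for pos in catalytic_positions:
        if query.get(pos) != reference[pos]:
            return 0.0
    return score


# Hypothetical target with catalytic residues at positions 3 and 6.
reference = {1: "M", 2: "K", 3: "S", 4: "T", 5: "L", 6: "K"}
catalytic = [3, 6]
wild_type = dict(reference)
knockout = {**reference, 3: "A"}  # single alanine substitution at a catalytic site

print(gated_score(0.93, wild_type, reference, catalytic))  # 0.93
print(gated_score(0.93, knockout, reference, catalytic))   # 0.0
```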

The exit code is 0 on TRUE and 1 on FALSE, so capiti can be used directly in shell pipelines:

capiti ATGCGT... && echo "in set" || echo "not in set"
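Because the exit status encodes the verdict, a Python caller can use it directly instead of parsing stdout. A minimal sketch using `subprocess.run`; only 0 and 1 are documented, so treating any other status as an error is an assumption. The `cmd` parameter exists so the function can be pointed at another executable (for example in tests); `in_set` and `exit_code_to_bool` are hypothetical helpers.

```python
import subprocess
from typing import Sequence


def exit_code_to_bool(code: int) -> bool:
    """Map capiti's documented exit codes: 0 -> TRUE (in set), 1 -> FALSE."""
    if code == 0:
        return True
    if code == 1:
        return False
    # Anything else is assumed to signal an error (e.g. bad input); not documented.
    raise RuntimeError(f"capiti exited with unexpected status {code}")


def in_set(sequence: str, extra_args: Sequence[str] = (),
           cmd: Sequence[str] = ("capiti",)) -> bool:
    """Run capiti on one sequence and return its verdict from the exit status."""
    proc = subprocess.run([*cmd, sequence, *extra_args],
                          capture_output=True, text=True)
    return exit_code_to_bool(proc.returncode)


# Usage (requires capiti on PATH):
#   in_set("ATGCGT...", ["--set", "E"])
```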

Benchmarks

On the held-out test split for each set (gate on, natural threshold):

set   targets  AUC    mpnn_pos  ala_scan
ab9   9        1.000  0.999     1.000
E     59       0.984  0.983     0.966
C     235      0.970  0.964     0.953

A side-by-side comparison with BLAST and k-mer baselines is in docs/benchmark/CE_summary.md. Per-set ROC, PR and per-class plots are in docs/benchmark/v3/, docs/benchmark/E_v1/ and docs/benchmark/C_v1/.
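The AUC column is the area under the ROC curve. For reference, one standard way to compute it is the rank (Mann-Whitney) form: the fraction of (positive, negative) pairs in which the positive gets the higher score, counting ties as half. The quadratic-time sketch below is a generic illustration, not the script that produced the table above; `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    O(n_pos * n_neg); ties count as half a win. labels: 1 = positive, 0 = negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # 1.0 (perfect separation)
print(roc_auc([0.9, 0.3, 0.4, 0.1], [1, 1, 0, 0]))  # 0.75
```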

Status

Research-grade. The CLI surface (flags, stdin/FASTA behaviour, exit codes) is stable; bundled models may be retrained and updated between 0.x releases. Not for operational use.

License

MIT.


Download files

Source Distribution

capiti-0.1.1.tar.gz (2.7 MB)

Uploaded Source

Built Distribution

capiti-0.1.1-py3-none-any.whl (2.7 MB)

Uploaded Python 3

File details

Details for the file capiti-0.1.1.tar.gz.

File metadata

  • Download URL: capiti-0.1.1.tar.gz
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for capiti-0.1.1.tar.gz
Algorithm Hash digest
SHA256 85f83c9acf30dfcd85f31dc43899b4081fd47ea313004993efd7db29ce6a788f
MD5 666f32304c3825505759d5de34619f39
BLAKE2b-256 2464c692bda78c81cc1f8605abc5b1bb4417e4e96718f18e51621ea557702872

File details

Details for the file capiti-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: capiti-0.1.1-py3-none-any.whl
  • Size: 2.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for capiti-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 469f41b05a47b6d26ef06a4a1f162d0d81bc5917d482e15f6975f15266f8c997
MD5 3c7253fec9517a39761165ba2b96e3bd
BLAKE2b-256 bec367f39af96defc092e0c19806139c6b79d16049017cf8297d1308a34fe36f
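To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch with Python's `hashlib`; the `EXPECTED_SHA256` values are copied from the hash tables on this page, and `sha256_of` / `verify` are hypothetical helpers.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables on this page.
EXPECTED_SHA256 = {
    "capiti-0.1.1.tar.gz": "85f83c9acf30dfcd85f31dc43899b4081fd47ea313004993efd7db29ce6a788f",
    "capiti-0.1.1-py3-none-any.whl": "469f41b05a47b6d26ef06a4a1f162d0d81bc5917d482e15f6975f15266f8c997",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256[os.path.basename(path)]  # KeyError for unknown files
    return sha256_of(path) == expected


# Usage, after downloading the wheel into the current directory:
#   verify("capiti-0.1.1-py3-none-any.whl")
```

pip can enforce the same check itself in hash-checking mode: list `capiti==0.1.1 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`.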
