
Tiny protein-function classifier distilled from ProteinMPNN, for edge deployment.


capiti

Tiny protein-function classifier for edge deployment. Given a nucleotide sequence encoding a protein, capiti flags whether the encoded protein is expected to retain the enzymatic function of one of a small reference set.
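Because the input is a nucleotide sequence while the function call concerns the encoded protein, a translation step is implied somewhere. The sketch below is not capiti's code. It shows a plain standard-genetic-code translation, and the names `CODON_TABLE` and `translate` are hypothetical. It assumes reading frame 0, stops at the first stop codon, and maps unrecognised codons to `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in frame 0 (hypothetical helper)."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCGTAAAGTGGCC"))  # MRKVA
```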

The model weighs ~1 MB on disk and runs inference in tens of milliseconds on a Raspberry Pi. It is trained by distilling ProteinMPNN's function-preserving design prior into a small 1D CNN.

Overview (each ResidualDilatedBlock collapsed to one box):

[Figure: CapitiCNN overview]

Inside one ResidualDilatedBlock:

[Figure: ResidualDilatedBlock detail]

See docs/capiti.summary.txt for the full per-layer size / FLOP table.

Install

pip install capiti

Use

capiti ATGCGTAAAGTGGCC...           # prints TRUE or FALSE (default set ab9)
capiti ATGCGT...  --cutoff 0.8 -v   # TRUE  p_inset=0.995
capiti --fasta seqs.fa              # batch over a FASTA
echo ATGCGT... | capiti --stdin

Reference sets

capiti ships three bundled reference sets, selectable at invocation time via --set NAME (or CAPITI_SET).

| set | targets | description |
|-----|---------|-------------|
| ab9 | 9 | Beta-lactamases relevant to antibiotic resistance plus other soluble enzymes. Default. |
| E | 59 | Larger enzyme panel (54 PDB + 5 AlphaFold-only entries). |
| C | 235 | Broad enzyme panel sourced from PDB. |

capiti ATGCGT... --set ab9
capiti --fasta seqs.fa --set C
CAPITI_SET=E capiti --stdin
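The selection logic can be pictured as a small lookup. This is a sketch only. The set names and target counts come from the table above, but `resolve_set` is a hypothetical helper, and the precedence shown (flag first, then `CAPITI_SET`, then the default `ab9`) is an assumption that the README does not confirm.

```python
import os

# Bundled sets and their target counts, as listed in the table above.
BUNDLED_SETS = {"ab9": 9, "E": 59, "C": 235}
DEFAULT_SET = "ab9"


def resolve_set(cli_value=None, environ=None):
    """Pick a reference set name (hypothetical helper).

    Assumed precedence: --set value, then CAPITI_SET, then the default.
    """
    environ = os.environ if environ is None else environ
    name = cli_value or environ.get("CAPITI_SET") or DEFAULT_SET
    if name not in BUNDLED_SETS:
        raise ValueError(f"unknown reference set: {name!r}")
    return name


print(resolve_set("C", {"CAPITI_SET": "E"}))   # C
print(resolve_set(None, {"CAPITI_SET": "E"}))  # E
print(resolve_set(None, {}))                   # ab9
```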

Inference-time gate

By default, capiti pairs the CNN with a SIFTS-backed fixed-position gate: if the model picks a target T_i and the query has a mutated residue at any of T_i's catalytic or active-site positions, the in-set score is forced to 0. This catches single-residue active-site knockouts that the masked-mean CNN under-weights. Disable it with --no-gate.
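The gating rule can be illustrated with a few lines of Python. This is a minimal sketch, not capiti's implementation. It assumes the query protein is already aligned to the reference's residue numbering and that positions are 0-based indices. The name `apply_gate`, its arguments, and the toy sequences are all hypothetical, and the SIFTS lookup that would supply the positions is out of scope.

```python
def apply_gate(p_inset, query, reference, catalytic_positions):
    """Force the in-set score to 0 if any gated residue differs.

    Hypothetical helper: `query` and `reference` are protein strings in the
    same coordinate system; `catalytic_positions` are 0-based indices.
    """
    for pos in catalytic_positions:
        # A missing or substituted residue at a gated position counts as mutated.
        if pos >= len(query) or query[pos] != reference[pos]:
            return 0.0
    return p_inset


reference = "MSKQA"  # toy reference sequence
positions = [1, 3]   # toy catalytic positions (S and Q)

print(apply_gate(0.995, "MSKQA", reference, positions))  # 0.995
print(apply_gate(0.995, "MAKQA", reference, positions))  # 0.0
```

A mutation outside the gated positions leaves the CNN score untouched, so the gate only overrides the model where a catalytic residue has changed.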

The exit code is 0 on TRUE and 1 on FALSE, which suits shell pipelines:

capiti ATGCGT... && echo "in set" || echo "not in set"
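The same exit-code convention can be consumed from Python through `subprocess`. The sketch below uses only flags that appear in this README (`--set`, `--cutoff`, `--no-gate`). The helpers `build_command`, `interpret_exit_code`, and `in_set` are hypothetical, and treating any exit code other than 0 or 1 as an error is an assumption.

```python
import subprocess


def build_command(sequence, set_name=None, cutoff=None, gate=True):
    """Assemble the capiti argv from documented flags (hypothetical helper)."""
    cmd = ["capiti", sequence]
    if set_name is not None:
        cmd += ["--set", set_name]
    if cutoff is not None:
        cmd += ["--cutoff", str(cutoff)]
    if not gate:
        cmd.append("--no-gate")
    return cmd


def interpret_exit_code(returncode):
    """Map the documented exit codes to a boolean: 0 is TRUE, 1 is FALSE."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    # Assumption: anything else signals a usage or runtime error.
    raise RuntimeError(f"capiti exited with unexpected code {returncode}")


def in_set(sequence, **options):
    """Run capiti on one sequence; requires capiti on PATH."""
    result = subprocess.run(
        build_command(sequence, **options),
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode)
```

With capiti installed, `in_set(seq, set_name="C", cutoff=0.8)` would return `True` or `False` for a nucleotide string `seq`.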

Benchmarks

On the held-out test split for each set (gate on, natural threshold):

| set | targets | AUC | mpnn_pos | ala_scan |
|-----|---------|-----|----------|----------|
| ab9 | 9 | 1.000 | 0.999 | 1.000 |
| E | 59 | 0.984 | 0.983 | 0.966 |
| C | 235 | 0.970 | 0.964 | 0.953 |

A side-by-side comparison with BLAST and k-mer baselines is in docs/benchmark/CE_summary.md. Per-set ROC, PR, and per-class plots are in docs/benchmark/v3/, docs/benchmark/E_v1/, and docs/benchmark/C_v1/.

Status

Research-grade. The CLI surface (flags, stdin/FASTA behaviour, exit codes) is stable; bundled models may be retrained and updated between 0.x releases. Not for operational use.

License

MIT.
