Tiny protein-function classifier distilled from ProteinMPNN, for edge deployment.
capiti
Tiny protein-function classifier for edge deployment. Given a nucleotide sequence encoding a protein, capiti flags whether the encoded protein is expected to retain the enzymatic function of one of a small reference set.
The model weighs ~1 MB on disk and runs inference in tens of milliseconds on a Raspberry Pi. It is trained by distilling ProteinMPNN's function-preserving design prior into a small 1D CNN.
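Because the input is a nucleotide sequence rather than a protein sequence, it can help to see which protein a given query encodes. The sketch below translates DNA with the standard genetic code, reading frame 0 and stopping at the first stop codon. It is an illustration only: this README does not say how capiti handles frames, stop codons, or ambiguous bases, and `translate` is a hypothetical helper, not part of the capiti package.

```python
# Standard genetic code (NCBI translation table 1), laid out in the
# conventional TCAG order for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate frame 0 of a DNA/RNA string, stopping at the first stop codon.

    Hypothetical helper for illustration; capiti's own preprocessing may differ.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for start in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGCGTAAAGTGGCC"))  # MRKVA
```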
[Figure: architecture overview, with each ResidualDilatedBlock collapsed to one box]
[Figure: inside one ResidualDilatedBlock]
See docs/capiti.summary.txt for the full per-layer size / FLOP table.
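For readers unfamiliar with the building block, here is a generic sketch of what a residual dilated 1D convolution does. It assumes a single channel, a ReLU nonlinearity, and zero padding, none of which is confirmed here; the actual layer layout (channels, normalization, dilation schedule) is in docs/capiti.summary.txt. All function names and the dilation values are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D convolution with `dilation` gaps between taps."""
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(x):  # positions outside the sequence count as zero
                acc += weight * x[j]
        out.append(acc)
    return out


def residual_dilated_block(x, kernel, dilation):
    """Add the ReLU-activated dilated convolution back onto the input (skip connection)."""
    y = [max(0.0, v) for v in dilated_conv1d(x, kernel, dilation)]
    return [a + b for a, b in zip(x, y)]


def receptive_field(kernel_size, dilations):
    """Positions seen by one output after stacking convolutions with these dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical schedule: four blocks with kernel size 3 and doubling dilation.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation from block to block grows the receptive field exponentially with depth while the parameter count grows only linearly, which is one common reason to use this pattern in small sequence models.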
Install
pip install capiti
Use
capiti ATGCGTAAAGTGGCC... # prints TRUE or FALSE (default set ab9)
capiti ATGCGT... --cutoff 0.8 -v # TRUE p_inset=0.995
capiti --fasta seqs.fa # batch over a FASTA
echo ATGCGT... | capiti --stdin
Reference sets
capiti ships three bundled reference sets, selectable at invocation time via --set NAME (or the CAPITI_SET environment variable).
| set | targets | description |
|---|---|---|
| ab9 | 9 | Beta-lactamases relevant to antibiotic resistance plus other soluble enzymes. Default. |
| E | 59 | Larger enzyme panel (54 PDB + 5 AlphaFold-only entries). |
| C | 235 | Broad enzyme panel sourced from PDB. |
capiti ATGCGT... --set ab9
capiti --fasta seqs.fa --set C
CAPITI_SET=E capiti --stdin
Inference-time gate
Capiti pairs the CNN with a SIFTS-backed fixed-position gate by
default: if the model picks a target Ti and the query has a mutated
residue at any of Ti's catalytic / active-site positions, the in-set
score is forced to 0. This catches single-residue active-site
knockouts the masked-mean CNN under-weights. Disable with --no-gate.
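The gate rule described above can be sketched in a few lines. This is a minimal illustration, assuming the query protein is already aligned position-by-position to the chosen target's reference sequence and that catalytic positions are given as 0-based indices. `apply_gate`, its arguments, and the example sequences are hypothetical; capiti derives the real positions from SIFTS and its internal alignment step is not described here.

```python
def apply_gate(p_inset, query, reference, catalytic_positions):
    """Force the in-set score to 0 if any catalytic position differs from the reference.

    Assumes `query` and `reference` are protein strings in the same coordinate
    frame (hypothetical simplification; no alignment is performed here).
    """
    for pos in catalytic_positions:
        # A missing or substituted catalytic residue is treated as a knockout.
        if pos >= len(query) or query[pos] != reference[pos]:
            return 0.0
    return p_inset


reference = "MSKHD"          # made-up reference sequence
catalytic_positions = [1, 3]  # made-up active-site indices (S and H)

print(apply_gate(0.995, "MSKHD", reference, catalytic_positions))  # 0.995
print(apply_gate(0.995, "MAKHD", reference, catalytic_positions))  # 0.0
```

A mean over all sequence positions dilutes the effect of any single residue, so a hard check like this is a cheap way to catch a substitution that changes only one position.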
Exit code is 0 on TRUE, 1 on FALSE, suitable for shell pipelines:
capiti ATGCGT... && echo "in set" || echo "not in set"
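The same exit-code convention can be used from Python by shelling out to the CLI. The sketch below uses only the flags documented in this README (--set, --cutoff, --no-gate). `build_argv` and `in_set` are hypothetical wrapper names, not a Python API shipped by capiti, and treating any exit code other than 0 or 1 as an error is an assumption.

```python
import subprocess


def build_argv(seq, set_name=None, cutoff=None, gate=True):
    """Build a capiti command line from the flags documented in the README."""
    argv = ["capiti", seq]
    if set_name is not None:
        argv += ["--set", set_name]
    if cutoff is not None:
        argv += ["--cutoff", str(cutoff)]
    if not gate:
        argv.append("--no-gate")
    return argv


def in_set(seq, **options):
    """Return True if capiti exits with 0 (TRUE), False if it exits with 1 (FALSE)."""
    result = subprocess.run(build_argv(seq, **options), capture_output=True, text=True)
    if result.returncode not in (0, 1):
        # Assumption: other exit codes signal a usage or runtime error.
        raise RuntimeError(f"capiti failed with exit code {result.returncode}: {result.stderr}")
    return result.returncode == 0


print(build_argv("ATGCGTAAAGTGGCC", set_name="E", cutoff=0.8))
```

Calling `in_set(...)` requires capiti to be installed and on PATH; `build_argv` can be inspected without it.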
Benchmarks
On the held-out test split for each set (gate on, natural threshold):
| set | targets | AUC | mpnn_pos | ala_scan |
|---|---|---|---|---|
| ab9 | 9 | 1.000 | 0.999 | 1.000 |
| E | 59 | 0.984 | 0.983 | 0.966 |
| C | 235 | 0.970 | 0.964 | 0.953 |
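As a reminder of what the AUC column measures, assuming it is the area under the ROC curve: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The sketch below computes that directly from two score lists. The scores are made up, `roc_auc` is a hypothetical helper, and this is not the evaluation code behind the table.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up in-set scores for illustration only.
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```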
A side-by-side comparison with BLAST and k-mer baselines is at docs/benchmark/CE_summary.md. Per-set ROC, PR, and per-class plots are at docs/benchmark/v3/, docs/benchmark/E_v1/, and docs/benchmark/C_v1/.
Status
Research-grade. The CLI surface (flags, stdin/FASTA behaviour, exit codes) is stable; bundled models may be retrained and updated between 0.x releases. Not for operational use.
License
MIT.
File details
Details for the file capiti-0.1.0.tar.gz.
File metadata
- Download URL: capiti-0.1.0.tar.gz
- Upload date:
- Size: 2.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb47c72107b49cf7a926af65c5b8405ed275c87412b3caf041613b76d8981a6e |
| MD5 | 43535ce765643a1187b619a886d1ab4d |
| BLAKE2b-256 | e2a46454641677b3cb7cbe2baef7ae6a42c5f3d14e8d238646a89f2d0812b3c8 |
File details
Details for the file capiti-0.1.0-py3-none-any.whl.
File metadata
- Download URL: capiti-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57f7b7ac93de51d98f94a2f87e5a18268feadd2a158466b65420111de03bae3e |
| MD5 | ee73fca800b6877cb575f022dff29cb9 |
| BLAKE2b-256 | 1748fdf215314015007b026287f49f7b14173e076b96e401008c514b6caa4465 |