In silico mining of encrypted antimicrobial peptides from proteomes

decryptAMP

Bioinformatics tool for the identification and prediction of encrypted Antimicrobial Peptides (ecAMPs) from proteome data.

 ██████╗ ███████╗ ██████╗██████╗ ██╗   ██╗██████╗ ████████╗ █████╗ ███╗   ███╗██████╗ 
 ██╔══██╗██╔════╝██╔════╝██╔══██╗╚██╗ ██╔╝██╔══██╗╚══██╔══╝██╔══██╗████╗ ████║██╔══██╗
 ██║  ██║█████╗  ██║     ██████╔╝ ╚████╔╝ ██████╔╝   ██║   ███████║██╔████╔██║██████╔╝
 ██║  ██║██╔══╝  ██║     ██╔══██╗  ╚██╔╝  ██╔═══╝    ██║   ██╔══██║██║╚██╔╝██║██╔═══╝ 
 ██████╔╝███████╗╚██████╗██║  ██║   ██║   ██║        ██║   ██║  ██║██║ ╚═╝ ██║██║     
 ╚═════╝ ╚══════╝ ╚═════╝╚═╝  ╚═╝   ╚═╝   ╚═╝        ╚═╝   ╚═╝  ╚═╝╚═╝     ╚═╝╚═╝

decryptAMP is an end-to-end pipeline that mines proteomes for encrypted antimicrobial peptides (ecAMPs). It performs in silico proteolytic digestion, computes 22 physicochemical and compositional descriptors per peptide, and classifies each peptide using AMPidentifier (a tuned soft-voting ensemble of five base classifiers). All results are saved with a complete provenance manifest (JSON) and a self-contained HTML report.

Quick start
Pipeline overview
Installation
Usage
- Command-line interface
- Examples with bacteria.faa
Output layout
Scientific notes
AMPidentifier models
Troubleshooting
Testing
Citation

Quick start

pip install decryptamp

# Run on the bundled E. coli K-12 MG1655 demo proteome (4298 proteins)
decryptamp

Outputs land in results/bacteria/:

results/bacteria/
├── encrypted_peptides_results.csv             # ecAMP candidates with 22 features + probability
├── encrypted_peptides_results.fasta           # same candidates as FASTA, score in header
├── encrypted_peptides_results_manifest.json   # full provenance (versions, hashes, counts, parameters)
├── encrypted_peptides_results_report.html     # human-readable summary
└── encrypted_peptides_results_dedup_stats.txt # deduplication breakdown

results/bacteria.zip                           # compressed archive of the run directory

Pipeline overview

proteome FASTA
      │
      ▼  in silico digestion (trypsin / chymotrypsin / caspase / pseudoenzyme)
encrypted peptides (8-50 aa, canonical residues only)
      │
      ▼  22 physicochemical + compositional descriptors (AMPidentifier)
feature matrix
      │
      ▼  exact deduplication (always) + optional CD-HIT clustering
unique encrypted peptides
      │
      ▼  AMPidentifier classifier (voting / rf / svm / gb / xgb / lgbm)
ecAMP candidates above the decision threshold

Installation

PyPI

The recommended way to install decryptAMP. Python ≥ 3.10 is required.

pip install decryptamp

The package includes all AMPidentifier model weights (~63 MB). No additional downloads are needed.

Optional: install cd-hit for sequence-identity deduplication (--dedup-cdhit):

brew install cd-hit               # macOS
sudo apt-get install cd-hit       # Debian/Ubuntu
conda install -c bioconda cd-hit  # conda

Local install from source

Requirements: Python ≥ 3.10, optional cd-hit binary for --dedup-cdhit.

git clone https://github.com/madsondeluna/decryptAMP.git
cd decryptAMP
python -m venv venv
source venv/bin/activate          # Linux/macOS  (Windows: venv\Scripts\activate)
pip install .

Optional, for CD-HIT clustering:

brew install cd-hit               # macOS
sudo apt-get install cd-hit       # Debian/Ubuntu
conda install -c bioconda cd-hit  # conda

Tested versions

The bundled AMPidentifier weights were generated and validated against the package versions below. pyproject.toml declares minimum constraints for installation flexibility, but if a .pkl fails to deserialize or numeric results drift unexpectedly, pin to these exact versions:

Package	Version
Python	3.13.7
biopython	1.86
joblib	1.5.2
lightgbm	4.6.0
modlamp	4.3.2
numpy	2.3.4
pandas	2.3.3
scikit-learn	1.8.0
scipy	1.16.3
tqdm	4.67.1
xgboost	3.2.0
pytest (dev only)	9.0.3

Quick install of the exact tested set:

pip install \
  biopython==1.86 joblib==1.5.2 lightgbm==4.6.0 modlamp==4.3.2 \
  numpy==2.3.4 pandas==2.3.3 scikit-learn==1.8.0 scipy==1.16.3 \
  tqdm==4.67.1 xgboost==3.2.0
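
Before pinning, it can help to see which installed packages actually differ from the tested set. The sketch below is not part of decryptAMP; `installed_version` and `version_drift` are hypothetical helper names, and the version numbers are copied from the table above.

```python
from importlib import metadata

# Versions copied from the "Tested versions" table.
TESTED = {
    "biopython": "1.86", "joblib": "1.5.2", "lightgbm": "4.6.0",
    "modlamp": "4.3.2", "numpy": "2.3.4", "pandas": "2.3.3",
    "scikit-learn": "1.8.0", "scipy": "1.16.3", "tqdm": "4.67.1",
    "xgboost": "3.2.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_drift(tested=TESTED, lookup=installed_version):
    """Map package -> (tested, installed) for every package that differs."""
    drift = {}
    for name, wanted in tested.items():
        found = lookup(name)
        if found != wanted:
            drift[name] = (wanted, found)
    return drift


if __name__ == "__main__":
    for name, (wanted, found) in sorted(version_drift().items()):
        print(f"{name}: tested {wanted}, installed {found or 'not installed'}")
```

An empty result means the environment matches the tested set exactly.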

Native shell alias (no Docker)

After pip install . inside a virtual environment, the decryptamp entry point is placed at <venv>/bin/decryptamp. To call it from any directory without activating the environment, add an alias pointing to that binary. Adjust the path to where you cloned the repository.

# Linux/macOS (zsh)
echo "alias decryptamp='/abs/path/to/decryptAMP/venv/bin/decryptamp'" >> ~/.zshrc
source ~/.zshrc

# Linux/macOS (bash)
echo "alias decryptamp='/abs/path/to/decryptAMP/venv/bin/decryptamp'" >> ~/.bashrc
source ~/.bashrc

After that, the tool is available everywhere:

cd /any/working/dir
decryptamp --input myproteome.faa --high-discovery-mode
# Output goes to ./results/myproteome/ in the current working directory.

Docker

The bundled Dockerfile is multi-stage, slim, and includes cd-hit and the AMPidentifier model weights.

docker build -t decryptamp .

Run the demo proteome (results land in ./results on the host):

docker run --rm -v "$PWD/results:/work/results" decryptamp

Run on your own proteome (mounted read-only):

docker run --rm \
  -v "/abs/path/to/proteomes:/data:ro" \
  -v "$PWD/results:/work/results" \
  decryptamp --input /data/myproteome.faa --model voting --high-discovery-mode

Pass any decryptAMP flag after the image name; it is forwarded directly to the decryptamp entry point.

Docker shell alias

For an experience identical to the native install, add an alias that bind-mounts the current working directory as /data inside the container. Both the input FASTA and the results/ output directory then resolve transparently to your host CWD.

# Linux/macOS (zsh)
echo "alias decryptamp='docker run --rm -v \"\$PWD:/data\" -w /data decryptamp'" >> ~/.zshrc
source ~/.zshrc

# Linux/macOS (bash)
echo "alias decryptamp='docker run --rm -v \"\$PWD:/data\" -w /data decryptamp'" >> ~/.bashrc
source ~/.bashrc

Use exactly like the native command:

cd /any/working/dir
decryptamp --input myproteome.faa --high-discovery-mode
# Output appears in ./results/myproteome/ on the host.

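This alias keeps containers ephemeral (--rm) and produces no Docker-specific footprint in the output directory. On macOS (Docker Desktop), output files end up owned by your host user. On Linux, files written through a bind mount are owned by the UID the container process runs as, so if they come out owned by another user, add --user "$(id -u):$(id -g)" to the docker run command.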

Usage

Command-line interface

usage: decryptamp [-h] [--input FASTA] [--output NAME] [--results-dir DIR]
                  [--force] [--workers N]
                  [--enzyme {trypsin,chymotrypsin,caspase,pseudoenzyme}]
                  [--model {voting,rf,svm,gb,xgb,lgbm}] [--threshold FLOAT]
                  [--high-discovery-mode] [--no-prediction]
                  [--dedup-cdhit FLOAT] [--keep-redundant] [--list-thresholds]

Mine encrypted antimicrobial peptides (ecAMPs) from proteome data.

input / output:
  --input FASTA         proteome FASTA (default: bundled E. coli demo)
  --output NAME         output CSV name or explicit path
  --results-dir DIR     parent dir for run outputs (default: results)
  --force               overwrite the run directory if it exists
  --workers N           parallel worker processes (default: 8)

digestion:
  --enzyme              cleavage rule (default: trypsin)

prediction:
  --model               AMPidentifier model (default: voting)
  --threshold FLOAT     decision threshold (default: per-model MCC-optimized)
  --high-discovery-mode override threshold to 0.9 (high precision)
  --no-prediction       skip prediction; save all unique peptides with features only

deduplication:
  --dedup-cdhit FLOAT   optional CD-HIT clustering at this identity (e.g. 0.95)
  --keep-redundant      also save the pre-deduplication CSV

utilities:
  --list-thresholds     print per-model MCC thresholds and exit

Run `decryptamp --help` to see the live grouped help in your terminal (with
ANSI colours when stdout is a TTY).

Flag	Default	Description
`--input FASTA`	bundled E. coli demo	Input proteome FASTA. Aborts with a clear error if the file looks like nucleotide data (>90% A/C/G/T/U/N). Reports duplicate IDs and suffixes them with `__dup1`, `__dup2`, etc., without losing data.
`--output NAME`	`encrypted_peptides_results.csv`	Output CSV name. If it has no path separator, the file is placed inside the run directory (see `--results-dir`). If it contains a path (e.g. `/tmp/x.csv`), the path is respected literally.
`--results-dir DIR`	`results`	Parent directory for run outputs. A subdirectory named after the input filename (without FASTA extension) is created inside.
`--force`	off	Overwrite the run directory if it already exists. Without this flag, decryptAMP aborts with a clear error to prevent accidental data loss.
`--workers N`	`os.cpu_count()`	Parallel worker processes for digestion.
`--enzyme {trypsin,chymotrypsin,caspase,pseudoenzyme}`	`trypsin`	In silico cleavage rule. See Scientific notes for the regex of each enzyme.
`--model {voting,rf,svm,gb,xgb,lgbm}`	`voting`	AMPidentifier model. The voting ensemble (Acc=92.9%, AUC=0.977, MCC=0.859 on validation) is recommended.
`--threshold FLOAT`	per-model MCC-optimized	Decision threshold for `ecAMP_Probability`. If omitted, uses the AMPidentifier MCC-optimized threshold for the selected model (e.g. 0.56 for voting).
`--high-discovery-mode`	off	Override the threshold with the high-precision discovery setting (0.9). Reduces false positives at the cost of recall. Calibrated for voting; emits a warning when used with other models. Ignored if `--threshold` is given explicitly.
`--no-prediction`	off	Skip the AMPidentifier prediction step. Saves all unique encrypted peptides with their 22 features only.
`--dedup-cdhit FLOAT`	off	Apply CD-HIT clustering at the given identity threshold (e.g. `0.95`) after exact deduplication. Requires the `cd-hit` binary in `PATH`.
`--keep-redundant`	off	Also save the pre-deduplication CSV (one row per peptide occurrence) as `<output>_redundant.csv`.
`--list-thresholds`	off	Print the per-model MCC-optimized threshold table and exit without running the pipeline.
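
The two input checks described for `--input` can be illustrated with a minimal sketch. This is not the decryptAMP source: the function names are hypothetical, and computing the nucleotide fraction over the whole file (rather than per record) is an assumption.

```python
from collections import Counter

NUCLEOTIDE_CHARS = frozenset("ACGTUN")


def looks_like_nucleotide(sequences, cutoff=0.90):
    """True if more than `cutoff` of all residues are A/C/G/T/U/N.

    Assumption: the fraction is pooled over every sequence in the file.
    """
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return False
    hits = sum(1 for seq in sequences for ch in seq.upper() if ch in NUCLEOTIDE_CHARS)
    return hits / total > cutoff


def suffix_duplicate_ids(ids):
    """Keep the first occurrence of an ID; rename repeats to <id>__dup1, __dup2, ..."""
    seen = Counter()
    renamed = []
    for identifier in ids:
        count = seen[identifier]
        renamed.append(identifier if count == 0 else f"{identifier}__dup{count}")
        seen[identifier] += 1
    return renamed
```

Renaming instead of dropping keeps every record, which matches the "without losing data" behaviour stated in the table.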

Examples with bacteria.faa

The bundled demo proteome (bacteria.faa) is a 4298-protein RefSeq proteome of Escherichia coli str. K-12 substr. MG1655. Numbers below are reproducible with the default seeds and the AMPidentifier weights shipped in this repository.

1. Default run (trypsin + voting + MCC threshold)

decryptamp

Run directory: /abs/path/results/bacteria
Selected enzyme for digestion: Trypsin
Loading proteome from: /path/to/decryptamp/example-data/bacteria.faa
Successfully loaded 4298 protein sequences (1330117 aa total).
  Organism (consensus): Escherichia coli str. K-12 substr. MG1655
  Source database: RefSeq
Computing AMPidentifier features for 257845 peptides...
Deduplicating 257845 encrypted peptides...
  Exact dedup: 257845 -> 251756 (2.36% reduction).
Predicting AMP activity with AMPidentifier (VOTING)...
AMPidentifier model loaded: VOTING (threshold=0.56, 22 features).
Found 25784 ecAMPs (out of 251756 unique encrypted peptides) with ecAMP_Probability >= 0.56.

metric	value
Proteins input	4 298
Encrypted peptides generated	257 845
After exact deduplication	251 756
ecAMPs predicted (threshold 0.56)	25 784
Yield per protein	6.00
Yield per kb of proteome	19.39

2. High-precision discovery (threshold 0.9)

decryptamp --high-discovery-mode

Use this when downstream synthesis or screening is expensive and you want to triage the highest-confidence candidates only. The voting ensemble's cutoff moves from the MCC-optimized threshold (0.56) to a fixed 0.9.

metric	default (0.56)	--high-discovery-mode (0.9)
ecAMPs predicted	25 784	2 711
Yield per protein	6.00	0.63
Yield per kb of proteome	19.39	2.04

3. Use a single base classifier instead of the ensemble

decryptamp --model rf --threshold 0.7

Available models with their MCC-optimized thresholds:

Model	MCC-optimized threshold	Notes
`voting`	0.56	Soft-voting ensemble (recommended)
`rf`	0.56	Random Forest
`svm`	0.47	Support Vector Machine (RBF)
`gb`	0.55	Gradient Boosting
`xgb`	0.48	XGBoost
`lgbm`	0.71	LightGBM
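
The precedence between `--threshold`, `--high-discovery-mode`, and the per-model defaults can be summarized in a few lines. This is a sketch of the documented behaviour, not the actual implementation; `resolve_threshold` is a hypothetical name, and the source labels mirror those listed for the manifest.

```python
# Per-model MCC-optimized thresholds, copied from the table above.
MCC_THRESHOLDS = {
    "voting": 0.56, "rf": 0.56, "svm": 0.47,
    "gb": 0.55, "xgb": 0.48, "lgbm": 0.71,
}
HIGH_DISCOVERY_THRESHOLD = 0.9


def resolve_threshold(model, threshold=None, high_discovery=False):
    """Return (threshold, source) following the documented precedence."""
    if threshold is not None:
        # An explicit --threshold wins, even if --high-discovery-mode is set.
        return threshold, "user-override"
    if high_discovery:
        return HIGH_DISCOVERY_THRESHOLD, "high-discovery"
    return MCC_THRESHOLDS[model], "mcc-optimized"
```

For example, `resolve_threshold("rf", threshold=0.7)` corresponds to the command shown in this example, while `resolve_threshold("lgbm")` falls back to 0.71.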

4. Try a different enzyme

decryptamp --enzyme chymotrypsin
decryptamp --enzyme caspase            # cleaves after D (aspartic acid)
decryptamp --enzyme pseudoenzyme       # random control, fixed seed=42

The pseudoenzyme setting generates non-overlapping fragments of length sampled uniformly from [8, 50] using a fixed-seed RNG (seed=42) for reproducibility. It serves as a negative control to demonstrate that biological enzyme cleavage is non-random.
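
A minimal sketch of seeded random fragmentation, for illustration only. The function name is hypothetical, and two details are assumptions that the text above does not specify: the last fragment is truncated to the remaining sequence, and a tail shorter than 8 residues is dropped.

```python
import random


def pseudo_digest(sequence, seed=42, min_len=8, max_len=50):
    """Cut a sequence into consecutive, non-overlapping random fragments.

    Returns (start_1based, fragment) pairs. A fresh RNG per call makes the
    result depend only on the sequence and the seed.
    """
    rng = random.Random(seed)
    fragments = []
    pos = 0
    while len(sequence) - pos >= min_len:
        # randint is inclusive on both ends, so lengths lie in [min_len, max_len].
        length = min(rng.randint(min_len, max_len), len(sequence) - pos)
        fragments.append((pos + 1, sequence[pos:pos + length]))
        pos += length
    return fragments
```

Calling the function twice on the same sequence yields identical fragments, which is the property a reproducible negative control needs.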

5. Remove near-duplicate peptides with CD-HIT

decryptamp --dedup-cdhit 0.95

After exact deduplication, near-duplicates differing in 1-2 residues (e.g. missed-cleavage variants of the same core) are collapsed at the given identity threshold. Output gains Cluster_ID, Cluster_Size, and Cluster_Members columns. Typical reduction on bacterial proteomes is 60-80% at 0.95 identity.
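
CD-HIT writes its clusters to a `.clstr` file next to the output FASTA. The sketch below shows how cluster IDs, sizes, and members can be read from that standard layout; it is not decryptAMP's own parser, and the peptide names in the usage example are made up.

```python
import re

# A member line looks like: "0<TAB>18aa, >pep_000002... *"
# The trailing "*" marks the cluster representative.
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(text):
    """Return {cluster_id: [(member_name, is_representative), ...]}."""
    clusters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None:
            clusters[current].append((match.group(1), line.endswith("*")))
    return clusters


example = (
    ">Cluster 0\n"
    "0\t18aa, >pep_000002... *\n"
    "1\t17aa, >pep_000007... at 94.44%\n"
    ">Cluster 1\n"
    "0\t11aa, >pep_000001... *\n"
)
clusters = parse_clstr(example)
sizes = {cid: len(members) for cid, members in clusters.items()}
```

Here `sizes` plays the role of the Cluster_Size column, and the member names per cluster correspond to Cluster_Members.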

6. Audit redundancy before deduplication

decryptamp --dedup-cdhit 0.95 --keep-redundant

Adds <output>_redundant.csv with one row per peptide occurrence (before any dedup), useful for tracing each ecAMP back to all source proteins and start positions.
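
To make the relationship between the redundant and the deduplicated tables concrete, here is a small sketch of exact deduplication. It is illustrative only: `exact_dedup` is a hypothetical name, the protein IDs in the usage note are placeholders, and joining positions with semicolons is an assumption based on the "parallel list" description of the CSV columns.

```python
def exact_dedup(occurrences):
    """Collapse identical peptides into one row each.

    `occurrences` is an iterable of (peptide, protein_id, start_1based),
    i.e. one item per row of the redundant table.
    """
    unique = {}
    for peptide, protein, start in occurrences:
        row = unique.setdefault(peptide, {"proteins": [], "positions": []})
        row["proteins"].append(protein)
        row["positions"].append(str(start))
    return [
        {
            "Peptide": peptide,
            "Length": len(peptide),
            "Multiplicity": len(row["proteins"]),
            "Source_Proteins": ";".join(row["proteins"]),
            "Source_Positions": ";".join(row["positions"]),
        }
        for peptide, row in unique.items()
    ]
```

Three occurrences such as `("AAAAAAAK", "protA", 1)`, `("GGGGGGGGK", "protA", 9)`, and `("AAAAAAAK", "protB", 40)` collapse into two rows, the first with Multiplicity 2.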

7. Skip prediction (feature-only mode)

decryptamp --no-prediction

Computes the 22 features for every unique encrypted peptide and saves them without filtering. Useful for downstream analyses (PCA, UMAP, clustering, custom classifiers).

8. Override output destination

decryptamp --output /tmp/my_results.csv

When --output contains a path separator, the run directory is not managed automatically. Sibling artifacts (manifest, HTML report, dedup stats) are written next to the CSV.
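
The path rules for `--output` and `--results-dir` can be sketched as follows. This mirrors the documented behaviour rather than the real code: the helper names are hypothetical, and the set of suffixes treated as FASTA extensions is an assumption.

```python
from pathlib import Path

# Assumption: the README does not list which suffixes count as FASTA extensions.
FASTA_SUFFIXES = {".faa", ".fasta", ".fa"}


def run_directory(input_path, results_dir="results"):
    """results/<input_stem>/, where the stem drops a FASTA extension."""
    name = Path(input_path).name
    suffix = Path(name).suffix
    stem = name[: -len(suffix)] if suffix.lower() in FASTA_SUFFIXES else name
    return Path(results_dir) / stem


def resolve_output(output, input_path, results_dir="results"):
    """A bare name goes inside the run directory; a path is used literally."""
    if "/" in output or "\\" in output:
        return Path(output)
    return run_directory(input_path, results_dir) / output


def sibling_artifact(csv_path, suffix):
    """Name of an artifact written next to the CSV, e.g. suffix='_manifest.json'."""
    csv_path = Path(csv_path)
    return csv_path.with_name(csv_path.stem + suffix)
```

With these rules, `--output /tmp/my_results.csv` places the manifest at `/tmp/my_results_manifest.json`.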

9. Multiple proteomes side by side

decryptamp --input proteomes/Ecoli.faa
decryptamp --input proteomes/Athaliana.faa
decryptamp --input proteomes/Hsapiens.faa

Each produces its own subdirectory under results/ (Ecoli/, Athaliana/, Hsapiens/), so multiple proteomes coexist without overwriting each other.

10. Override the parent results directory

decryptamp --input data/myproteome.faa --results-dir /scratch/runs --force

Useful in HPC setups where outputs should land outside the working directory.

11. Full feature combination on bacteria.faa

A reference command exercising every flag at once. Useful as a smoke test of a fresh installation.

decryptamp \
    --output ecoli_k12_full.csv \
    --results-dir results \
    --force \
    --workers 8 \
    --enzyme trypsin \
    --model voting \
    --high-discovery-mode \
    --dedup-cdhit 0.95 \
    --keep-redundant

This will generate, inside results/bacteria/:

ecoli_k12_full.csv                   # high-confidence ecAMPs with 22 features
ecoli_k12_full.fasta                 # same candidates as FASTA, score in header
ecoli_k12_full_manifest.json         # full provenance
ecoli_k12_full_report.html           # one-page HTML summary
ecoli_k12_full_dedup_stats.txt       # exact + CD-HIT 0.95 breakdown
ecoli_k12_full_redundant.csv         # pre-deduplication CSV (one row per occurrence)

Expected (rounded) on the bundled E. coli K-12 MG1655 demo:

stage	count
Input proteins	4 298
Encrypted peptides generated	257 845
After exact deduplication	251 756
After CD-HIT @ 0.95	~50-80 thousand
ecAMPs (voting + threshold 0.9)	a few hundred to ~1 thousand

Output layout

By default every run creates results/<input_stem>/:

results/<input_stem>/
├── encrypted_peptides_results.csv             # main output, full feature table (always)
├── encrypted_peptides_results.fasta           # ecAMP sequences with score in header (always)
├── encrypted_peptides_results_manifest.json   # full provenance JSON (always)
├── encrypted_peptides_results_report.html     # self-contained HTML report (always)
├── encrypted_peptides_results_dedup_stats.txt # dedup breakdown (always)
├── encrypted_peptides_results_failed.csv      # only if any peptide was dropped
└── encrypted_peptides_results_redundant.csv   # only if --keep-redundant

results/<input_stem>.zip                       # compressed archive of the run directory (always)

The FASTA file is ready for downstream tools (alignment, BLAST, structure prediction) and for synthesis ordering. Header format:

>ecAMP_000001 ecAMP_score=0.9876 source=NP_414543.1:682 multiplicity=1 length=11
KLLILARETGR
>ecAMP_000002 ecAMP_score=0.9742 source=NP_414544.1:35 multiplicity=3 length=18
KWKLFKKIEKVGQNVRDG
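
For downstream scripts, the header fields can be pulled apart with a regular expression. The sketch below assumes the field order shown in the two example headers; `parse_header` is a hypothetical helper, not an API shipped with decryptAMP.

```python
import re

HEADER_RE = re.compile(
    r"^>(?P<id>\S+)\s+ecAMP_score=(?P<score>\S+)"
    r"\s+source=(?P<protein>\S+):(?P<start>\d+)"
    r"\s+multiplicity=(?P<multiplicity>\d+)\s+length=(?P<length>\d+)\s*$"
)


def parse_header(line):
    """Split one ecAMP FASTA header into typed fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    fields = match.groupdict()
    return {
        "id": fields["id"],
        "score": float(fields["score"]),
        "protein": fields["protein"],
        "start": int(fields["start"]),  # 1-based start in the source protein
        "multiplicity": int(fields["multiplicity"]),
        "length": int(fields["length"]),
    }


record = parse_header(
    ">ecAMP_000001 ecAMP_score=0.9876 source=NP_414543.1:682 multiplicity=1 length=11"
)
```

The `length` field can be cross-checked against the sequence line that follows the header.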

The main CSV contains, for each ecAMP candidate:

Column	Meaning
`Peptide`	amino-acid sequence (8-50 aa, canonical residues only)
`Length`	number of residues
`Multiplicity`	number of times this peptide was generated across the proteome
`Source_Proteins`	semicolon-separated list of source protein IDs
`Source_Positions`	parallel list of 1-based start positions
`Cluster_ID`	CD-HIT cluster ID (only if `--dedup-cdhit` was used)
`Cluster_Size`	number of peptides in the cluster (only with `--dedup-cdhit`)
`Cluster_Members`	semicolon-separated peptide sequences in the cluster
`Charge`, `pI`, `InstabilityInd`, ...	the 22 AMPidentifier features
`ecAMP_Probability`	model probability of being an ecAMP (range 0-1)
`ecAMP_Prediction`	binary call (1 if probability ≥ threshold, else 0)
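
Because `Source_Proteins` and `Source_Positions` are parallel lists, each occurrence can be recovered by zipping them. The sketch below uses only the standard library; the sample row and its protein IDs are invented for illustration, and a semicolon separator for positions is an assumption.

```python
import csv
import io


def source_occurrences(row):
    """Pair each source protein with its 1-based start position."""
    proteins = row["Source_Proteins"].split(";")
    positions = [int(p) for p in row["Source_Positions"].split(";")]
    if len(proteins) != len(positions):
        raise ValueError("Source_Proteins and Source_Positions differ in length")
    return list(zip(proteins, positions))


# Hypothetical one-row excerpt of the main CSV (feature columns omitted).
SAMPLE = (
    "Peptide,Length,Multiplicity,Source_Proteins,Source_Positions\n"
    "KWKLFKKIEKVGQNVRDG,18,3,protA;protB;protC,35;120;7\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
pairs = source_occurrences(rows[0])
```

The number of pairs should equal the row's `Multiplicity` value.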

The 22 features

Group	Count	Names
Global descriptors (modlAMP)	6	`Charge`, `pI`, `InstabilityInd`, `AliphaticInd`, `BomanInd`, `HydrophRatio`
Hydrophobic moment (modlAMP, Eisenberg, angle 100°)	1	`HydrophobicMoment`
Grouped amino-acid composition	9	`f_acidic`, `f_basic`, `f_polar`, `f_nonpolar`, `f_aliphatic`, `f_aromatic`, `f_charged`, `f_small`, `f_tiny`
Free Energy of Transition local (D1)	3	`FET_low_D1`, `FET_mid_D1`, `FET_high_D1`
Solvent accessibility local (D1)	3	`SA_buried_D1`, `SA_exposed_D1`, `SA_inter_D1`

Charges are computed at pH 7.0 with amide=True (matching the AMPidentifier training convention).

Manifest JSON

Every run writes a complete _manifest.json covering tool version, git commit, full command line, input file SHA-256, proteome organism and source database (extracted from FASTA headers), digestion parameters, feature parameters, deduplication statistics, model SHA-256, decision threshold and its source (mcc-optimized / high-discovery / user-override / deprecated-min-prob), and SHA-256 of every output artifact.

A typical pipeline_summary block:

{
  "n_proteins_input": 4298,
  "n_encrypted_peptides_generated": 257845,
  "n_encrypted_peptides_dropped_nonfinite": 0,
  "n_encrypted_peptides_after_exact_dedup": 251756,
  "n_encrypted_peptides_after_cdhit": null,
  "n_ecamps_predicted": 25784,
  "ecamps_yield_per_protein": 5.999069,
  "ecamps_yield_per_kb_proteome": 19.386758
}

The manifest is sufficient to bit-identically reproduce the run from the same input.
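
As a quick sanity check, some derived numbers can be recomputed from the pipeline_summary counts. The sketch below parses the block shown above and recomputes the exact-deduplication reduction and the per-protein yield; the helper names are hypothetical. The per-kb yield is left out because the proteome length it is normalized by is not part of this block.

```python
import json

summary = json.loads("""
{
  "n_proteins_input": 4298,
  "n_encrypted_peptides_generated": 257845,
  "n_encrypted_peptides_dropped_nonfinite": 0,
  "n_encrypted_peptides_after_exact_dedup": 251756,
  "n_encrypted_peptides_after_cdhit": null,
  "n_ecamps_predicted": 25784,
  "ecamps_yield_per_protein": 5.999069,
  "ecamps_yield_per_kb_proteome": 19.386758
}
""")


def dedup_reduction_pct(s):
    """Percentage of peptides removed by exact deduplication."""
    before = s["n_encrypted_peptides_generated"]
    after = s["n_encrypted_peptides_after_exact_dedup"]
    return 100.0 * (before - after) / before


def yield_per_protein(s):
    """Predicted ecAMPs divided by the number of input proteins."""
    return s["n_ecamps_predicted"] / s["n_proteins_input"]


print(f"exact dedup reduction: {dedup_reduction_pct(summary):.2f}%")
print(f"yield per protein:     {yield_per_protein(summary):.6f}")
```

Both values agree with the log output and the stored `ecamps_yield_per_protein` field of the default run.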

HTML report

A self-contained HTML page (no JavaScript, no external resources, plain CSS) is written next to every CSV. It renders the manifest as a one-page summary with KPI cards, the proteome → encrypted-peptides → unique → ecAMPs flow, organism and source-database metadata extracted from the FASTA, and tables for every parameter used. Suitable for sharing with collaborators or attaching to a manuscript as supplementary material.

Open with any browser:

open results/bacteria/encrypted_peptides_results_report.html

The CSV and JSON outputs are structured as direct inputs for ecAMPdb, an open database of encrypted antimicrobial peptides covering organisms from all six kingdoms and viruses.

Scientific notes

Canonical residues only. Peptides containing any residue outside the 20 canonical amino acids (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y) are discarded silently during digestion. This avoids the silent feature bias that arises when ambiguous codes (X, B, Z, J, U, O) are substituted with arbitrary canonical residues.

Enzymatic cleavage rules.

Enzyme	Regex	Description
`trypsin`	`(?<=[RK])(?!P)`	Cleaves after R or K, not before P
`chymotrypsin`	`(?<=[FWY])(?!P)`	Cleaves after F, W, Y, not before P
`caspase`	`(?<=D)`	Cleaves after any D (aspartic acid)
`pseudoenzyme`	random, seed=42	Negative control: uniform random fragmentation

Length filter. Generated peptides are kept only if 8 ≤ length ≤ 50 (configurable in src/decryptamp/config.py).

Missed cleavages. Up to 2 missed cleavages allowed by default (configurable in src/decryptamp/config.py).
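
The rules above (cleavage regex, canonical-residue filter, length filter, missed cleavages) combine into a short digestion routine. The following is a minimal sketch under those stated rules, not the code in decryptAMP's digestion module; `digest` is a hypothetical name.

```python
import re

# Zero-width cut-site patterns copied from the enzyme table.
ENZYME_RULES = {
    "trypsin": r"(?<=[RK])(?!P)",
    "chymotrypsin": r"(?<=[FWY])(?!P)",
    "caspase": r"(?<=D)",
}
CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


def digest(sequence, enzyme="trypsin", missed_cleavages=2, min_len=8, max_len=50):
    """Return (start_1based, peptide) pairs for one protein sequence."""
    cuts = {m.start() for m in re.finditer(ENZYME_RULES[enzyme], sequence)}
    bounds = sorted(cuts | {0, len(sequence)})
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptide = sequence[bounds[i]:bounds[j]]
            if min_len <= len(peptide) <= max_len and CANONICAL.issuperset(peptide):
                peptides.append((bounds[i] + 1, peptide))
    return peptides
```

For `AAAAAAAKGGGGGGGGRPCCCCCCCCK` with trypsin, the K at position 8 is a cut site but the R before P is not, so the peptides are the two fragments plus the full-length sequence as a one-missed-cleavage product.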

Charge calculation. pH=7.0, amide=True. The amidation flag matches the AMPidentifier training convention; many natural AMPs (defensins, magainins, cecropins) are C-terminally amidated in vivo, which neutralizes the C-terminal carboxylate and raises the net charge by +1.

Hydrophobic moment. Computed with the Eisenberg scale and a 100° angle (canonical α-helix amphipathicity).
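
For intuition, the hydrophobic moment treats each residue's hydrophobicity as a vector rotated by 100° per position and measures the length of the vector sum, normalized by peptide length. The sketch below computes it over the whole peptide using the Eisenberg consensus scale; it is a simplified illustration, and modlAMP's implementation (used by decryptAMP) may differ in windowing and normalization.

```python
import math

# Eisenberg consensus hydrophobicity scale.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(peptide, angle_deg=100.0):
    """Length-normalized hydrophobic moment over the full peptide."""
    if not peptide:
        raise ValueError("empty peptide")
    angle = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * angle) for i, aa in enumerate(peptide))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * angle) for i, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)
```

A homopolymer of 18 residues gives a moment of essentially zero, because 18 steps of 100° complete exactly five turns and the identical vectors cancel. Sequences that place hydrophobic and charged residues on opposite helix faces score higher.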

Failure handling. Peptides whose feature vector contains any NaN or Inf value are dropped before classification and logged to <output>_failed.csv. The classifier itself raises ValueError if NaN/Inf reaches it (defense in depth). Zero-vectors are never silently fed to the model.
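
The drop-and-log step can be pictured as a simple partition of the feature table. The sketch below is illustrative only (`split_finite` is a hypothetical name) and operates on plain dictionaries rather than on whatever table type decryptAMP uses internally.

```python
import math


def split_finite(rows, feature_names):
    """Partition rows into (kept, failed) by finiteness of every feature."""
    kept, failed = [], []
    for row in rows:
        finite = all(math.isfinite(float(row[name])) for name in feature_names)
        (kept if finite else failed).append(row)
    return kept, failed


rows = [
    {"Peptide": "AAAAAAAK", "Charge": 2.0, "pI": 10.1},
    {"Peptide": "GGGGGGGGK", "Charge": float("nan"), "pI": 9.7},
    {"Peptide": "CCCCCCCCK", "Charge": 1.0, "pI": float("inf")},
]
kept, failed = split_finite(rows, ["Charge", "pI"])
```

Rows in `failed` are the ones that would be written to the `_failed.csv` file instead of reaching the classifier.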

Reproducibility. All randomness is seeded (pseudoenzyme: 42). Per-model MCC-optimized thresholds are loaded from src/ampidentifier/models/threshold_<model>.txt. The manifest records SHA-256 of input, model file, and output CSV.
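
To verify a recorded hash independently, a SHA-256 digest can be recomputed with the standard library. This is a generic sketch; `sha256_of` is a hypothetical helper and the manifest key holding the expected digest is not assumed here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large inputs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of("myproteome.faa")` with the input hash stored in the manifest confirms that a rerun starts from the same file.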

AMPidentifier models

The classifier is the bundled AMPidentifier (vendored under src/ampidentifier/). The voting ensemble is a soft average of five base learners, each tuned via 5-fold StratifiedKFold and RandomizedSearchCV (n_iter=50, scoring='roc_auc').

Model	Accuracy	AUC-ROC	MCC	Notes
Voting (default)	92.9%	0.977	0.859	Soft-voting ensemble of the five below
Random Forest	91.9%	0.972	0.839
Support Vector Machine (RBF)	91.9%	0.969	0.839	Uses `StandardScaler`
Gradient Boosting	92.0%	0.974	0.839
XGBoost	92.2%	0.974	0.843
LightGBM	92.7%	0.975	0.855

Metrics computed on a 20% holdout of the AMPidentifier training set (13 246 peptides total, balanced 6 623 AMP / 6 623 non-AMP).
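
Soft voting averages the positive-class probabilities of the base learners and applies a single cutoff to the mean. The sketch below shows that arithmetic in plain Python; equal weights are an assumption, and the five probabilities in the usage example are invented rather than taken from the bundled models.

```python
def soft_vote(probabilities):
    """Unweighted mean of the base learners' positive-class probabilities."""
    if not probabilities:
        raise ValueError("need at least one base-learner probability")
    return sum(probabilities) / len(probabilities)


def classify(probabilities, threshold=0.56):
    """Return (ensemble_probability, binary_call) for one peptide."""
    probability = soft_vote(probabilities)
    return probability, int(probability >= threshold)


# Hypothetical outputs of rf, svm, gb, xgb, lgbm for a single peptide.
probability, call = classify([0.9, 0.8, 0.4, 0.6, 0.7])
```

The mean here is 0.68, so the peptide passes the default 0.56 cutoff but would be rejected at the 0.9 high-discovery cutoff.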

Troubleshooting

Error: 'X.faa' looks like a nucleotide sequence
The input FASTA contains too many A/C/G/T/U/N residues to be a protein. Translate it first (e.g. Prodigal, six-frame translation) or pass a protein FASTA.

Error: run directory '...' already exists and is not empty
Pass --force to overwrite, --results-dir to write elsewhere, or --output PATH (with separators) to fully override.

cd-hit binary not found in PATH
Install CD-HIT (brew install cd-hit, apt-get install cd-hit, conda install -c bioconda cd-hit) or omit --dedup-cdhit.

AmpPredictor received N rows with NaN/Inf in feature columns
A feature calculation produced non-finite values for some peptides. The orchestrator should have dropped them upstream; this error indicates a bug. Check <output>_failed.csv for context and please open an issue.

Warning: --high-discovery-mode applies a fixed threshold of 0.9 calibrated for the voting ensemble
You combined --high-discovery-mode with a non-voting model. The 0.9 cutoff is calibrated for voting; per-model probability distributions differ. For per-model calibrated cutoffs use --threshold explicitly.

Sklearn version warning when loading models
The bundled .pkl files were trained with scikit-learn 1.8.0. Older versions may still load them but can produce slightly different numeric results in edge cases. Run pip install --upgrade scikit-learn to silence the warning.

Testing

A pytest suite covers the scientific contract of the digestion module, the AMP classifier input validation, and the manifest schema. The default invocation runs only the fast unit tests; opt-in flags expand coverage.

pip install ".[dev]"   # only needs pytest

# Default: fast unit tests, no model loading (~3 s, 49 tests)
pytest

# Add the slow tests that load the AMPidentifier weights (~30 s)
pytest --run-slow

# Full suite, including end-to-end runs against bacteria.faa (~10 min)
pytest --run-all

Test layout:

File	Coverage	Marker
`tests/test_peptide_processor.py`	enzyme regexes (trypsin/chymotrypsin/caspase), canonical-AA filter, pseudoenzyme determinism, missed cleavages, 1-based positions	none (fast)
`tests/test_amp_predictor.py`	NaN/Inf input validation, missing feature columns, MCC threshold values per model	mostly fast; model-loading tests marked `@slow`
`tests/test_manifest.py`	JSON schema completeness, SHA-256 validity, `--no-prediction` handling	none (fast)

Citation

If decryptAMP supports your research, please cite:

Luna-Aragão, M. A., da Silva, R. L., Santos, D. E., Pacífico, J., & Benko-Iseppon, A. M. decryptAMP: A bioinformatics tool for the identification and prediction of encrypted Antimicrobial Peptides (ecAMPs) from proteome data.

Repository: https://github.com/madsondeluna/decryptAMP

This version

2.1.0

May 15, 2026
