A parsing tool for AMP tools.
AMPcombi: AntiMicrobial Peptides parsing and functional classification tool
This tool parses the results of antimicrobial peptide (AMP) prediction tools into a single table and aligns the hits against a reference AMP database for functional classification.
For parsing: AMPcombi is developed to parse the output of these AMP prediction tools:
Tool | Version | Link |
---|---|---|
Ampir | 1.1.0 | https://github.com/Legana/ampir |
AMPlify | 1.0.3 | https://github.com/bcgsc/AMPlify |
Macrel | 1.1.0 | https://github.com/BigDataBiology/macrel |
HMMsearch | 3.3.2 | https://github.com/EddyRivasLab/hmmer |
EnsembleAMPpred | - | https://pubmed.ncbi.nlm.nih.gov/33494403/ |
NeuBI | - | https://github.com/nafizh/NeuBI |
For classification: AMPcombi offers functional annotation of the detected AMPs by alignment to an AMP reference database, e.g.:
Database | Version | Link |
---|---|---|
DRAMP | 3.0 | https://github.com/CPU-DRAMP/DRAMP-3.0 |
Alignment to the reference database is done using diamond blastp v.2.0.15
======================
Installation
======================
To install AMPcombi:
Add the dependencies of the tool: python > 3.0, biopython, pandas and diamond.
Installation can be done using:
- pip installation
pip install AMPcombi
- git repository
git clone https://github.com/Darcy220606/AMPcombi.git
- conda
conda env create -f ampcombi/environment.yml
or
conda install -c bioconda AMPcombi
======================
Usage:
======================
There are two basic commands to run AMPcombi:
- Using --amp_results
ampcombi \
--amp_results path/to/my/result_folder/ \
--faa_folder path/to/sample_faa_files/
Here the top-level folder containing the output files has to be given. AMPcombi finds and summarizes the output files from the different tools if the folder is structured and named as: /result_folder/toolsubdir/samplesubdir/sample.tool.filetype.
- Note that the filetype ending might vary and can be specified with --tooldict if it differs from the default. When passing a dictionary via the command line, it has to be given as a string enclosed in single quotes ' ', with the dictionary keys and items in double quotes " ", e.g. '{"key1":"item1", "key2":"item2"}'
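The quoting rule for --tooldict follows from how such a string is read: the shell strips the outer single quotes, and the remainder must still be a valid dictionary literal. The sketch below illustrates this with JSON parsing. It is an assumption about how the string could be handled, not AMPcombi's confirmed implementation, and the helper name parse_tooldict is hypothetical.

```python
import json


def parse_tooldict(raw: str) -> dict:
    """Parse a --tooldict style string into {tool: file ending}.

    Hypothetical helper for illustration only; AMPcombi itself may
    parse the argument differently.
    """
    parsed = json.loads(raw)  # JSON only accepts double-quoted keys and items
    if not isinstance(parsed, dict):
        raise ValueError("expected a dictionary of tool names to file endings")
    return {str(tool): str(ending) for tool, ending in parsed.items()}


# The string as the program receives it, after the shell removed the outer single quotes
tooldict = parse_tooldict('{"ampir":"ampir.tsv", "macrel":"macrel.tsv"}')
print(tooldict["macrel"])
```

With single quotes inside the dictionary instead of double quotes, json.loads raises an error, which is one reason to keep the quoting exactly as described.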
The path to the folder containing the respective protein fasta files has to be provided with --faa_folder. The files have to be named <samplename>.faa.
Structure of the results folder:
amp_results/
├── tool_1/
| ├── sample_1/
| | └── sample_1.tool_1.tsv
| └── sample_2/
| | └── sample_2.tool_1.tsv
├── tool_2/
| ├── sample_1/
| | └── sample_1.tool_2.txt
| └── sample_2/
| | └── sample_2.tool_2.txt
└── tool_3/
  ├── sample_1/
  | └── sample_1.tool_3.predict
  └── sample_2/
    └── sample_2.tool_3.predict
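To make the expected layout concrete, here is a minimal sketch of how output files could be located in such a folder. It assumes that each tool subfolder is named after a key in the tool dictionary and that each file is named <sample>.<file ending>. The function find_tool_outputs is hypothetical and is not taken from the AMPcombi source.

```python
import tempfile
from pathlib import Path


def find_tool_outputs(result_folder, tooldict):
    """Map each sample name to the tool output files found for it.

    Assumed layout: result_folder/<tool>/<sample>/<sample>.<file ending>
    """
    root = Path(result_folder)
    found = {}
    for tool, ending in tooldict.items():
        tool_dir = root / tool
        if not tool_dir.is_dir():
            continue  # no results for this tool
        for sample_dir in sorted(p for p in tool_dir.iterdir() if p.is_dir()):
            expected = sample_dir / f"{sample_dir.name}.{ending}"
            if expected.is_file():
                found.setdefault(sample_dir.name, []).append(expected)
    return found


# Build a throwaway folder with the layout shown above and scan it
demo_tools = {"tool_1": "tool_1.tsv", "tool_2": "tool_2.txt"}
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "amp_results"
    for tool, ending in demo_tools.items():
        for sample in ("sample_1", "sample_2"):
            sample_dir = demo_root / tool / sample
            sample_dir.mkdir(parents=True)
            (sample_dir / f"{sample}.{ending}").touch()
    outputs = find_tool_outputs(demo_root, demo_tools)
    for sample, files in sorted(outputs.items()):
        print(sample, [f.name for f in files])
```

A file that does not follow the naming pattern is silently skipped in this sketch, which mirrors the requirement that the folder be structured and named as described.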
- Using --path_list and --sample_list
ampcombi \
--path_list path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv \
--path_list path_to_sample_2_tool_1.csv path_to_sample_2_tool_2.csv \
--sample_list sample_1 sample_2 \
--faa_folder path/to/sample_faa_files/
Here the paths to the output files to be summarized can be given with --path_list for each sample. Together with this option, a list of sample names has to be supplied with --sample_list.
The path to the folder containing the respective protein fasta files has to be provided with --faa_folder. The files have to be named <samplename>.faa.
Input options:
command | definition | default | example |
---|---|---|---|
--amp_results | path to the folder containing the different tools' output files | ./test_files/ | ../amp_results/ |
--sample_list | list of samples' names | - | sample_1 sample_2 |
--path_list | list of paths to output files | - | path_to_sample_1_tool_1.csv path_to_sample_1_tool_2.csv |
--cutoff | probability cutoff to filter AMPs | 0 | 0.5 |
--faa_folder | path to the folder containing the samples' .faa files; file names have to contain the corresponding sample name, e.g. sample_1.faa | ./test_faa/ | ./faa_files/ |
--tooldict | dictionary of AMP-tools and their respective output file endings | '{"ampir":"ampir.tsv", "amplify":"amplify.tsv", "macrel":"macrel.tsv", "hmmer_hmmsearch":"hmmsearch.txt", "ensembleamppred":"ensembleamppred.txt"}' | - |
--amp_database | path to the folder containing the reference database files: (1) a fasta file with the <.fasta> file extension and (2) the corresponding table with functional and taxonomic classifications with the <.tsv> file extension | DRAMP 'general amps' database | ./amp_ref_database/ |
--complete_summary | Concatenates all samples' summarized tables into one | False | True |
--log | print messages to a log file instead of stdout | False | True |
--version | print the version number to stdout | - | 0.1.4 |
- Note: The fasta file corresponding to the AMP database should not contain any characters other than ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
- Note: The reference database table should be tab delimited.
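Before supplying a custom reference database, the fasta file can be checked against the allowed residue set from the note above. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the check is case-sensitive, so lowercase residues are reported as invalid.

```python
# The 20 residue characters allowed in the reference fasta file
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(fasta_text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_records(fasta_text):
    """Return {header: sorted disallowed characters} for offending records."""
    problems = {}
    for header, sequence in read_fasta(fasta_text):
        bad = set(sequence) - VALID_RESIDUES
        if bad:
            problems[header] = sorted(bad)
    return problems


example = ">amp_1\nGIGKFLHSAKKF\n>amp_2\nKWKLFXKIB\n"
print(invalid_records(example))  # {'amp_2': ['B', 'X']}
```

To check a real file, read it with open(path).read() and pass the text to invalid_records. An empty result means every record passes.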
Output:
The output will be written into your working directory, containing the following files and folders:
<pwd>/
├── amp_ref_database/
| ├── amp_ref.dmnd
| ├── general_amps_<DATE>_clean.fasta
| └── general_amps_<DATE>.tsv
├── sample_1/
| ├── sample_1_amp.faa
| ├── sample_1_ampcombi.csv
| └── sample_1_diamond_matches.txt
├── sample_2/
| ├── sample_2_amp.faa
| ├── sample_2_ampcombi.csv
| └── sample_2_diamond_matches.txt
├── AMPcombi_summary.csv
└── ampcombi.log
======================
Contribution:
======================
AMPcombi is a tool developed for parsing results from published AMP prediction tools. We therefore welcome fellow contributors who would like to add the results of new AMP prediction tools for parsing and alignment.
Adding a new tool to AMPcombi
In ampcombi/reformat_tables.py:
- add a new tool function to read the output to a pandas dataframe and return two columns named contig_id and prob_<toolname>
- add the new function to the read_path function
In ampcombi/main.py:
- add your default tool:tool.fileending to the default of --tooldict
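As a rough illustration of the first step, a reader function for a new tool might look like the sketch below. The tool name mytool, its tab-separated format, and the input column names seq_name and score are all made up for this example. Only the returned column names contig_id and prob_<toolname> come from the instructions above.

```python
import io

import pandas as pd


def mytool(path_or_buffer):
    """Read the output of a hypothetical tool 'mytool' into a dataframe.

    Returns the two columns AMPcombi expects: contig_id and prob_mytool.
    """
    table = pd.read_csv(path_or_buffer, sep="\t")
    # 'seq_name' and 'score' are invented column names for this hypothetical tool
    table = table.rename(columns={"seq_name": "contig_id", "score": "prob_mytool"})
    return table[["contig_id", "prob_mytool"]]


# A small in-memory stand-in for a real output file
demo = io.StringIO("seq_name\tscore\tlength\ncontig_1\t0.91\t34\ncontig_2\t0.12\t52\n")
print(mytool(demo))
```

The matching --tooldict entry for this hypothetical tool would then follow the tool:tool.fileending pattern, for example "mytool":"mytool.tsv".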
======================
Authors: @louperelo and @darcy220606