Automatic detection and subtyping of CRISPR-Cas operons
CasPredict
Detect CRISPR-Cas genes, group them into operons, and predict their subtype
Installation
Conda
Installing with miniconda or anaconda is recommended:
conda create -n caspredict -c conda-forge -c bioconda -c russel88 caspredict
pip
Alternatively, if the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, grep, sed) are already in your PATH, you can install with pip:
python -m pip install caspredict
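Before a pip install, it can help to confirm that the external tools are actually reachable. The following is a minimal sketch, not part of CasPredict: it only checks that executables are on the PATH and that the interpreter is recent enough, and it does not verify the HMMER or Prodigal versions. The executable names `hmmsearch` and `prodigal` are assumptions about how the tools are installed on your system.

```python
import shutil
import sys

# Executable names are assumptions: HMMER ships several binaries
# (hmmsearch is one of them), and Prodigal is usually installed as "prodigal".
REQUIRED_TOOLS = ["hmmsearch", "prodigal", "grep", "sed"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version[:2]) >= minimum
```

Calling `missing_tools()` returns an empty list when everything is found, and `python_ok()` reports whether the running interpreter meets the Python >= 3.8 requirement.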
Download database
Conda
Coming soon...
pip
Coming soon...
How to run
Activate environment
conda activate caspredict
Run with a nucleotide fasta as input
caspredict genome.fa my_output
Use multiple threads
caspredict genome.fa my_output -t 20
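When several genomes need to be processed, the same command line can be generated per input file. The sketch below is a hypothetical wrapper, not something shipped with CasPredict: it uses only the positional arguments and the `-t` option shown above, and the convention of naming each output directory after the FASTA file stem is an assumption made for illustration.

```python
import subprocess
from pathlib import Path


def build_command(fasta, out_root, threads=4):
    """Build the caspredict command line for one nucleotide FASTA file."""
    fasta = Path(fasta)
    # Assumed convention: one output directory per genome, named after the file stem.
    out_dir = Path(out_root) / fasta.stem
    return ["caspredict", str(fasta), str(out_dir), "-t", str(threads)]


def run_all(fastas, out_root, threads=4, dry_run=True):
    """Build one command per input; execute them only when dry_run is False."""
    commands = [build_command(fasta, out_root, threads) for fasta in fastas]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if caspredict exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

With the default `dry_run=True` the function only returns the command lists, which makes it easy to inspect them before anything is executed.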
Check the different options
caspredict -h
usage: caspredict [-h] [-t THREADS] [--prodigal {single,meta}] [--aa] [--skip_check] [--keep_prodigal] [--log_lvl {DEBUG,INFO,WARNING,ERROR}] [--redo_typing] [--scores SCORES] [--hmms HMMS]
[--dist DIST] [--overall_eval OVERALL_EVAL] [--overall_cov_seq OVERALL_COV_SEQ] [--overall_cov_hmm OVERALL_COV_HMM] [--two_gene_eval TWO_GENE_EVAL]
[--two_gene_cov_seq TWO_GENE_COV_SEQ] [--two_gene_cov_hmm TWO_GENE_COV_HMM] [--single_gene_eval SINGLE_GENE_EVAL] [--single_gene_cov_seq SINGLE_GENE_COV_SEQ]
[--single_cov_hmm SINGLE_COV_HMM] [--vf_eval VF_EVAL] [--vf_cov_hmm VF_COV_HMM]
input output
positional arguments:
input Input fasta file
output Prefix for output directory
optional arguments:
-h, --help show this help message and exit
-t THREADS, --threads THREADS
Number of parallel processes [4].
--prodigal {single,meta}
Which mode to run prodigal in [single].
--aa Input is a protein fasta. Has to be in prodigal format.
--skip_check Skip check of input.
--keep_prodigal Keep prodigal output.
--log_lvl {DEBUG,INFO,WARNING,ERROR}
Logging level [INFO].
--redo_typing Redo the typing. Skip prodigal and HMMER and load the hmmer.tab from the output dir.
data arguments:
--scores SCORES Path to CasScoring table.
--hmms HMMS Path to directory with HMM profiles.
threshold arguments:
--dist DIST Max allowed distance between genes in operon [3].
--overall_eval OVERALL_EVAL
Overall E-value threshold [0.001].
--overall_cov_seq OVERALL_COV_SEQ
Overall sequence coverage threshold [0.5].
--overall_cov_hmm OVERALL_COV_HMM
Overall HMM coverage threshold [0.5].
--two_gene_eval TWO_GENE_EVAL
Two-gene operon E-value threshold [1e-05].
--two_gene_cov_seq TWO_GENE_COV_SEQ
Two-gene operon sequence coverage threshold [0.8].
--two_gene_cov_hmm TWO_GENE_COV_HMM
Two-gene operon HMM coverage threshold [0.8].
--single_gene_eval SINGLE_GENE_EVAL
Lonely gene E-value threshold [1e-10].
--single_gene_cov_seq SINGLE_GENE_COV_SEQ
Lonely gene sequence coverage threshold [0.9].
--single_cov_hmm SINGLE_COV_HMM
Lonely gene HMM coverage threshold [0.9].
--vf_eval VF_EVAL V-F Cas12 specific E-value threshold [1e-75].
--vf_cov_hmm VF_COV_HMM
V-F Cas12 specific HMM coverage threshold [0.97].
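To illustrate how the threshold arguments might interact, the sketch below groups hits into operons with `--dist` and then picks a threshold tier by operon size. This is one plausible reading of the help text, not CasPredict's actual implementation. Assumptions: distance is measured as the difference in gene index between consecutive hits on the same contig, stricter tiers apply to two-gene and single-gene operons, and comparisons are inclusive. The `Hit` class and all function names are hypothetical; the V-F Cas12 specific thresholds are left out.

```python
from dataclasses import dataclass

# Defaults copied from the help text: (max E-value, min sequence coverage, min HMM coverage).
THRESHOLDS = {
    "overall": (1e-3, 0.5, 0.5),
    "two_gene": (1e-5, 0.8, 0.8),
    "single_gene": (1e-10, 0.9, 0.9),
}


@dataclass
class Hit:
    """Hypothetical record for one HMM hit against a predicted gene."""
    contig: str
    position: int  # ordinal index of the gene on its contig (assumption)
    evalue: float
    cov_seq: float
    cov_hmm: float


def group_into_operons(hits, dist=3):
    """Group hits on the same contig whose gene indices differ by at most `dist`."""
    operons = []
    for hit in sorted(hits, key=lambda h: (h.contig, h.position)):
        last = operons[-1][-1] if operons else None
        same_operon = (
            last is not None
            and last.contig == hit.contig
            and hit.position - last.position <= dist
        )
        if same_operon:
            operons[-1].append(hit)
        else:
            operons.append([hit])
    return operons


def tier_for(operon):
    """Pick the threshold tier from the number of genes in the operon."""
    if len(operon) == 1:
        return "single_gene"
    if len(operon) == 2:
        return "two_gene"
    return "overall"


def passes(hit, tier):
    """Check one hit against the E-value and coverage thresholds of a tier."""
    max_eval, min_cov_seq, min_cov_hmm = THRESHOLDS[tier]
    return (
        hit.evalue <= max_eval
        and hit.cov_seq >= min_cov_seq
        and hit.cov_hmm >= min_cov_hmm
    )
```

Under these assumptions a hit that is acceptable inside a larger operon can still be rejected as a lonely gene, which matches the increasingly strict defaults listed above.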
Download files
Source Distribution
caspredict-0.2.1.tar.gz (10.1 kB)
Built Distribution
caspredict-0.2.1-py3.8.egg (19.0 kB)