Automatic detection and subtyping of CRISPR-Cas operons
CasPredict
Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
Installation
Conda
Installing with miniconda or anaconda is recommended:
conda create -n caspredict -c conda-forge -c bioconda -c russel88 caspredict
pip
Alternatively, if the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, grep, sed) are already in your PATH, you can install with pip:
python -m pip install caspredict
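Before a pip install, it can help to confirm that the external tools are reachable. The following is a minimal sketch using only the Python standard library. The executable names `hmmsearch` and `prodigal` are the binaries normally shipped by HMMER and Prodigal, and the helper name `missing_tools` is made up for illustration. The sketch does not check the minimum tool versions listed above.

```python
import shutil
import sys

# Executables expected on PATH (hmmsearch ships with HMMER).
REQUIRED_TOOLS = ["hmmsearch", "prodigal", "grep", "sed"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python >= 3.8 is required")
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found on PATH")
```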
Download database
Conda
Coming soon...
pip
Coming soon...
How to run
Activate environment
conda activate caspredict
Run with a nucleotide fasta as input
caspredict genome.fa my_output
Use multiple threads
caspredict genome.fa my_output -t 20
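To process several genomes, the same command can be driven from a small script. The sketch below assumes a directory of `*.fa` files and gives each genome its own output directory named after the file. The helper names `build_command` and `run_batch` are hypothetical, and only the `input output -t THREADS` form shown above is used. Whether caspredict accepts an output directory that already exists is not stated here, so the sketch creates only the parent directory.

```python
import subprocess
from pathlib import Path


def build_command(fasta, out_dir, threads=4):
    """Build the argument list for one caspredict run."""
    return ["caspredict", str(fasta), str(out_dir), "-t", str(threads)]


def run_batch(genome_dir, results_dir, threads=4):
    """Run caspredict once per *.fa file found in genome_dir."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(Path(genome_dir).glob("*.fa")):
        out_dir = results_dir / fasta.stem
        # check=True raises CalledProcessError if caspredict exits non-zero.
        subprocess.run(build_command(fasta, out_dir, threads), check=True)
```

For example, `run_batch("genomes", "results", threads=20)` would run one analysis per FASTA file in `genomes`, stopping at the first failure.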
Output
- CRISPR_Cas.tab: CRISPR_Cas loci
- cas_operons.tab: All certain Cas operons
- crisprs_all.tab: All CRISPR arrays
- crisprs_orphan.tab: Orphan CRISPRs (those not in CRISPR_Cas.tab)
- cas_operons_orphan.tab: Orphan Cas operons (those not in CRISPR_Cas.tab)
- cas_operons_putative.tab: Putative Cas operons, mostly false positives, but also some ambiguous and partial systems
- spacers.fa: Fasta file with all spacer sequences
- hmmer.tab: All HMM vs. ORF matches, raw unfiltered results
- arguments.tab: File with arguments given to CasPredict
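The output files can be loaded for downstream analysis with the Python standard library. The sketch below assumes that the `.tab` files are tab-delimited with a header row, which is not confirmed here, so no column names are hard-coded. The helper names `read_tab` and `read_fasta` are made up for illustration.

```python
import csv
from pathlib import Path


def read_tab(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def read_fasta(path):
    """Read a FASTA file into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                # Sequences may be wrapped over several lines.
                records[header] += line
    return records
```

For example, `read_tab(Path("my_output") / "CRISPR_Cas.tab")` would return one dict per row, and `read_fasta(Path("my_output") / "spacers.fa")` one entry per spacer.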
Check the different options
caspredict -h
usage: caspredict [-h] [-t THREADS] [--prodigal {single,meta}] [--aa] [--skip_check] [--keep_tmp] [--log_lvl {DEBUG,INFO,WARNING,ERROR}] [--redo_typing] [--db DB] [--dist DIST]
                  [--overall_eval OVERALL_EVAL] [--overall_cov_seq OVERALL_COV_SEQ] [--overall_cov_hmm OVERALL_COV_HMM] [--two_gene_eval TWO_GENE_EVAL] [--two_gene_cov_seq TWO_GENE_COV_SEQ]
                  [--two_gene_cov_hmm TWO_GENE_COV_HMM] [--single_gene_eval SINGLE_GENE_EVAL] [--single_gene_cov_seq SINGLE_GENE_COV_SEQ] [--single_cov_hmm SINGLE_COV_HMM] [--vf_eval VF_EVAL]
                  [--vf_cov_hmm VF_COV_HMM] [--ccd CCD] [--kmer KMER]
                  input output

positional arguments:
  input                 Input fasta file
  output                Prefix for output directory

optional arguments:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
                        Number of parallel processes [4].
  --prodigal {single,meta}
                        Which mode to run prodigal in [single].
  --aa                  Input is a protein fasta. Has to be in prodigal format.
  --skip_check          Skip check of input.
  --keep_tmp            Keep temporary files (prodigal, hmmer, minced).
  --log_lvl {DEBUG,INFO,WARNING,ERROR}
                        Logging level [INFO].
  --redo_typing         Redo the typing. Skip prodigal and HMMER and load the hmmer.tab from the output dir.

data arguments:
  --db DB               Path to database.

cas threshold arguments:
  --dist DIST           Max allowed distance between genes in operon [3].
  --overall_eval OVERALL_EVAL
                        Overall E-value threshold [0.001].
  --overall_cov_seq OVERALL_COV_SEQ
                        Overall sequence coverage threshold [0.5].
  --overall_cov_hmm OVERALL_COV_HMM
                        Overall HMM coverage threshold [0.5].
  --two_gene_eval TWO_GENE_EVAL
                        Two-gene operon E-value threshold [1e-05].
  --two_gene_cov_seq TWO_GENE_COV_SEQ
                        Two-gene operon sequence coverage threshold [0.8].
  --two_gene_cov_hmm TWO_GENE_COV_HMM
                        Two-gene operon HMM coverage threshold [0.8].
  --single_gene_eval SINGLE_GENE_EVAL
                        Lonely gene E-value threshold [1e-10].
  --single_gene_cov_seq SINGLE_GENE_COV_SEQ
                        Lonely gene sequence coverage threshold [0.9].
  --single_cov_hmm SINGLE_COV_HMM
                        Lonely gene HMM coverage threshold [0.9].
  --vf_eval VF_EVAL     V-F Cas12 specific E-value threshold [1e-75].
  --vf_cov_hmm VF_COV_HMM
                        V-F Cas12 specific HMM coverage threshold [0.97].

crispr threshold arguments:
  --ccd CCD             Distance (bp) threshold to connect Cas operons and CRISPR arrays [10000.0].
  --kmer KMER           kmer size. Has to match training kmer size! [4].
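
The three tiers of Cas thresholds suggest that smaller operons must pass stricter cut-offs. The sketch below is one possible reading of the help text, not CasPredict's actual implementation. The `Hit` class, the `tier_for` and `passes` functions, the rule that the tier is chosen by the number of genes in the operon, and the inclusive comparisons are all assumptions. Only the default values are taken from the help text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    evalue: float
    cov_seq: float  # fraction of the ORF covered by the alignment
    cov_hmm: float  # fraction of the HMM covered by the alignment


# (E-value, sequence coverage, HMM coverage) defaults from the help text.
THRESHOLDS = {
    "overall": (1e-3, 0.5, 0.5),
    "two_gene": (1e-5, 0.8, 0.8),
    "single_gene": (1e-10, 0.9, 0.9),
}


def tier_for(n_genes):
    """Assumed rule: pick the threshold tier from the operon's gene count."""
    if n_genes == 1:
        return "single_gene"
    if n_genes == 2:
        return "two_gene"
    return "overall"


def passes(hit, n_genes):
    """Return True if the hit meets every cut-off of its assumed tier."""
    max_eval, min_cov_seq, min_cov_hmm = THRESHOLDS[tier_for(n_genes)]
    return (
        hit.evalue <= max_eval
        and hit.cov_seq >= min_cov_seq
        and hit.cov_hmm >= min_cov_hmm
    )
```

Under these assumptions, a hit with E-value 1e-6 and 85% coverage on both axes would be accepted in an operon of two or more genes but rejected as a lonely gene, which is the kind of behaviour the stricter single-gene defaults appear designed to produce.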
Download files
Source Distribution: caspredict-0.3.0.tar.gz (13.7 kB)
Built Distribution: caspredict-0.3.0-py3.8.egg (29.4 kB)