Magnetosome Gene Clusters Analyzer
MagCluster
MagCluster is a tool for identification, annotation and visualization of magnetosome gene clusters (MGCs) from genomes of magnetotactic bacteria (MTB).
Installation
MagCluster requires a working Conda installation.
Conda
MagCluster can be installed through conda. We recommend creating a new environment for MagCluster.
# Create magcluster environment
conda create -n magcluster
# Activate magcluster environment
conda activate magcluster
# Install MagCluster through bioconda channel
conda install -c conda-forge -c bioconda -c defaults magcluster
# Check for the usage of MagCluster
magcluster -h
Pip
Alternatively, you can install MagCluster through pip in an existing environment. In that case, make sure Prokka is already installed.
# Install MagCluster through pip
pip install magcluster
# Check for the usage of MagCluster
magcluster -h
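Since a pip install does not pull in Prokka, it can help to confirm that both executables are reachable before running anything. The following is a minimal sketch that only checks the PATH; the helper name missing_tools is hypothetical and is not part of MagCluster.

```python
import shutil


def missing_tools(tools=("prokka", "magcluster")):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("prokka and magcluster are both available.")
```

This only confirms that the commands can be located; it does not verify their versions.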
Usage
MagCluster comprises three modules for batch processing of MGCs: (i) MTB genome annotation with Prokka, (ii) MGC screening with MGC_Screen, and (iii) MGC mapping with Clinker.
usage: magcluster [options]
Options:
{prokka,mgc_screen,clinker}
prokka Genome annotation with Prokka
mgc_screen Magnetosome gene cluster screening with MGC_Screen
clinker Magnetosome gene cluster mapping with Clinker
Genome annotation
MagCluster accepts multiple genome files or genome-containing folder(s) in a single command for batch annotation. The general usage is the same as for Prokka, but some parameters are given default values suited to batch annotation of MTB genomes.
To avoid confusion, the name of each genome is used by default as the output folder's name (--outdir GENOME_NAME), the output files' prefix (--prefix GENOME_NAME), and the GenBank file's locus_tag (--locustag GENOME_NAME). The '--compliant' parameter is also used by default to produce standard-compliant GenBank files.
For MGC annotation, MagCluster uses by default a reference file containing magnetosome protein sequences from representative MTB strains. We recommend setting '--evalue' to 1e-05.
example usage:
# MGCs annotation with multiple MTB genomes as input
$ magcluster prokka --evalue 1e-05 --proteins Magnetosome_protein_data.fasta MTB_genome1.fasta MTB_genome2.fasta MTB_genome3.fasta
# MGCs annotation with a folder containing MTB genomes as input
$ magcluster prokka --evalue 1e-05 --proteins Magnetosome_protein_data.fasta /MTB_genomes_folder
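To make the default naming described above concrete, the sketch below builds the plain Prokka command that would correspond to one genome file. It is an illustration only, not MagCluster's own code: the function name build_prokka_command is hypothetical, and it assumes the genome name is the file name without its extension.

```python
from pathlib import Path


def build_prokka_command(genome, proteins="Magnetosome_protein_data.fasta", evalue="1e-05"):
    """Build a Prokka argument list that mirrors the defaults described above."""
    # Assumption: the genome name is the file name with its extension removed.
    name = Path(genome).stem
    return [
        "prokka",
        "--outdir", name,      # output folder named after the genome
        "--prefix", name,      # output file prefix
        "--locustag", name,    # locus_tag used in the GenBank file
        "--compliant",         # enforce standard-compliant GenBank output
        "--evalue", evalue,
        "--proteins", proteins,
        str(genome),
    ]


if __name__ == "__main__":
    for genome in ["MTB_genome1.fasta", "MTB_genome2.fasta"]:
        print(" ".join(build_prokka_command(genome)))
```

Printing the command instead of executing it keeps the sketch runnable on a machine without Prokka installed.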
MGCs screening
The MGC_Screen module retrieves MGC-containing contigs/scaffolds from GenBank files. As magnetosome genes are always physically clustered in MTB genomes, MGC_Screen identifies MGCs based on the number of magnetosome genes found close together. Three parameters are involved in MGC screening: '--contiglength', '--windowsize' and '--threshold' (see below). Users can adjust them as needed. For each genome, MGC_Screen produces two output files: a GenBank file of the MGC-containing contigs and a csv file summarizing all magnetosome protein sequences.
usage: magcluster mgc_screen [-h] [-l CONTIGLENGTH] [-win WINDOWSIZE] [-th THRESHOLD] [-o OUTDIR] gbkfile [gbkfile ...]
positional arguments:
gbkfile .gbk/.gbf files to be analyzed. Multiple files or a folder containing the files are acceptable.
optional arguments:
-h, --help show this help message and exit
-l CONTIGLENGTH, --contiglength CONTIGLENGTH
The minimum size of a contig for screening (default '2,000 bp')
-w WINDOWSIZE, --windowsize WINDOWSIZE
The window size in the text mining of magnetosome proteins (default '10,000 bp')
-th THRESHOLD, --threshold THRESHOLD
The minimum number of magnetosome genes required within one window (default '3')
-o OUTDIR, --outdir OUTDIR
Output folder (default 'mgc_screen')
example usage:
# MGCs screening with multiple GenBank files as input
$ magcluster mgc_screen --threshold 3 --contiglength 2000 --windowsize 10000 file1.gbk file2.gbk file3.gbk
# MGCs screening with a folder containing GenBank files as input
$ magcluster mgc_screen --threshold 3 --contiglength 2000 --windowsize 10000 /gbkfiles_folder
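The way the three screening parameters interact can be sketched as a sliding window over gene positions. This is not MGC_Screen's actual implementation: the function name contig_has_mgc is hypothetical, and the sketch assumes that the window is measured between the start coordinates of genes already annotated as magnetosome genes.

```python
def contig_has_mgc(contig_length, gene_starts,
                   min_contig_length=2000, window_size=10000, threshold=3):
    """Return True if a contig passes a simple window-based MGC screen.

    contig_length: contig size in bp.
    gene_starts: start coordinates (bp) of magnetosome genes on the contig.
    """
    # Contigs shorter than the minimum length are skipped entirely.
    if contig_length < min_contig_length:
        return False

    starts = sorted(gene_starts)
    left = 0
    for right, pos in enumerate(starts):
        # Shrink the window until it spans at most window_size bp.
        while pos - starts[left] > window_size:
            left += 1
        # Count the magnetosome genes inside the current window.
        if right - left + 1 >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(contig_has_mgc(50000, [1000, 4000, 9000]))
    print(contig_has_mgc(50000, [1000, 20000, 40000]))
```

In this sketch, raising the threshold or shrinking the window makes the screen stricter, while a larger minimum contig length drops short contigs before any genes are counted.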
MGCs alignment and mapping
We use Clinker for MGC alignment and visualization. Note that the '-p' parameter is used by default to generate an interactive HTML web page where you can modify the MGC figure and export it as a publication-quality file.
example usage:
# MGCs alignment and mapping with multiple GenBank files as input
$ magcluster clinker -p MGC_align.html /MGCs_files_folder/*.gbk
Citation
The manuscript is in preparation.
Contact us
If you have any questions or suggestions, feel free to contact us.