
Magnetosome gene cluster annotation, screening and mapping tool


MagCluster

MagCluster is a tool for annotating, screening and mapping magnetosome gene clusters (MGCs) from genomes of magnetotactic bacteria (MTB).

Installation

MagCluster requires a working Conda installation. We recommend creating a new Conda environment for the MagCluster release being installed:

wget https://github.com/RunJiaJi/magcluster/releases/download/0.1.6/magcluster-0.1.6.yml

conda env create -n magcluster --file magcluster-0.1.6.yml

# OPTIONAL CLEANUP
rm magcluster-0.1.6.yml

Alternatively, you can install MagCluster with pip into an existing environment. In that case, please make sure Prokka is installed first:

# Prokka installation
conda install -c conda-forge -c bioconda -c defaults prokka
# MagCluster installation
pip install magcluster

Usage

MagCluster comprises three modules for batch processing of MGCs: (i) MTB genome annotation with Prokka, (ii) magnetosome gene cluster screening with Mgc_Screen, and (iii) MGC mapping with Clinker.

usage: magcluster [options]

Options:
  {prokka,mgc_screen,clinker}
    prokka              Genome annotation with Prokka
    mgc_screen          Magnetosome gene cluster screening with magscreen
    clinker             Magnetosome gene cluster mapping with Clinker

MTB genome annotation

MagCluster accepts multiple genome files or genome-containing folder(s) in a single command for batch annotation. Usage is the same as for Prokka, but some parameters have default values tailored to batch annotation of MTB genomes.

To avoid confusion, the name of each genome is used by default as the output folder name (--outdir GENOME_NAME), the output file prefix (--prefix GENOME_NAME), and the GenBank locus_tag (--locustag GENOME_NAME). The '--compliant' parameter is also enabled by default so that the GenBank output follows the standard format.
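The per-genome defaults described above can be sketched as a small Python function that builds one Prokka command line. This is a minimal sketch, not MagCluster's code: the helper name `prokka_command` is hypothetical, and it assumes GENOME_NAME is the genome file name without its extension. Only the Prokka parameters named in this section are used.

```python
from __future__ import annotations

from pathlib import Path


def prokka_command(genome_file: str, extra_args: list[str] | None = None) -> list[str]:
    """Build a Prokka call with the per-genome defaults described above.

    Hypothetical helper: assumes GENOME_NAME is the file name minus its extension.
    """
    name = Path(genome_file).stem  # e.g. "MTB_genome1" for "MTB_genome1.fasta"
    return [
        "prokka",
        *(extra_args or []),   # user options such as --evalue / --proteins
        "--outdir", name,      # output folder named after the genome
        "--prefix", name,      # output file prefix
        "--locustag", name,    # GenBank locus_tag
        "--compliant",         # standard-compliant GenBank output
        genome_file,           # the genome FASTA itself comes last
    ]


cmd = prokka_command(
    "MTB_genome1.fasta",
    ["--evalue", "1e-05", "--proteins", "Magnetosome_protein_data.fasta"],
)
print(" ".join(cmd))
```

Calling the helper once per input genome mirrors the batch behaviour described above, with every genome writing into its own folder.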

For MGC annotation, we provide a reference MGC file containing magnetosome protein sequences from novel MTB strains; we highly recommend using it with the '--proteins' parameter. We also recommend setting '--evalue' to 1e-05.

example usage: 

# MGCs annotation with multiple MTB genomes as input
$ magcluster prokka --evalue 1e-05 --proteins Magnetosome_protein_data.fasta MTB_genome1.fasta MTB_genome2.fasta MTB_genome3.fasta

# MGCs annotation with MTB genomes containing folder as input
$ magcluster prokka --evalue 1e-05 --proteins Magnetosome_protein_data.fasta /MTB_genomes_folder
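Accepting files and folders together amounts to expanding every folder into the genome files it contains. A minimal sketch, assuming (hypothetically) that genomes are FASTA files ending in `.fasta`, `.fa` or `.fna` and that folders are not searched recursively; `collect_genomes` is an illustrative name, not part of MagCluster.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical extension filter; MagCluster's own rule may differ.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def collect_genomes(inputs: list[str]) -> list[Path]:
    """Expand genome files and genome-containing folders into one sorted list."""
    genomes: list[Path] = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Pick up FASTA files directly inside the folder (non-recursive).
            genomes.extend(
                p for p in path.iterdir()
                if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
            )
        else:
            # Explicitly named files are kept as given.
            genomes.append(path)
    return sorted(genomes)


# Mirrors the two example commands: explicit files, or one folder.
# genomes = collect_genomes(["MTB_genome1.fasta", "MTB_genome2.fasta"])
# genomes = collect_genomes(["/MTB_genomes_folder"])
```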

MGCs screening

The Mgc_Screen module retrieves MGC-containing contigs from GenBank files. Because magnetosome genes are always physically clustered on the genome, Mgc_Screen identifies MGCs based on the number of magnetosome genes gathered on a contig. Three parameters control MGC screening: '--minlength', '--maxlength' and '--threshold' (see below); users can adjust them as needed. Mgc_Screen produces two output files: a GenBank file of the MGC-containing contigs and a CSV file summarizing all magnetosome protein sequences.
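The screening rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Mgc_Screen's implementation: contigs are modelled by a hypothetical `Contig` class instead of parsed GenBank records, a gene counts as magnetosome-related if its product name starts with a hypothetical prefix such as "Mam" or "Mms", and only '--threshold' and '--minlength' are modelled, with their documented defaults. '--maxlength' is left out because its exact effect is not spelled out here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Contig:
    """Hypothetical, simplified stand-in for one GenBank contig/scaffold."""
    name: str
    length: int                                        # contig length in bp
    products: list[str] = field(default_factory=list)  # CDS product names


# Hypothetical rule for recognising magnetosome genes by product name.
MAGNETOSOME_PREFIXES = ("Mam", "Mms")


def is_magnetosome_gene(product: str) -> bool:
    return product.startswith(MAGNETOSOME_PREFIXES)


def screen_contigs(contigs: list[Contig], threshold: int = 2,
                   minlength: int = 2000) -> list[str]:
    """Return names of contigs that pass the length and gene-count filters."""
    hits = []
    for contig in contigs:
        if contig.length < minlength:      # too short to be considered
            continue
        n_genes = sum(is_magnetosome_gene(p) for p in contig.products)
        if n_genes >= threshold:           # enough clustered magnetosome genes
            hits.append(contig.name)
    return hits


# Toy contigs for illustration only.
contigs = [
    Contig("contig_1", 15000, ["MamA", "MamB", "hypothetical protein"]),
    Contig("contig_2", 1500, ["MamA", "MamB", "MamM"]),
    Contig("contig_3", 8000, ["MamA", "DNA polymerase III"]),
]
print(screen_contigs(contigs, threshold=2, minlength=2000))  # ['contig_1']
```

Here contig_2 is dropped for being shorter than the minimum length despite carrying three genes, and contig_3 is dropped because a single magnetosome gene falls below the threshold.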

usage: magcluster mgc_screen [-h] [-th THRESHOLD] [-o OUTDIR] [-min MINLENGTH] [-max MAXLENGTH] gbkfile [gbkfile ...]

positional arguments:
  gbkfile               .gbk files to be analyzed. Multiple files or a folder containing files are accepted.

optional arguments:
  -h, --help            show this help message and exit
  -th THRESHOLD, --threshold THRESHOLD
                        The minimum number of magnetosome genes in one contig/scaffold to screen (default '2')
  -o OUTDIR, --outdir OUTDIR
                        Output folder (default 'mgc_screen')
  -min MINLENGTH, --minlength MINLENGTH
                        Minimum length of contigs to be considered (default '2000bp')
  -max MAXLENGTH, --maxlength MAXLENGTH
                        Maximum length of contigs containing magnetosome gene (default '10000bp')

example usage: 

# MGCs screening with multiple GenBank files as input
$ magcluster mgc_screen -th 3 -min 2000 -max 10000 file1.gbk file2.gbk file3.gbk

# MGCs screening with GenBank files containing folder as input
$ magcluster mgc_screen -th 3 -min 2000 -max 10000 /gbkfiles_folder
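The CSV summary produced by Mgc_Screen can be pictured as one row per magnetosome protein. A minimal sketch using Python's standard `csv` module; the helper name, the column names and the toy row are hypothetical and do not describe the actual layout of Mgc_Screen's CSV file.

```python
import csv
import io


def write_protein_summary(rows, handle) -> None:
    """Write (contig, product, protein_sequence) rows as CSV with a header."""
    writer = csv.writer(handle)
    # Hypothetical columns; the real mgc_screen CSV may use different ones.
    writer.writerow(["contig", "product", "protein_sequence"])
    writer.writerows(rows)


buffer = io.StringIO()
write_protein_summary([("contig_1", "MamA", "MKLV")], buffer)  # toy data
print(buffer.getvalue())
```

To write a real file, pass `open("summary.csv", "w", newline="")` as the handle; `newline=""` keeps the `csv` module from producing blank lines on Windows.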

MGCs alignment and mapping

We use Clinker for MGC alignment and visualization. We recommend using the '-p' parameter to generate an interactive HTML page in which you can adjust the MGC figure and export it as a publication-quality file.

example usage: 

# MGCs alignment and mapping with multiple GenBank files as input
$ magcluster clinker -o MGCs_alignment_result -p /MGCs_files_folder/*.gbk
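In the example above, the shell expands `/MGCs_files_folder/*.gbk` into individual file names before MagCluster sees them. When scripting this step, the same expansion can be done with Python's `glob` module. A minimal sketch; `clinker_command` is a hypothetical helper, and listing the files before the options is a precaution in case '-p', like Clinker's own '-p/--plot', accepts an optional value.

```python
import glob
import os


def clinker_command(folder: str, outdir: str = "MGCs_alignment_result") -> list:
    """Build a `magcluster clinker` call for every .gbk file in `folder`."""
    # Same expansion the shell performs for /MGCs_files_folder/*.gbk
    gbk_files = sorted(glob.glob(os.path.join(folder, "*.gbk")))
    if not gbk_files:
        raise FileNotFoundError(f"no .gbk files found in {folder}")
    # Files first, then options, so '-p' is not followed by a file name.
    return ["magcluster", "clinker", *gbk_files, "-o", outdir, "-p"]
```

The resulting list can be passed directly to `subprocess.run(clinker_command("/MGCs_files_folder"), check=True)`; raising early on an empty folder avoids launching Clinker with no input.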

Citation

The manuscript is in preparation.



Download files


Source Distribution

magcluster-0.1.6.tar.gz (23.3 kB)

Built Distribution

magcluster-0.1.6-py3-none-any.whl (23.0 kB)

File details

Details for the file magcluster-0.1.6.tar.gz.

File metadata

  • Download URL: magcluster-0.1.6.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for magcluster-0.1.6.tar.gz
Algorithm Hash digest
SHA256 4408c07c5948e1fef29d87bbecc5cc578dc3d460be8defcf1fc2c6e61e001089
MD5 077af5adf07d8770d7b3a4e6f35f3f99
BLAKE2b-256 8a6922442ce30ed1e5420c72894eab46258c94e15b44e2d3cc853c370c627d16


File details

Details for the file magcluster-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: magcluster-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 23.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for magcluster-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 374568d8c2c16155fcb783e9cc07895e9b68dd99e2b9de51e9c07a45f5a60c80
MD5 813f48f40cc6b7f04b56b4812b6ed1ba
BLAKE2b-256 076995de0de1377eedb58d9f68ff1297effeb5d9c6bd08968b5c454470cb8e44
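The published SHA256 digests can be checked after downloading either release file. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the expected digests are copied from the hash tables above.

```python
import hashlib

# SHA256 digests listed above for the 0.1.6 release files.
EXPECTED = {
    "magcluster-0.1.6.tar.gz":
        "4408c07c5948e1fef29d87bbecc5cc578dc3d460be8defcf1fc2c6e61e001089",
    "magcluster-0.1.6-py3-none-any.whl":
        "374568d8c2c16155fcb783e9cc07895e9b68dd99e2b9de51e9c07a45f5a60c80",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()


# e.g. verify("magcluster-0.1.6.tar.gz", EXPECTED["magcluster-0.1.6.tar.gz"])
```

pip can enforce the same check itself when a requirements file pins the release with `--hash=sha256:...` and pip is run with `--require-hashes`.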
