A bioinformatics tool for analyzing pan-genomes and other features based on synteny blocks

Badlon

Installation

Badlon can be installed with pip:

pip install badlon

You can now run the tool from any directory as badlon.
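
One quick way to confirm the installation from Python is to check whether the badlon executable is visible on PATH. This is a minimal sketch using only the standard library; the helper name is_on_path is hypothetical and is not part of badlon.

```python
import shutil


def is_on_path(executable: str) -> bool:
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    # "badlon" is the command name installed by `pip install badlon`.
    print("badlon found:", is_on_path("badlon"))
```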

Pipeline Usage

Modules

Badlon includes several modules for processing data. They can be listed with the --help option:

$ badlon --help
usage: badlon [-h] {prepare,analysis,match} ...

Tool for block based analysis of bacterial populations. Choose one of available modules.

positional arguments:
  {prepare,analysis,match}
    prepare             Prepare draft dataset for SibeliaZ.
    analysis            Analyze pan-genome and other block-based features based on synteny blocks.
    match               Performs matching of block and genes based on coordinates.

optional arguments:
  -h, --help            show this help message and exit

Here is the recommended pipeline for processing data with badlon:

Step 1: Preparing data with the PanACoTA pipeline

If your genomes are in a folder called some_folder (one file per genome), we suggest preparing the data for badlon with the PanACoTA pipeline.

To do so, you can use the following commands:

1.1 Preparing data and tables with PanACoTA prepare module:

PanACoTA prepare --norefseq --min 0 --max 1 -o 1-prepare -d some_folder --cutn 125
  • --min 0 --max 1 are used to keep all genomes; these and all other parameters can be changed depending on the task;
  • To check the other parameters, see the PanACoTA prepare documentation.

1.2 Annotating genomes with PanACoTA annotate module:

PanACoTA annotate --info 1-prepare/L* -r 2-annotate -n ESCO --threads 16
  • You can change the label -n ESCO depending on your species (ESCO stands for Escherichia coli);
  • To check the parameters, see the PanACoTA annotate documentation.

1.3 Calling orthologous genes using PanACoTA pangenome module:

PanACoTA pangenome -l 2-annotate/LSTINFO-* -n ESCO -d 2-annotate/Proteins/ -o 3-pangenome
  • You can set -i, the minimum sequence identity for sequences to be placed in the same cluster (float between 0 and 1). The default is 0.8.
  • To check the parameters, see the PanACoTA pangenome documentation.
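
The three PanACoTA calls of Step 1 can also be assembled from a script. The sketch below only builds the command strings shown above and prints them by default; the function names panacota_steps and run_steps are hypothetical, and the output folder names are taken from the examples in this step.

```python
import shlex
import subprocess


def panacota_steps(genome_dir, label="ESCO", threads=16):
    """Build the three Step 1 command lines as shell strings."""
    d = shlex.quote(genome_dir)  # protect paths containing spaces
    n = shlex.quote(label)
    return [
        f"PanACoTA prepare --norefseq --min 0 --max 1 -o 1-prepare -d {d} --cutn 125",
        f"PanACoTA annotate --info 1-prepare/L* -r 2-annotate -n {n} --threads {int(threads)}",
        f"PanACoTA pangenome -l 2-annotate/LSTINFO-* -n {n} -d 2-annotate/Proteins/ -o 3-pangenome",
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it through the shell unless dry_run is set."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # shell=True lets the shell expand the glob patterns at run time
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run_steps(panacota_steps("some_folder"))
```

Calling run_steps with dry_run=False would execute the commands in order and stop at the first failure.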

Step 2: Preparing data for alignment with badlon prepare module

The prepare module prepares data for the SibeliaZ package while keeping all necessary information: genome labels and chromosome numbers.

Parameters can be checked with the --help option:

$ badlon prepare --help
usage: badlon prepare [-h] --folder FOLDER [--contigs CONTIGS]
                      [--output OUTPUT]
                      [--annotate_subfolder ANNOTATE_SUBFOLDER]
                      [--min_len MIN_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --contigs CONTIGS, -c CONTIGS
                        Number of maximum contigs to take from every genome.
                        By default, keeps all.
  --output OUTPUT, -o OUTPUT
                        Output file path.
  --annotate_subfolder ANNOTATE_SUBFOLDER, -a ANNOTATE_SUBFOLDER
                        Subfolder of PanACoTA contains results of annotate
                        module. Used for finding LSTINFO file. Default is
                        '2-annotate'.
  --min_len MIN_LEN, -l MIN_LEN
                        Minimum contig length, less then that value will be
                        filtered. Default is 1000.

Required arguments:
  --folder FOLDER, -f FOLDER
                        Folder with PanACoTA output. Will be used to search
                        genome files based on LSTINFO file from annotate
                        module.

Example command:

badlon prepare -f 2-annotate -o for_sibeliaz.fna
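
To illustrate what a minimum contig length filter such as --min_len means, here is a minimal sketch in plain Python. It reflects an assumption about the intent of the option (contigs shorter than the threshold are dropped) and is not badlon's actual implementation; read_fasta and filter_by_length are hypothetical helper names.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=1000):
    """Keep records whose sequence is at least `min_len` bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    toy = ">contig1\nACGTACGT\n>contig2\nAC\n"
    print(filter_by_length(read_fasta(toy), min_len=4))
```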

Step 3: Obtaining blocks with SibeliaZ

3.1 Running SibeliaZ with the recommended command from the badlon prepare output

Example:

sibeliaz -k 15 -a 100 -n -t 32 -o sibeliaz_out for_sibeliaz.fna
  • Pay attention to -a: it should be approximately number_of_genomes * 20. badlon prepare calculates it automatically.
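
The rule of thumb for -a can be written down as a small calculation. The sketch below assumes the factor of 20 stated above and rebuilds the example command; recommended_a and sibeliaz_command are hypothetical helper names, and the remaining option values are copied from the example.

```python
import shlex


def recommended_a(n_genomes, factor=20):
    """Approximate value for -a: number of genomes times 20."""
    if n_genomes < 1:
        raise ValueError("n_genomes must be at least 1")
    return n_genomes * factor


def sibeliaz_command(fasta, n_genomes, k=15, threads=32, out_dir="sibeliaz_out"):
    """Build the SibeliaZ command line from the example as a shell string."""
    args = [
        "sibeliaz", "-k", str(k), "-a", str(recommended_a(n_genomes)),
        "-n", "-t", str(threads), "-o", out_dir, fasta,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # 5 genomes give -a 100, as in the example command
    print(sibeliaz_command("for_sibeliaz.fna", n_genomes=5))
```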

3.2 Obtaining blocks from alignment

Check the recommended command in the badlon prepare module output. Usually it is the following (minimal block size 3000):

cd sibeliaz_out
echo $'30 150\n100 500\n500 1500' > fine.txt
maf2synteny -s fine.txt -b 3000 blocks_coords.gff
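
The fine.txt file created by the echo command above can equally be written from Python. This is a minimal sketch that reproduces the same three lines; the names FINE_STAGES, stages_text and write_stages are hypothetical, and the meaning of the two columns is defined by maf2synteny and is not restated here.

```python
from pathlib import Path

# The three value pairs from the echo command, one pair per line
FINE_STAGES = [(30, 150), (100, 500), (500, 1500)]


def stages_text(stages=FINE_STAGES):
    """Render the pairs as space-separated lines with a trailing newline."""
    return "\n".join(f"{first} {second}" for first, second in stages) + "\n"


def write_stages(path, stages=FINE_STAGES):
    """Write the rendered pairs to `path` and return the written text."""
    text = stages_text(stages)
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    print(stages_text(), end="")
```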

Step 4: Calculating block-based statistics and charts with badlon analysis module

Parameters can be checked with the --help option:

$ badlon analysis --help
usage: badlon analysis [-h] --blocks_file BLOCKS_FILE --type {chr,contig}
                       [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Path to output folder. Default: blockomics_output.

Required arguments:
  --blocks_file BLOCKS_FILE, -b BLOCKS_FILE
                        Blocks resulted as output of original Sibelia or
                        maf2synteny tool. Usually it's
                        sibeliaz_out/3000/block_coords.txt file.
  --type {chr,contig}, -t {chr,contig}
                        Type of genome assembly, either 'chr' or 'contig'

Example command:

cd ..
badlon analysis -b sibeliaz_out/3000/blocks_coords.txt -t contig
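
As an illustration of one common block-based pan-genome summary, the sketch below splits blocks into those present in every genome and those present in only some. It works on a toy in-memory mapping and is not badlon's implementation; the function names and the input layout (block id mapped to a list of genome labels) are assumptions made for this example.

```python
def block_frequencies(block_to_genomes):
    """Map each block id to the number of distinct genomes containing it."""
    return {block: len(set(genomes)) for block, genomes in block_to_genomes.items()}


def split_core_accessory(block_to_genomes, n_genomes):
    """Return (core, accessory) block ids, each sorted.

    Core blocks occur in all `n_genomes` genomes; accessory blocks in fewer.
    """
    freq = block_frequencies(block_to_genomes)
    core = sorted(b for b, n in freq.items() if n == n_genomes)
    accessory = sorted(b for b, n in freq.items() if n < n_genomes)
    return core, accessory


if __name__ == "__main__":
    # Toy data: block 2 occurs twice in g1, which still counts as one genome
    toy = {1: ["g1", "g2", "g3"], 2: ["g1", "g1", "g2"], 3: ["g3"]}
    print(split_core_accessory(toy, n_genomes=3))
```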

Step 5 (optional): Matching blocks and gene annotations with badlon match module

Parameters can be checked with the --help option:

$ badlon match --help
usage: badlon match [-h] --blocks_file BLOCKS_FILE --annotated_folder
                    ANNOTATED_FOLDER --pangenome_file PANGENOME_FILE --type
                    {chr,contig} [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Path to output folder. Default: blockomics_output.

Required arguments:
  --blocks_file BLOCKS_FILE, -b BLOCKS_FILE
                        Blocks folder resulted as output of original Sibelia
                        or maf2synteny tool. Usually it's `sibeliaz_out/3000/`
                        folder.
  --annotated_folder ANNOTATED_FOLDER, -a ANNOTATED_FOLDER
                        LSTINFO folder path, output of `annotate` step of
                        PanACoTA.
  --pangenome_file PANGENOME_FILE, -pg PANGENOME_FILE
                        File .lst with orthologous genes, output of
                        `pangenome` step of PanACoTA.
  --type {chr,contig}, -t {chr,contig}
                        Type of genome assembly, either 'chr' or 'contig'

Example command:

badlon match -b sibeliaz_out/3000/blocks_coords.txt -a 2-annotate/ -pg 3-pangenome/*.lst -t contig
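
Matching blocks and genes by coordinates comes down to an interval overlap test. The sketch below shows that idea on toy data; it is not badlon's implementation, and the function names, the 1-based inclusive coordinates, and the min_fraction threshold are assumptions made for this example. All intervals are assumed to lie on the same sequence.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Overlap length of two inclusive intervals; 0 if they are disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def genes_in_block(block, genes, min_fraction=0.5):
    """Return names of genes with at least `min_fraction` of their length in the block.

    block: (start, end); genes: list of (name, start, end).
    """
    b_start, b_end = block
    hits = []
    for name, g_start, g_end in genes:
        gene_len = g_end - g_start + 1
        covered = overlap_length(b_start, b_end, g_start, g_end)
        if covered / gene_len >= min_fraction:
            hits.append(name)
    return hits


if __name__ == "__main__":
    genes = [("geneA", 100, 199), ("geneB", 950, 1049), ("geneC", 2000, 2100)]
    print(genes_in_block((1, 1000), genes))
```

Raising min_fraction to 1.0 keeps only genes that lie entirely inside the block.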
