A bioinformatics tool for analyzing pan-genomes and other features based on synteny blocks

Badlon

Installation

Badlon can be installed with pip:

pip install badlon

Now you can run the tool from any directory as badlon.
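
To confirm that the installation put the executable on your PATH, a small check like the one below may help. It is only a sketch: the function name `tool_on_path` is made up for this example, and it relies solely on the standard library.

```python
import shutil


def tool_on_path(name="badlon"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_on_path():
        print("badlon is available")
    else:
        print("badlon was not found; check that pip's script directory is on PATH")
```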

Pipeline Usage

Modules

Badlon includes several modules for processing data. They can be listed with the help command:

$ badlon --help
usage: badlon [-h] {prepare,analysis,match} ...

Tool for block based analysis of bacterial populations. Choose one of available modules.

positional arguments:
  {prepare,analysis,match}
    prepare             Prepare draft dataset for SibeliaZ.
    analysis            Analyze pan-genome and other block-based features based on synteny blocks.
    match               Performs matching of block and genes based on coordinates.

optional arguments:
  -h, --help            show this help message and exit

Here is the recommended pipeline for processing data with badlon:

Step 1: Preparing data with the `PanACoTA` pipeline

If your genomes are in a folder called some_folder (one file per genome), we suggest preparing the data for badlon with the PanACoTA pipeline.

To do so, you can use the following commands:

1.1 Preparing data and tables with `PanACoTA prepare` module:

PanACoTA prepare --norefseq --min 0 --max 1 -o 1-prepare -d some_folder --cutn 125

--min 0 --max 1 keeps all genomes; these and all other parameters can be changed depending on the task.
To check the other parameters, see the PanACoTA prepare documentation.

1.2 Annotating genomes with `PanACoTA annotate` module:

PanACoTA annotate --info 1-prepare/L* -r 2-annotate -n ESCO --threads 16

You can change the label -n ESCO depending on your species (ESCO stands for Escherichia coli).
To check the other parameters, see the PanACoTA annotate documentation.

1.3 Calling orthologous genes using `PanACoTA pangenome` module:

PanACoTA pangenome -l 2-annotate/LSTINFO-* -n ESCO -d 2-annotate/Proteins/ -o 3-pangenome

You can set -i, the minimum sequence identity for sequences to be placed in the same cluster (float between 0 and 1). The default is 0.8.
To check the other parameters, see the PanACoTA pangenome documentation.
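
If you prefer to drive Step 1 from a script, the three commands can be assembled as argument lists. The sketch below is not part of badlon or PanACoTA: the helper names (`panacota_commands`, `expand_wildcards`, `run_step1`) are hypothetical, and only the flags shown in the commands of this step are used. Wildcards are expanded by hand because `subprocess.run` does not go through a shell.

```python
import glob
import subprocess


def panacota_commands(genome_dir, label="ESCO", threads=16):
    """Return the three Step 1 commands as argument lists (wildcards unexpanded)."""
    return [
        ["PanACoTA", "prepare", "--norefseq", "--min", "0", "--max", "1",
         "-o", "1-prepare", "-d", genome_dir, "--cutn", "125"],
        ["PanACoTA", "annotate", "--info", "1-prepare/L*", "-r", "2-annotate",
         "-n", label, "--threads", str(threads)],
        ["PanACoTA", "pangenome", "-l", "2-annotate/LSTINFO-*", "-n", label,
         "-d", "2-annotate/Proteins/", "-o", "3-pangenome"],
    ]


def expand_wildcards(args):
    """Replace arguments containing '*' with matching paths, if any exist."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg)) if "*" in arg else []
        # Keep the literal argument when nothing matches, as most shells do.
        expanded.extend(matches or [arg])
    return expanded


def run_step1(genome_dir, label="ESCO", threads=16):
    """Run the commands in order; each wildcard is expanded just before its step."""
    for args in panacota_commands(genome_dir, label, threads):
        subprocess.run(expand_wildcards(args), check=True)
```

Calling `run_step1("some_folder")` would run the three commands one after another and stop at the first failure because of `check=True`.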

Step 2: Preparing data for alignment with `badlon prepare` module

The prepare module prepares data for the SibeliaZ package while keeping all necessary information: genome labels and chromosome numbers.

Parameters can be checked with the help option:

$ badlon prepare --help
usage: badlon prepare [-h] --folder FOLDER [--contigs CONTIGS]
                      [--output OUTPUT]
                      [--annotate_subfolder ANNOTATE_SUBFOLDER]
                      [--min_len MIN_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --contigs CONTIGS, -c CONTIGS
                        Number of maximum contigs to take from every genome.
                        By default, keeps all.
  --output OUTPUT, -o OUTPUT
                        Output file path.
  --annotate_subfolder ANNOTATE_SUBFOLDER, -a ANNOTATE_SUBFOLDER
                        Subfolder of PanACoTA contains results of annotate
                        module. Used for finding LSTINFO file. Default is
                        '2-annotate'.
  --min_len MIN_LEN, -l MIN_LEN
                        Minimum contig length, less then that value will be
                        filtered. Default is 1000.

Required arguments:
  --folder FOLDER, -f FOLDER
                        Folder with PanACoTA output. Will be used to search
                        genome files based on LSTINFO file from annotate
                        module.

Example command:

badlon prepare -f 2-annotate -o for_sibeliaz.fna
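
To make the effect of --min_len and --contigs concrete, here is a minimal sketch of that kind of filtering on FASTA records. It is an illustration only, not badlon's implementation: the function names are hypothetical, and taking the first contigs in file order for --contigs is an assumption, since the help text does not say how they are chosen.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_contigs(records, min_len=1000, max_contigs=None):
    """Drop contigs shorter than min_len, then keep at most max_contigs.

    Assumption: contigs are kept in file order; badlon may choose differently.
    """
    kept = [(h, s) for h, s in records if len(s) >= min_len]
    return kept if max_contigs is None else kept[:max_contigs]


if __name__ == "__main__":
    demo = read_fasta([">c1", "ACGT", "ACGT", ">c2", "AC", ">c3", "ACGTAC"])
    for header, seq in filter_contigs(demo, min_len=4, max_contigs=1):
        print(header, len(seq))
```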

Step 3: Obtaining blocks with SibeliaZ

3.1 Running SibeliaZ with recommended command based on `badlon prepare` output.

Example:

sibeliaz -k 15 -a 100 -n -t 32 -o sibeliaz_out for_sibeliaz.fna

Pay attention to -a: it should be approximately number_of_genomes * 20. badlon prepare calculates it automatically.
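
If you need to recompute the value by hand, the rule of thumb for -a can be sketched as follows. The function name `sibeliaz_command` and its defaults are assumptions for this example; the defaults simply mirror the example command above.

```python
def sibeliaz_command(n_genomes, fasta="for_sibeliaz.fna",
                     out_dir="sibeliaz_out", threads=32, k=15):
    """Build the SibeliaZ command with -a set to number_of_genomes * 20."""
    a = n_genomes * 20
    return ["sibeliaz", "-k", str(k), "-a", str(a), "-n",
            "-t", str(threads), "-o", out_dir, fasta]


if __name__ == "__main__":
    # With 5 genomes this reproduces the example command (-a 100).
    print(" ".join(sibeliaz_command(5)))
```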

3.2 Obtaining blocks from alignment

Check the recommended command in the badlon prepare module output. Usually it is (minimal block size 3000):

cd sibeliaz_out
echo $'30 150\n100 500\n500 1500' > fine.txt
maf2synteny -s fine.txt -b 3000 blocks_coords.gff
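
The `echo $'...'` form depends on the shell supporting that quoting style. As a portable alternative, the same fine.txt can be written from Python. This is a small sketch: the names `FINE_PARAMS` and `write_fine_params` are made up here, and the values are copied from the command above.

```python
from pathlib import Path

# One pair of integers per line, copied from the echo command above.
FINE_PARAMS = [(30, 150), (100, 500), (500, 1500)]


def write_fine_params(path, params=FINE_PARAMS):
    """Write the simplification parameters file and return its text."""
    text = "".join(f"{first} {second}\n" for first, second in params)
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    write_fine_params("fine.txt")
```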

Step 4: Calculating block based statistics and charts with `badlon analysis` module:

Parameters can be checked with the help option:

$ badlon analysis --help
usage: badlon analysis [-h] --blocks_file BLOCKS_FILE --type {chr,contig}
                       [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Path to output folder. Default: blockomics_output.

Required arguments:
  --blocks_file BLOCKS_FILE, -b BLOCKS_FILE
                        Blocks resulted as output of original Sibelia or
                        maf2synteny tool. Usually it's
                        sibeliaz_out/3000/block_coords.txt file.
  --type {chr,contig}, -t {chr,contig}
                        Type of genome assembly, either 'chr' or 'contig'

Example command:

cd ..
badlon analysis -b sibeliaz_out/3000/blocks_coords.txt -t contig
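
To give a feel for block-based pan-genome statistics, the toy sketch below classifies blocks by how many genomes contain them. It does not reproduce badlon's analysis or its output files: the input structure (a mapping from block id to genome labels) and the name `classify_blocks` are assumptions chosen for illustration.

```python
def classify_blocks(block_genomes, n_genomes):
    """Split blocks into core, accessory and unique by genome count.

    block_genomes: dict mapping block id -> iterable of genome labels.
    core: present in all genomes; unique: present in exactly one genome;
    accessory: everything in between.
    """
    result = {"core": [], "accessory": [], "unique": []}
    for block_id, genomes in sorted(block_genomes.items()):
        count = len(set(genomes))
        if count == n_genomes:
            result["core"].append(block_id)
        elif count == 1:
            result["unique"].append(block_id)
        else:
            result["accessory"].append(block_id)
    return result


if __name__ == "__main__":
    toy = {
        1: ["g1", "g2", "g3"],
        2: ["g1", "g3"],
        3: ["g2", "g2"],  # two copies in one genome still count once
    }
    print(classify_blocks(toy, n_genomes=3))
```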

Step 5 (optional): Match blocks and gene annotations with `badlon match` module

Parameters can be checked with the help option:

$ badlon match --help
usage: badlon match [-h] --blocks_file BLOCKS_FILE --annotated_folder
                    ANNOTATED_FOLDER --pangenome_file PANGENOME_FILE --type
                    {chr,contig} [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Path to output folder. Default: blockomics_output.

Required arguments:
  --blocks_file BLOCKS_FILE, -b BLOCKS_FILE
                        Blocks folder resulted as output of original Sibelia
                        or maf2synteny tool. Usually it's `sibeliaz_out/3000/`
                        folder.
  --annotated_folder ANNOTATED_FOLDER, -a ANNOTATED_FOLDER
                        LSTINFO folder path, output of `annotate` step of
                        PanACoTA.
  --pangenome_file PANGENOME_FILE, -pg PANGENOME_FILE
                        File .lst with orthologous genes, output of
                        `pangenome` step of PanACoTA.
  --type {chr,contig}, -t {chr,contig}
                        Type of genome assembly, either 'chr' or 'contig'

Example command:

badlon match -b sibeliaz_out/3000/blocks_coords.txt -a 2-annotate/ -pg 3-pangenome/*.lst -t contig
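
Matching blocks and genes by coordinates comes down to an interval overlap test on the same sequence. The sketch below shows that idea on plain tuples. It is not badlon's code: the tuple layout, the closed-interval convention and the function names are assumptions made for this example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def match_genes_to_blocks(blocks, genes):
    """Map each block id to the ids of genes overlapping any of its occurrences.

    blocks: list of (block_id, contig, start, end)
    genes:  list of (gene_id, contig, start, end)
    """
    matches = {}
    for block_id, b_contig, b_start, b_end in blocks:
        found = matches.setdefault(block_id, [])
        for gene_id, g_contig, g_start, g_end in genes:
            same_contig = g_contig == b_contig
            if same_contig and overlaps(b_start, b_end, g_start, g_end):
                found.append(gene_id)
    return matches


if __name__ == "__main__":
    blocks = [(1, "contig_1", 100, 500), (2, "contig_1", 900, 1200)]
    genes = [("geneA", "contig_1", 450, 700), ("geneB", "contig_2", 100, 300)]
    print(match_genes_to_blocks(blocks, genes))
```

This quadratic scan is fine for small inputs; for whole-genome annotations a sorted sweep or an interval tree would scale better.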

Release history

0.1.3 (May 4, 2023), this version
0.1.2 (Jul 11, 2022)
0.1.1 (Jul 11, 2022)
0.1 (Jul 11, 2022)
