A bioinformatics tool for analyzing the pan-genome and other features based on synteny blocks
Badlon
Installation
Badlon can be installed with pip:
pip install badlon
Now you can run the tool from any directory as badlon.
Pipeline Usage
Modules
Badlon includes multiple modules for processing data. They can be listed with the help command:
$ badlon --help
usage: badlon [-h] {prepare,analysis,match} ...
Tool for block based analysis of bacterial populations. Choose one of available modules.
positional arguments:
{prepare,analysis,match}
prepare Prepare draft dataset for SibeliaZ.
analysis Analyze pan-genome and other block-based features based on synteny blocks.
match Performs matching of block and genes based on coordinates.
optional arguments:
-h, --help show this help message and exit
Here is the recommended pipeline for processing data with badlon:
Step 1: prepare data with PanACoTA pipeline
If you have genomes in a folder called some_folder (one file per genome), we suggest preparing the data for badlon with the PanACoTA pipeline.
To do so, you can use the following commands:
1.1 Preparing data and tables with PanACoTA prepare module:
PanACoTA prepare --norefseq --min 0 --max 1 -o 1-prepare -d some_folder --cutn 125
- --min 0 --max 1 are used to keep all genomes; these and all other parameters can be changed depending on the task.
- To check other parameters, see the PanACoTA prepare documentation.
1.2 Annotating genomes with PanACoTA annotate module:
PanACoTA annotate --info 1-prepare/L* -r 2-annotate -n ESCO --threads 16
- You can change the label -n ESCO depending on your species (ESCO is for Escherichia coli).
- To check other parameters, see the PanACoTA annotate documentation.
1.3 Calling orthology genes using PanACoTA pangenome module:
PanACoTA pangenome -l 2-annotate/LSTINFO-* -n ESCO -d 2-annotate/Proteins/ -o 3-pangenome
- You can change -i, the minimum sequence identity required for sequences to be placed in the same cluster (float between 0 and 1). Default is 0.8.
- To check other parameters, see the PanACoTA pangenome documentation.
Step 2: Preparing data for alignment with badlon prepare module
The prepare module prepares data for the SibeliaZ package while keeping all necessary information: genome labels and chromosome numbers.
Parameters can be checked with the help option:
$ badlon prepare --help
usage: badlon prepare [-h] --folder FOLDER [--contigs CONTIGS]
[--output OUTPUT]
[--annotate_subfolder ANNOTATE_SUBFOLDER]
[--min_len MIN_LEN]
optional arguments:
-h, --help show this help message and exit
--contigs CONTIGS, -c CONTIGS
Number of maximum contigs to take from every genome.
By default, keeps all.
--output OUTPUT, -o OUTPUT
Output file path.
--annotate_subfolder ANNOTATE_SUBFOLDER, -a ANNOTATE_SUBFOLDER
Subfolder of PanACoTA contains results of annotate
module. Used for finding LSTINFO file. Default is
'2-annotate'.
--min_len MIN_LEN, -l MIN_LEN
Minimum contig length, less then that value will be
filtered. Default is 1000.
Required arguments:
--folder FOLDER, -f FOLDER
Folder with PanACoTA output. Will be used to search
genome files based on LSTINFO file from annotate
module.
Example command:
badlon prepare -f 2-annotate -o for_sibeliaz.fna
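The --min_len and --contigs options amount to a per-genome contig filter. The following is a minimal sketch of that idea, not badlon's actual implementation. It assumes contigs are already loaded as (name, sequence) pairs, and the function name filter_contigs is hypothetical. Keeping the longest contigs when a limit is set is also an assumption, since the help text does not say which contigs are kept.

```python
from typing import List, Optional, Tuple

Contig = Tuple[str, str]  # (contig name, nucleotide sequence)


def filter_contigs(contigs: List[Contig], min_len: int = 1000,
                   max_contigs: Optional[int] = None) -> List[Contig]:
    """Drop contigs shorter than min_len, then keep at most max_contigs."""
    kept = [c for c in contigs if len(c[1]) >= min_len]
    if max_contigs is not None:
        # Assumption: prefer the longest contigs. sorted() is stable,
        # so contigs of equal length keep their input order.
        kept = sorted(kept, key=lambda c: len(c[1]), reverse=True)[:max_contigs]
    return kept


# Toy genome with three contigs of 5000, 800 and 1200 bases.
genome = [("contig_1", "A" * 5000), ("contig_2", "C" * 800), ("contig_3", "G" * 1200)]

all_long = filter_contigs(genome)                 # default min_len=1000, no limit
longest_only = filter_contigs(genome, max_contigs=1)

print([name for name, _ in all_long])      # ['contig_1', 'contig_3']
print([name for name, _ in longest_only])  # ['contig_1']
```

With the default threshold of 1000, the 800-base contig is dropped and the others pass through in their original order.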
Step 3: Obtaining blocks with SibeliaZ
3.1 Running SibeliaZ with the recommended command based on the badlon prepare output.
Example:
sibeliaz -k 15 -a 100 -n -t 32 -o sibeliaz_out for_sibeliaz.fna
- Watch out for -a: it should be approximately number_of_genomes * 20; badlon prepare calculates it automatically.
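The rule of thumb for -a can be written down directly. The sketch below is an illustration only: the helper names recommended_abundance and sibeliaz_command are hypothetical, and the command is built purely from the flags that appear in the example above. In practice badlon prepare prints the recommended command for you.

```python
import shlex
from typing import List


def recommended_abundance(n_genomes: int, per_genome: int = 20) -> int:
    """Rule of thumb from the note above: -a is about number_of_genomes * 20."""
    if n_genomes < 1:
        raise ValueError("n_genomes must be a positive integer")
    return n_genomes * per_genome


def sibeliaz_command(fasta: str, n_genomes: int, threads: int = 32,
                     out_dir: str = "sibeliaz_out") -> List[str]:
    """Build the argument list, reusing only flags from the example command."""
    return ["sibeliaz", "-k", "15",
            "-a", str(recommended_abundance(n_genomes)),
            "-n", "-t", str(threads), "-o", out_dir, fasta]


# Five genomes give -a 100, which matches the example command.
cmd = sibeliaz_command("for_sibeliaz.fna", n_genomes=5)
print(shlex.join(cmd))
# sibeliaz -k 15 -a 100 -n -t 32 -o sibeliaz_out for_sibeliaz.fna
```

The list form can be passed to subprocess.run as is if you want to launch SibeliaZ from a script.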
3.2 Obtaining blocks from alignment
Check the recommended command in the badlon prepare module output. Usually it is the following (minimal block size 3000):
cd sibeliaz_out
echo $'30 150\n100 500\n500 1500' > fine.txt
maf2synteny -s fine.txt -b 3000 blocks_coords.gff
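The echo line above relies on the shell's $'...' quoting to produce a three-line file. If your shell does not support it, the same file can be written from Python. This is a minimal sketch: the helper name write_params is hypothetical, and the demo writes into a temporary directory rather than sibeliaz_out.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

# The three value pairs from the echo command above, one pair per output line.
FINE_PARAMS = [(30, 150), (100, 500), (500, 1500)]


def write_params(path: Path, pairs: Iterable[Tuple[int, int]]) -> str:
    """Write one space-separated pair per line and return the written text."""
    text = "".join(f"{first} {second}\n" for first, second in pairs)
    path.write_text(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "fine.txt"
    written = write_params(target, FINE_PARAMS)
    read_back = target.read_text()

print(written, end="")
```

To use it in the pipeline, point the path at sibeliaz_out/fine.txt before running maf2synteny.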
Step 4: Calculating block-based statistics and charts with badlon analysis module:
Parameters can be checked with the help option:
$ badlon analysis --help
usage: badlon analysis [-h] --blocks_file BLOCKS_FILE --type {chr,contig}
[--output OUTPUT]
optional arguments:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
Path to output folder. Default: blockomics_output.
Required arguments:
--blocks_file BLOCKS_FILE, -b BLOCKS_FILE
Blocks resulted as output of original Sibelia or
maf2synteny tool. Usually it's
sibeliaz_out/3000/block_coords.txt file.
--type {chr,contig}, -t {chr,contig}
Type of genome assembly, either 'chr' or 'contig'
Example command:
cd ..
badlon analysis -b sibeliaz_out/3000/blocks_coords.txt -t contig
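A pan-genome view of synteny blocks comes down to which genomes contain which blocks. The sketch below illustrates that idea and is not badlon's implementation. It assumes block occurrences have already been parsed into (block_id, genome) pairs (parsing blocks_coords.txt is not shown), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def block_presence(occurrences: Iterable[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Map each block id to the set of genomes in which it occurs."""
    presence: Dict[int, Set[str]] = defaultdict(set)
    for block_id, genome in occurrences:
        presence[block_id].add(genome)
    return dict(presence)


def split_core_accessory(presence: Dict[int, Set[str]],
                         genomes: Set[str]) -> Tuple[List[int], List[int]]:
    """Core blocks occur in every genome; all other blocks are accessory."""
    core = sorted(b for b, found in presence.items() if found >= genomes)
    accessory = sorted(b for b, found in presence.items() if not found >= genomes)
    return core, accessory


# Toy input: block 1 appears twice in genome A (a repeated copy).
occurrences = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "C"), (3, "B"), (1, "A")]
genomes = {"A", "B", "C"}

presence = block_presence(occurrences)
core, accessory = split_core_accessory(presence, genomes)
print("core:", core)            # core: [1]
print("accessory:", accessory)  # accessory: [2, 3]
```

Using sets means repeated copies of a block within one genome count once for presence; copy numbers would need a separate counter.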
Step 5 (optional): Matching blocks and gene annotations with badlon match module
Parameters can be checked with the help option:
$ badlon match --help
usage: badlon match [-h] --blocks_file BLOCKS_FILE --annotated_folder
ANNOTATED_FOLDER --pangenome_file PANGENOME_FILE --type
{chr,contig} [--output OUTPUT]
optional arguments:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
Path to output folder. Default: blockomics_output.
Required arguments:
--blocks_file BLOCKS_FILE, -b BLOCKS_FILE
Blocks folder resulted as output of original Sibelia
or maf2synteny tool. Usually it's `sibeliaz_out/3000/`
folder.
--annotated_folder ANNOTATED_FOLDER, -a ANNOTATED_FOLDER
LSTINFO folder path, output of `annotate` step of
PanACoTA.
--pangenome_file PANGENOME_FILE, -pg PANGENOME_FILE
File .lst with orthologous genes, output of
`pangenome` step of PanACoTA.
--type {chr,contig}, -t {chr,contig}
Type of genome assembly, either 'chr' or 'contig'
Example command:
badlon match -b sibeliaz_out/3000/blocks_coords.txt -a 2-annotate/ -pg 3-pangenome/*.lst -t contig
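Matching blocks and genes by coordinates is, at its simplest, an interval overlap test on the same sequence. The sketch below shows that test under stated assumptions: coordinates are inclusive, every name in it (Interval, overlaps, match_genes_to_blocks) is hypothetical, and badlon's own matching rules may differ.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    name: str
    seq_id: str  # chromosome or contig identifier
    start: int   # inclusive
    end: int     # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same sequence and share a position."""
    return a.seq_id == b.seq_id and a.start <= b.end and b.start <= a.end


def match_genes_to_blocks(blocks: List[Interval],
                          genes: List[Interval]) -> Dict[str, List[str]]:
    """For every block, list the names of the genes that overlap it."""
    return {blk.name: [g.name for g in genes if overlaps(blk, g)] for blk in blocks}


blocks = [Interval("block_1", "chr1", 100, 500),
          Interval("block_2", "chr1", 800, 1200)]
genes = [Interval("geneA", "chr1", 450, 700),    # overlaps the end of block_1
         Interval("geneB", "chr1", 701, 799),    # falls between the two blocks
         Interval("geneC", "chr2", 100, 500),    # same range, different sequence
         Interval("geneD", "chr1", 1200, 1500)]  # touches the last base of block_2

matches = match_genes_to_blocks(blocks, genes)
print(matches)  # {'block_1': ['geneA'], 'block_2': ['geneD']}
```

This pairwise comparison is quadratic in the number of features. For whole genomes, sorting by start position or using an interval tree would be the usual next step.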