No project description provided
Project description
BABACHI: Backgroud Allelic Dosage Bayesian Checkpoint Identification
BABACHI is a tool for Background Allelic Dosage (BAD) genomic regions calling from
non-phased heterozygous SNVs. It is aimed at estimation of BAD on low-coverage sequencing data, where
the precise estimation of allelic copy numbers is not possible.
BAD corresponds to the ratio of Major copy number to Minor copy number.
BABACHI takes in a vcf-like .tsv file with heterozygous SNVs sorted by genome positions (ascending). The input file must contain the following first 7 columns: chromosome, position, ID, reference base, alternative base, reference read count, alternative read count All lines, starting with # are ignored.
The output is a .bed file with BAD annotations.
System Requirements
Hardware requirements
BABABCHI
package requires only a standard computer with enough RAM to support the in-memory operations.
Software requirements
OS Requirements
The package can be installed on all major platforms (e.g. BSD, GNU/Linux, OS X, Windows) from Python Package Index (PyPI) and GitHub. The package has been tested on the following systems:
- Windows: Windows 10
- Linux: Ubuntu 18.04
Python Dependencies
BABACHI
mainly depends on the following Python 3 packages:
docopt>=0.6.2
numpy>=1.18.0
schema>=0.7.2
contextlib2>=0.5.5
pandas>=1.0.4
matplotlib>=3.2.1
seaborn>=0.10.1
Installation
Install from PyPi
pip3 install babachi
Install from Github
git clone https://github.com/autosome-ru/BABACHI
cd BABACHI
python3 setup.py install
sudo
, if required The package should take less than 1 minute to install.
Requirements
python >= 3.6
Usage
babachi <options>...
To get full usage description one can execute:
babachi --help
This will produce the following message:
Usage:
babachi <file> [-O <path> |--output <path>] [-q | --quiet] [--allele_reads_tr <int>] [--force-sort] [--visualize] [--boundary-penalty <float>] [--states <string>]
babachi (--test) [-O <path> |--output <path>] [-q | --quiet] [--allele_reads_tr <int>] [--force-sort] [--visualize] [--boundary-penalty <float>]
babachi visualize <file> (-b <badmap>| --badmap <badmap>) [-q | --quiet] [--allele_reads_tr <int>]
babachi -h | --help
Arguments:
<file> Path to input file in tsv format with columns:
chr pos ID ref_base alt_base ref_read_count alt_read_count.
<badmap> Path to badmap .bed format file
<int> Non negative integer
<float> Non negative number
<states_string> String of states separated with "," (to provide fraction use "/", e.g. 4/3). Each state must be >= 1
Options:
-h, --help Show help.
-q, --quiet Less log messages during work time.
-b <badmap>, --badmap <badmap> Input badmap file
-O <path>, --output <path> Output directory or file path. [default: ./]
--allele_reads_tr <int> Allelic reads threshold. Input SNPs will be filtered by ref_read_count >= x and
alt_read_count >= x. [default: 5]
--force-sort Do chromosomes need to be sorted
--visualize Perform visualization of SNP-wise AD and BAD for each chromosome.
Will create a directory in output path for the .svg visualizations.
--boundary-penalty <float> Boundary penalty coefficient [default: 9]
--states <states_string> States string [default: 1,2,3,4,5,6,1.5,2.5]
--test Run segmentation on test file
Demo
To perform a test run:
babachi --test
The test run takes approximately 2 minutes on a standard computer.
The result is a file named test.bed
that will be produced in the root directory of the project (if -O
option is not used).
The contents of the test.bed
file should start as follows:
#chr start end BAD Q1.00 Q1.33 Q1.50 Q2.00 Q2.50 Q3.00 Q4.00 Q5.00 Q6.00 SNP_count sum_cover
chr1 1 125183196 2 -63.47825919524621 -24.598710473939718 -8.145646624117944 -2.000888343900442e-11 -30.773041699645546 -78.80480783186977 -189.88685134708248 -299.82657588596703 -401.6012195141575 1325 17280
Each row represents a single segment with a constant estimated BAD. The columns are as follows:
- #chr: chromosome
- start: segment start position
- end: segment end position
- BAD: estimated BAD
- QX: the logarithmic likelyhood of the segment to have BAD = X
- SNP_count: number of SNPs of the segment
- sum_cover: the total read coverage of all SNPs of the segment
The BABACHI tool is maintained by Sergey Abramov and Alexandr Boytsov.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.