BABACHI: Background Allelic Dosage Bayesian Checkpoint Identification

BABACHI is a tool for calling genomic regions of Background Allelic Dosage (BAD) from non-phased heterozygous SNVs. It is designed to estimate BAD from low-coverage sequencing data, where precise estimation of allelic copy numbers is not possible.

BAD corresponds to the ratio of Major copy number to Minor copy number.
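As a small worked illustration of this definition, the sketch below computes BAD from a pair of allelic copy numbers. The function name bad_from_copy_numbers is hypothetical and is not part of the BABACHI package.

```python
from fractions import Fraction


def bad_from_copy_numbers(copies_a, copies_b):
    """Return BAD = major copy number / minor copy number as an exact fraction.

    Hypothetical helper for illustration only; not part of BABACHI.
    """
    major, minor = max(copies_a, copies_b), min(copies_a, copies_b)
    if minor < 1:
        raise ValueError("both copy numbers must be at least 1")
    return Fraction(major, minor)


print(bad_from_copy_numbers(1, 1))  # balanced region
print(bad_from_copy_numbers(2, 1))  # one allele duplicated
print(bad_from_copy_numbers(2, 3))  # argument order does not matter
```

Because the SNVs are not phased, only the ratio of the larger to the smaller copy number is meaningful, which is why the helper sorts its arguments before dividing.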

BABACHI takes a VCF-like .tsv file with heterozygous SNVs sorted by genomic position (ascending). The first 7 columns of the input file must be: chromosome, position, ID, reference base, alternative base, reference read count, alternative read count. All lines starting with # are ignored.
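The input layout described above can be sketched with a small reader. This is not BABACHI's own parser: the function read_snvs, the COLUMNS list, and the example records are made up for illustration, and only the column order and the handling of # lines come from the description.

```python
import csv
import io

# Column names follow the order given in the description above.
COLUMNS = ["chr", "pos", "ID", "ref_base", "alt_base",
           "ref_read_count", "alt_read_count"]


def read_snvs(handle):
    """Yield one dict per SNV from a tab-separated handle, skipping '#' lines.

    Only the first 7 columns are used; any extra columns are ignored.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 7:
            raise ValueError(f"expected at least 7 columns, got {len(row)}")
        record = dict(zip(COLUMNS, row[:7]))
        for key in ("pos", "ref_read_count", "alt_read_count"):
            record[key] = int(record[key])
        yield record


# Made-up example data, not taken from any real sample.
example = io.StringIO(
    "#chr\tpos\tID\tref\talt\tref_count\talt_count\n"
    "chr1\t10468\trs1\tC\tG\t12\t9\n"
    "chr1\t20512\trs2\tA\tT\t7\t15\n"
)
snvs = list(read_snvs(example))
print(snvs)
```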

The output is a .bed file with BAD annotations.

System Requirements

Hardware requirements

The BABACHI package requires only a standard computer with enough RAM to support the in-memory operations.

Software requirements

OS Requirements

The package can be installed on all major platforms (e.g. BSD, GNU/Linux, OS X, Windows) from the Python Package Index (PyPI) and GitHub. The package has been tested on the following systems:

  • Windows: Windows 10
  • Linux: Ubuntu 18.04

Python Dependencies

BABACHI mainly depends on the following Python 3 packages:

docopt>=0.6.2
numpy>=1.18.0
schema>=0.7.2
contextlib2>=0.5.5
pandas>=1.0.4
matplotlib>=3.2.1
seaborn>=0.10.1

Installation

Install from PyPI

pip3 install babachi 

Install from GitHub

git clone https://github.com/autosome-ru/BABACHI
cd BABACHI
python3 setup.py install
Prepend sudo if required. The package should take less than 1 minute to install.

Requirements

python >= 3.6

Usage

babachi <options>...

To print the full usage description, run:

babachi --help

This will produce the following message:

Usage:
    babachi <file> [-O <path> |--output <path>] [-q | --quiet] [--allele_reads_tr <int>] [--force-sort] [--visualize] [--boundary-penalty <float>] [--states <string>]
    babachi (--test) [-O <path> |--output <path>] [-q | --quiet] [--allele_reads_tr <int>] [--force-sort] [--visualize] [--boundary-penalty <float>]
    babachi visualize <file> (-b <badmap>| --badmap <badmap>) [-q | --quiet] [--allele_reads_tr <int>]
    babachi -h | --help

Arguments:
    <file>            Path to input file in tsv format with columns:
                      chr pos ID ref_base alt_base ref_read_count alt_read_count.
    <badmap>          Path to badmap .bed format file
    <int>             Non negative integer
    <float>           Non negative number
    <states_string>   String of states separated with "," (to provide fraction use "/", e.g. 4/3). Each state must be >= 1


Options:
    -h, --help                      Show help.
    -q, --quiet                     Less log messages during work time.
    -b <badmap>, --badmap <badmap>  Input badmap file
    -O <path>, --output <path>      Output directory or file path. [default: ./]
    --allele_reads_tr <int>         Allelic reads threshold. Input SNPs will be filtered by ref_read_count >= x and
                                    alt_read_count >= x. [default: 5]
    --force-sort                    Do chromosomes need to be sorted
    --visualize                     Perform visualization of SNP-wise AD and BAD for each chromosome.
                                    Will create a directory in output path for the .svg visualizations.
    --boundary-penalty <float>      Boundary penalty coefficient [default: 9]
    --states <states_string>        States string [default: 1,2,3,4,5,6,1.5,2.5]
    --test                          Run segmentation on test file
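Two of the options above lend themselves to a short illustration: the format of the states string and the read-count filter applied by --allele_reads_tr. The sketch below follows only the wording of the help message; parse_states and passes_threshold are hypothetical names, and BABACHI's internal handling may differ.

```python
from fractions import Fraction


def parse_states(states_string):
    """Parse a string such as "1,2,4/3,1.5" into a sorted list of floats.

    Illustrative only; BABACHI's own parser may differ.
    """
    states = []
    for token in states_string.split(","):
        # Fraction accepts both "4/3" and decimal strings such as "1.5".
        value = float(Fraction(token.strip()))
        if value < 1:
            raise ValueError(f"state {token!r} must be >= 1")
        states.append(value)
    return sorted(states)


def passes_threshold(ref_read_count, alt_read_count, threshold=5):
    """Keep an SNP only if both allelic read counts reach the threshold."""
    return ref_read_count >= threshold and alt_read_count >= threshold


print(parse_states("1,2,3,4,5,6,1.5,2.5"))
print(passes_threshold(12, 9))
print(passes_threshold(4, 30))
```

The filter requires both counts to reach the threshold, so an SNP with 4 reference reads and 30 alternative reads is dropped under the default value of 5.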

Demo

To perform a test run:

babachi --test

The test run takes approximately 2 minutes on a standard computer.
The test run writes a file named test.bed to the root directory of the project (unless the -O option is used). The contents of the test.bed file should start as follows:

#chr	start	end	BAD	Q1.00	Q1.33	Q1.50	Q2.00	Q2.50	Q3.00	Q4.00	Q5.00	Q6.00	SNP_count	sum_cover
chr1	1	125183196	2	-63.47825919524621	-24.598710473939718	-8.145646624117944	-2.000888343900442e-11	-30.773041699645546	-78.80480783186977	-189.88685134708248	-299.82657588596703	-401.6012195141575	1325	17280

Each row represents a single segment with a constant estimated BAD. The columns are as follows:

  • #chr: chromosome
  • start: segment start position
  • end: segment end position
  • BAD: estimated BAD
  • QX: the log-likelihood of the segment having BAD = X
  • SNP_count: number of SNPs in the segment
  • sum_cover: the total read coverage of all SNPs in the segment
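The column layout above can be read back with a few lines of standard-library Python. This is a sketch under the assumption that the file is tab-separated with the header row shown in the demo; read_badmap is a hypothetical helper, not a BABACHI function. The data row is the one shown in the demo output.

```python
import csv
import io


def read_badmap(handle):
    """Yield one dict per segment from a tab-separated badmap with a header row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        # Collect every QX column into a {column name: log-likelihood} mapping.
        q_values = {name: float(value) for name, value in row.items()
                    if name and name.startswith("Q")}
        yield {
            "chr": row["#chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "BAD": float(row["BAD"]),
            "Q": q_values,
            "SNP_count": int(row["SNP_count"]),
            "sum_cover": int(row["sum_cover"]),
        }


# Header and first row copied from the demo output above.
demo = io.StringIO(
    "#chr\tstart\tend\tBAD\tQ1.00\tQ1.33\tQ1.50\tQ2.00\tQ2.50\tQ3.00\tQ4.00"
    "\tQ5.00\tQ6.00\tSNP_count\tsum_cover\n"
    "chr1\t1\t125183196\t2\t-63.47825919524621\t-24.598710473939718"
    "\t-8.145646624117944\t-2.000888343900442e-11\t-30.773041699645546"
    "\t-78.80480783186977\t-189.88685134708248\t-299.82657588596703"
    "\t-401.6012195141575\t1325\t17280\n"
)
segments = list(read_badmap(demo))
first = segments[0]
best_q = max(first["Q"], key=first["Q"].get)
print(first["chr"], first["BAD"], best_q)
```

In this row the largest log-likelihood is in the Q2.00 column, which agrees with the reported BAD of 2. Whether the BAD column always equals the state with the largest QX is an assumption not confirmed by the description above.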

The BABACHI tool is maintained by Sergey Abramov and Alexandr Boytsov.
