BABACHI: Background Allelic Dosage Bayesian Checkpoint Identification

BABACHI is a tool for estimating relative Background Allelic Dosage (BAD) from non-phased heterozygous SNVs. It estimates BAD directly from enriched sequencing data, for which precise estimation of allelic copy numbers is not possible.

BAD is the ratio of the major allele copy number to the minor allele copy number. More details and a description of the algorithm are available here
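As a small illustration of the definition above, BAD for a region with known allelic copy numbers is the larger copy number divided by the smaller one. The helper name `bad_from_copy_numbers` is hypothetical and is not part of the BABACHI API; BABACHI itself infers BAD from read counts, not from known copy numbers.

```python
from fractions import Fraction


def bad_from_copy_numbers(cn_a: int, cn_b: int) -> Fraction:
    """Return the major/minor copy-number ratio (hypothetical helper, not BABACHI API)."""
    if cn_a <= 0 or cn_b <= 0:
        raise ValueError("both alleles must be present (copy number >= 1)")
    return Fraction(max(cn_a, cn_b), min(cn_a, cn_b))


print(bad_from_copy_numbers(2, 1))  # prints 2
print(bad_from_copy_numbers(3, 4))  # prints 4/3; allele order does not matter
```

Because the major allele is always in the numerator, the ratio is never below 1, which matches the requirement that each state be >= 1.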

File formats

BABACHI accepts either a BED file with heterozygous SNVs or a standard VCF file sorted by genomic positions (ascending).

  • a BED file should begin with the following columns; additional columns are permitted and ignored: chromosome, start, end, ID, reference base, alternative base, reference read count, alternative read count, sample_id.
    Lines starting with # are ignored.

  • a VCF file should have GT and AD fields.
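The BED layout described above can be sketched with a small reader. This is an illustration only, not BABACHI's own parser: the `Snv` type, the `read_bed_snvs` function, the header names in the comment line, and the example record (including its ID and counts) are all made up for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class Snv(NamedTuple):
    chrom: str
    start: int
    end: int
    snv_id: str
    ref: str
    alt: str
    ref_count: int
    alt_count: int
    sample_id: str


def read_bed_snvs(lines: Iterable[str]) -> Iterator[Snv]:
    """Yield SNVs from tab-separated lines, skipping '#' lines and extra columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Only the first nine columns are used; anything after them is ignored.
        yield Snv(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5],
                  int(f[6]), int(f[7]), f[8])


# Made-up input: one comment line and one record with an extra trailing column.
example = [
    "#chr\tstart\tend\tID\tref\talt\tref_counts\talt_counts\tsample_id\n",
    "chr1\t999\t1000\trs0000001\tA\tG\t12\t7\tsample_1\textra\n",
]
snvs = list(read_bed_snvs(example))
print(snvs[0].ref_count, snvs[0].alt_count)  # prints 12 7
```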

(!) We recommend using only common SNPs for BAD estimation. Users are expected to filter for common variants before running BABACHI. See here for more details.

The output is a BED file with BAD annotations. The file format is described in the Demo section.

System Requirements

Hardware requirements

BABACHI package requires only a standard computer with enough RAM to support in-memory operations.

Software requirements

OS Requirements

The package can be installed on GNU/Linux and macOS platforms from the Python Package Index (PyPI) and GitHub. The package has been tested on the following systems:

  • macOS Monterey v12.5
  • Linux: Ubuntu 18.04

Python Dependencies

BABACHI mainly depends on the following Python 3 packages:

docopt>=0.6.2
numpy>=1.18.0
schema>=0.7.2
contextlib2>=0.5.5
pandas>=1.0.4
matplotlib>=3.2.1
seaborn>=0.10.1
numba>=0.53.1

Installation

Install from PyPI

pip3 install babachi

Install from GitHub

git clone https://github.com/autosome-ru/BABACHI
cd BABACHI
python3 setup.py install

or

pip3 install 'babachi @ git+https://github.com/autosome-ru/BABACHI.git'
Prefix the commands with sudo if required. The package should take less than 1 minute to install.

Requirements

python >= 3.6

Usage

babachi <options>...

To see the full usage description, run:

babachi --help

This will produce the following message:

Usage:
    babachi (<file> | --test) [options]
    babachi visualize <file> (-b <badmap>| --badmap <badmap>) [options]
    babachi filter <file> [options]

Arguments:
    <file>            Path to input VCF file. Expected to be sorted by (chr, pos)

    <path>            Path to the file
    <int>             Non negative integer
    <float>           Non negative number
    <states-string>   String of states separated with "," (to provide fraction use "/", e.g. 4/3).
                      Each state must be >= 1
    <samples-string>  Comma-separated sample names or indices
    <prior-string>    Either "uniform" or "geometric"
    <file-or-link>    Path to existing file or link


Arguments:
    -h, --help                              Show help
    --version                               Show version

    -O <path>, --output <path>              Output directory or file path. [default: ./]
    --test                                  Run segmentation on test file

    -v, --verbose                           Write debug messages
    --sample-list <samples-string>          Comma-separated sample names or integer indices to use from input VCF
    --snp-strategy <snp-strategy>           Strategy for the SNPs at the same genomic position (from different samples).
                                            Either add read counts 'ADD' or treat as separate events 'SEP'. [default: SEP]

    -n, --no-filter                         Skip filtering of input file
    --filter-no-rs                          Filter variants without assigned ID in VCF file.
    -f, --force-sort                        Chromosomes in output file will be sorted in numerical order
    -j <int>, --jobs <int>                  Number of jobs to use, parallel by chromosomes [default: 1]
    --chrom-sizes <file-or-link>            File with chromosome sizes (can be a link), default is hg38
    -a <int>, --allele-reads-tr <int>       Allelic reads threshold. Input SNPs will be filtered by ref_read_count >= x and
                                            alt_read_count >= x. Required for correct estimations in underlying statistical model [default: 5]
    -p <string>, --prior <prior-string>     Prior to use. Can be either uniform or geometric [default: uniform]
    -g <float>, --geometric-prior <float>   Coefficient for geometric prior [default: 0.98]
    -s <string>, --states <states-string>   States string [default: 1,2,3,4,5,6]

    -B <float>, --boundary-penalty <float>  Boundary penalty coefficient [default: 4]
    -Z <int>, --min-seg-snps <int>          Only allow segments containing Z or more unique SNPs (IDs/positions) [default: 3]
    -R <int>, --min-seg-bp <int>            Only allow segments containing R or more base pairs [default: 1000]
    -P <int>, --post-segment-filter <int>   Remove segments with less than P unique SNPs (IDs/positions) from output [default: 0]
    -A <int>, --atomic-region-size <int>    Atomic region size in # of SNPs [default: 600]
    -C <int>, --chr-min-snps <int>          Minimum number of SNPs at a chromosome to start segmentation [default: 100]
    -S <int>, --subchr-filter <int>         Exclude subchromosomes with less than C unique SNPs  [default: 3]

Visualization:
    -b <path>, --badmap <path>              BADmap file created with BABACHI
    --visualize                             Perform visualization of SNP-wise AD and BAD for each chromosome.
                                            Will create a directory in output path for the <ext> visualizations
    -z, --zip                               Zip visualizations directory
    -e <ext>, --ext <ext>                   Extension to save visualizations with [default: svg]
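
The `<states-string>` format from the help message above (comma-separated values, with "/" for fractions, each state >= 1) can be illustrated with a short parser. This is a sketch of the format only; `parse_states` is a hypothetical name, and BABACHI's internal parsing may differ.

```python
from fractions import Fraction
from typing import List


def parse_states(states_string: str) -> List[Fraction]:
    """Parse a string such as "1,2,3,4/3" into a sorted list of Fractions."""
    states = [Fraction(token.strip()) for token in states_string.split(",")]
    if any(state < 1 for state in states):
        raise ValueError("each state must be >= 1")
    return sorted(states)


print([str(s) for s in parse_states("1,2,3,4/3")])
```

Using `Fraction` keeps a value such as 4/3 exact rather than rounding it to a float.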

Demo

To perform a test run:

babachi --test

The test run takes approximately 2 minutes on a standard computer.
The result is a file named test.bed, created in the working directory unless the -O option is provided. The contents of the test.bed file should have the following format:

#chr	start	end	BAD	Q1.00	Q1.33	Q1.50	Q2.00	Q2.50	Q3.00	Q4.00	Q5.00	Q6.00	SNP_count	sum_cover
chr1	1	125183196	2	-63.47825919524621	-24.598710473939718	-8.145646624117944	-2.000888343900442e-11	-30.773041699645546	-78.80480783186977	-189.88685134708248	-299.82657588596703	-401.6012195141575	1325	17280

Each row represents a single segment with a constant estimated BAD. The most important columns are:

  • #chr: chromosome
  • start: segment start position
  • end: segment end position
  • BAD: estimated BAD score

Additional columns:

  • SNP_count: number of SNPs in the segment
  • SNP_ID_count: number of unique SNPs in the segment
  • sum_cover: the total read coverage of all SNPs of the segment
  • QX: the log-likelihood of the segment having BAD = X
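
The columns above can be read with the Python standard library. The sketch below parses the example row from this section and finds which QX column holds the largest log-likelihood; `most_likely_bad` is a hypothetical helper, not part of BABACHI.

```python
import csv
import io

# Header and row copied from the example output in this section.
BADMAP = (
    "#chr\tstart\tend\tBAD\tQ1.00\tQ1.33\tQ1.50\tQ2.00\tQ2.50\tQ3.00\tQ4.00\tQ5.00\tQ6.00\tSNP_count\tsum_cover\n"
    "chr1\t1\t125183196\t2\t-63.47825919524621\t-24.598710473939718\t-8.145646624117944\t"
    "-2.000888343900442e-11\t-30.773041699645546\t-78.80480783186977\t-189.88685134708248\t"
    "-299.82657588596703\t-401.6012195141575\t1325\t17280\n"
)


def most_likely_bad(row: dict) -> float:
    """Return the X whose QX log-likelihood column is largest in this row."""
    q_columns = {float(k[1:]): float(v) for k, v in row.items() if k.startswith("Q")}
    return max(q_columns, key=q_columns.get)


rows = list(csv.DictReader(io.StringIO(BADMAP), delimiter="\t"))
for row in rows:
    print(row["#chr"], row["start"], row["end"], row["BAD"], most_likely_bad(row))
```

For this row, the largest log-likelihood is in the Q2.00 column, which agrees with the reported BAD of 2.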

The BABACHI tool is maintained by Sergey Abramov and Alexandr Boytsov.
