A toolkit for analyzing reference bias in short DNA read alignments.

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Build Tools

Project description

Updated: Mar 12, 2026

Biastools: Measuring, visualizing and diagnosing reference bias

This GitHub repository was originally forked from https://github.com/sheila12345/biastools

Prerequisite programs

samtools=v1.11
bcftools=v1.9
bedtools=v2.30.0
gzip=v1.9
tabix=v1.9
bowtie2=v2.4.2
bwa=v0.7.17
mason_simulator=v2.0.9 (only for biastools --simulate)
SeqAn=v2.4.0 (only for biastools --simulate)
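
Before running biastools, it can help to confirm that the prerequisite programs are reachable. The following is a minimal sketch, not part of biastools: it assumes each program is installed under the executable name used in the list above, and it only checks that the name is found on PATH, not that the version matches. The helper names `find_tools` and `missing_tools` are hypothetical.

```python
import shutil

# Executable names assumed from the prerequisite list above. SeqAn is a
# library rather than a command, so it is not checked here.
REQUIRED_TOOLS = ["samtools", "bcftools", "bedtools", "gzip", "tabix", "bowtie2", "bwa"]
SIMULATION_TOOLS = ["mason_simulator"]  # only needed for biastools --simulate


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED_TOOLS + SIMULATION_TOOLS):
        print(f"not found on PATH: {name}")
```

Version checks are left out on purpose, since each program reports its version in a different format.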

Installation

pip install biastools

GitHub

git clone https://github.com/maojanlin/biastools.git
cd biastools

Though optional, it is good practice to create a virtual environment to manage the dependencies:

python -m venv venv
source venv/bin/activate

Now a virtual environment (named venv) is activated. Install biastools:

python setup.py install

Usage

Simulation, plotting, and analysis

$ biastools --simulate --align --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id>

With the example command, biastools:

1. Simulates reads based on <ref.fa> and <vcf>, generating paired-end .fq.gz files for both haplotypes (work_dir/sample_name.hap{A,B}_{1,2}.fq.gz).
2. Aligns the reads to the reference <ref.fa>, generating a BAM file with phasing information (work_dir/sample_name.run_id.sorted.bam).
3. Analyzes the BAM file with the context-aware assignment method, generating bias reports and plots.
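
The file names produced by the simulation and alignment steps follow the patterns given above. As a rough sketch (the helper `expected_outputs` is hypothetical and not part of biastools), the paths can be derived from the work directory, sample name, and run ID like this. The run ID "bt2" in the usage line is only an illustrative value.

```python
from pathlib import Path


def expected_outputs(work_dir, sample_name, run_id):
    """List the FASTQ and BAM paths named in the steps above for one run."""
    base = Path(work_dir)
    # Two haplotypes (A, B), each with two paired-end mates (1, 2).
    fastqs = [
        base / f"{sample_name}.hap{hap}_{mate}.fq.gz"
        for hap in ("A", "B")
        for mate in (1, 2)
    ]
    bam = base / f"{sample_name}.{run_id}.sorted.bam"
    return fastqs + [bam]


for path in expected_outputs("work_dir", "HG002", "bt2"):
    print(path)
```

A check like `all(p.exists() for p in expected_outputs(...))` can then confirm that a run finished the first two steps before the analysis is repeated.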

Other aligners

Biastools supports the Bowtie 2 and bwa mem aligners. BAM files from other aligners can also be analyzed, provided they are named work_dir/sample_name.run_id.sorted.bam and tagged with haplotype information:

$ biastools --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id>
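
When the analysis is driven from a script, the same invocation can be assembled as an argument list. This is a minimal sketch using only the flags shown in the command above. The function name `build_analyze_command` and the example file names are hypothetical.

```python
import shlex


def build_analyze_command(work_dir, ref_fasta, vcf, sample_name, run_id):
    """Assemble the --analyze invocation shown above as an argument list."""
    return [
        "biastools", "--analyze",
        "-o", work_dir,
        "-g", ref_fasta,
        "-v", vcf,
        "-s", sample_name,
        "-r", run_id,
    ]


cmd = build_analyze_command("out", "ref.fa", "sample.vcf.gz", "HG002", "run1")
# shlex.join quotes arguments where needed, which is handy for logging.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.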

Direct Analysis on Real sequence data

Biastools can also analyze real sequence data with the --real option using the context-aware assignment algorithm. The resulting plot does not include simulation information (sample_id.real.indel_balance.pdf).

$ biastools --analyze --real -t <thread> -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id> \
                      --bam <path_to_target.bam>

Biastools first fetches the relevant alignments from the target BAM file, focusing only on heterozygous variant sites specified in the VCF file. These sites are then analyzed using a context-aware algorithm. Finally, Biastools generates a bias report along with a bias-by-allele-length plot, both included in the output folder.
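
To illustrate what "heterozygous variant sites" means in terms of the VCF, here is a small sketch that selects such records from VCF text lines. It is not the biastools implementation: it assumes a single-sample VCF with diploid genotypes in the GT field, and the function names are hypothetical.

```python
def is_heterozygous(genotype):
    """Return True when a diploid GT string such as '0|1' carries two different alleles."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False  # haploid, polyploid, or missing call
    return alleles[0] != alleles[1]


def heterozygous_sites(vcf_lines):
    """Yield (chrom, pos, ref, alt) for records whose first sample is heterozygous."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if is_heterozygous(genotype):
            yield fields[0], int(fields[1]), fields[3], fields[4]
```

For a compressed file, the lines can be supplied with `gzip.open(path, "rt")`.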

Combined Bias-by-allele-length plot

Multiple analysis results can be combined into a single Bias-by-allele-length plot. In biastools version 0.3.1, the default plotting module displays the 25th percentile, mean, and 75th percentile of the fraction of ALT alleles for variants stratified by allele length, using ticks to indicate the interquartile range and a central dot to mark the mean.

$ biastools --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id> \
                      -lr file1.bias.all file2.bias.all file3.bias.all... \
                      -ld run_id1 run_id2 run_id3...

The output file sample_name.combine.sim.indel_balance.pdf plots the fraction of ALT alleles merged from the bias reports specified after the -lr option. Use the -ld option to specify the tool names that appear in the legend. To generate a combined plot using only real data bias reports (excluding simulation information), add the --real option.
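
The summary statistics described above (25th percentile, mean, and 75th percentile of the ALT fraction per allele length) can be sketched as follows. This is an illustration only, not the biastools plotting code: the input is assumed to be (allele_length, alt_fraction) pairs already extracted from a bias report, and the quartile interpolation method is an assumption.

```python
from collections import defaultdict
from statistics import fmean, quantiles


def summarize_by_allele_length(records):
    """Return {allele_length: (q25, mean, q75)} for (allele_length, alt_fraction) pairs."""
    groups = defaultdict(list)
    for allele_length, alt_fraction in records:
        groups[allele_length].append(alt_fraction)

    summary = {}
    for allele_length, fractions in sorted(groups.items()):
        if len(fractions) >= 2:
            # n=4 gives the three quartile cut points; the middle one is the median.
            q25, _median, q75 = quantiles(fractions, n=4, method="inclusive")
        else:
            # A single observation has no spread; use it for both quartiles.
            q25 = q75 = fractions[0]
        summary[allele_length] = (q25, fmean(fractions), q75)
    return summary
```

In a plot, the two quartiles would correspond to the ticks and the mean to the central dot for each allele length.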

Figure: an example of a combined bias-by-allele-length plot (multiple_indel_plot).

To compare bias reports across different coordinate systems (for example, alignments between HG002 and CHM13, or between chromosome 12 and chromosome 13), you can supply multiple VCF files using the plot merging feature.

$ biastools --analyze --real -o <work_dir> -g <ref.fa> -s <sample_name> -r <run_id> \
                      -lr chr1.bias.all chr2.bias.all chr3.bias.all \
                      -ld chr1 chr2 chr3 \
                      --vcf chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz

Note that in this case, the lower panel only shows the number of variants from the first VCF file. Take the comparison with a grain of salt, since it is not exactly an apples-to-apples comparison. In our paper, we surjected the alignments of different tools to a single coordinate system before comparison.

Bias prediction from bias report

Real data

Biastools can predict whether a variant is biased:

$ biastools --predict -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_pd_id> -pr <path_to_bias_report>

With the example command, biastools generates two files: sample_name.real.pd_id_bias.tsv and sample_name.real.pd_id_suspicious.tsv. The bias.tsv report contains all sites predicted to be biased by the model. The suspicious.tsv file contains the sites suspected of lacking sufficient information in the VCF file. In other words, the reads aligned to these sites show a pattern different from the haplotype indicated by the VCF file.

Simulation-guided prediction

$ biastools --predict -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_pd_id> \
                      -pr <path_to_bias_report> \
                      -ps <path_to_simulated_bias_report>

If a bias report for the sample based on simulated data is provided, biastools can produce a cross-prediction experiment result. In this experiment, the ground-truth biased sites are derived from the simulation data.

Scanning bias without vcf information

Scanning

$ biastools_scan --scan -o <work_dir> -g <ref.fa> -s <sample_name> -r <run_id> -i <path_to_target.bam>

Biastools transforms <path_to_target.bam> into the mpileup format and generates biased and suspicious regions (sample_name.run_id.bias.bed and sample_name.run_id.suspicious.bed).
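
A quick way to summarize a scan result is to count how many bases the reported regions cover. The sketch below is not part of biastools and assumes the .bed outputs follow the standard BED convention: chromosome, 0-based start, and exclusive end in the first three columns. Any further columns are ignored, and the function names are hypothetical.

```python
def read_bed_intervals(lines):
    """Parse (chrom, start, end) from BED lines, skipping blank and comment lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def covered_bases(intervals):
    """Count bases covered per chromosome, counting overlapping regions only once."""
    totals = {}
    last_end = {}  # chrom -> furthest end position counted so far
    for chrom, start, end in sorted(intervals):
        # Skip the part of this interval that was already counted.
        start = max(start, last_end.get(chrom, start))
        if end > start:
            totals[chrom] = totals.get(chrom, 0) + (end - start)
            last_end[chrom] = end
    return totals
```

For example, `covered_bases(read_bed_intervals(open("sample_name.run_id.bias.bed")))` would give the per-chromosome size of the biased regions under the stated assumption.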

Compare two bam files with common baseline

$ biastools_scan --compare_bam -o <work_dir> -g <ref.fa> -s <sample_name> -r <run_id> \
                               -i  <path_to_target.bam> \
                               -i2 <path_to_second.bam> \
                               -m  <path_to_target.mpileup> \
                               -m2 <path_to_second.mpileup>

Biastools generates a common baseline from path_to_target.bam and path_to_second.bam, and uses this common baseline to recalculate the bias regions based on the two mpileup files. The mpileup files can be generated by running the scanning step first, or by running bcftools consensus directly.

Directly compare two bias reports

Users can also compare two bias reports directly, without a common baseline (not recommended):

$ biastools_scan --compare_rpt -o <work_dir> -s <sample_name> -r <run_id> \
                               -b1 <path_to_target_bias.bed> \
                               -b2 <path_to_improved_bias.bed> \
                               -l2 <path_to_improved_lowRd.bed>


Release history

- 0.3.3 (this version): Mar 12, 2026
- 0.3.2: Mar 12, 2026
- 0.3.1: Apr 18, 2025
- 0.3.0: Apr 17, 2025
- 0.2.1: Mar 12, 2025
- 0.2.0: Feb 17, 2025
- 0.1.1: Oct 12, 2024
- 0.0.2: Sep 19, 2023
- 0.0.1: Aug 20, 2023

Download files

Source Distribution

biastools-0.3.3.tar.gz (57.1 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

biastools-0.3.3-py3-none-any.whl (71.9 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file biastools-0.3.3.tar.gz.

File metadata

Download URL: biastools-0.3.3.tar.gz
Upload date: Mar 12, 2026
Size: 57.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for biastools-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`7cb400a38d6e041b97ea7f9affb7c2b5973e39b231da891763b415c072f97714`
MD5	`258289f2752ef3400f937c2d5b79abd0`
BLAKE2b-256	`28c1d4483ec0cae50209a208ce824d6d4a360aab90569d8a511c7b24154356e8`
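
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch. The helper names and the assumption that the file sits in the current directory under its original name are illustrative.

```python
import hashlib

# SHA256 digest listed above for biastools-0.3.3.tar.gz.
EXPECTED_SHA256 = "7cb400a38d6e041b97ea7f9affb7c2b5973e39b231da891763b415c072f97714"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("biastools-0.3.3.tar.gz")` after the download returns True only if the file is byte-for-byte identical to the published one.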

File details

Details for the file biastools-0.3.3-py3-none-any.whl.

File metadata

Download URL: biastools-0.3.3-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 71.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for biastools-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ff8ef32888c9bc8ad6badc680de693b72ac4ecc0f2696100338000dfa7a1b6d`
MD5	`e32fc3d9ef549c6deb30c7624c04c298`
BLAKE2b-256	`856162fb7cd99f2ddf4161f6bf4fe92b339fc116ece4a244381b61ed9ed5d1d3`
