Create all possible combinations of phased and unphased blocks in a VCF

Haploblock-shuffler


Background

This tool takes a phased, unphased, or partially phased VCF file and generates all possible combinations of phase blocks that are consistent with the phasing present in the VCF file.

Details

First, this tool reads all variants from a VCF file, and groups variants together if they are compatible.

  1. If a variant is phased (using the PS tag), it is only compatible with other phased variants that have the same phase ID.
  2. Homozygous variants are always compatible with other variants, since they are part of every phase group.
  3. Heterozygous variants are only compatible when they are phased, and the phase ID matches.
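
The grouping rules above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the `(gt, ps)` tuple layout and the `group_variants` name are hypothetical, and it assumes that an unphased heterozygous variant ends up in a block of its own because it is compatible with nothing else.

```python
def group_variants(variants):
    """Group variants into blocks following the three compatibility rules.

    `variants` is a list of (gt, ps) tuples (a hypothetical data model):
    gt is a GT string such as "0|1" or "1/1", ps is the PS value or None.
    Returns (homozygous_indexes, blocks), where each block is a list of
    variant indexes that must be flipped together.
    """
    homozygous = []  # rule 2: compatible with every block
    blocks = {}      # block key -> list of variant indexes
    for idx, (gt, ps) in enumerate(variants):
        alleles = gt.replace("|", "/").split("/")
        if len(set(alleles)) == 1:
            homozygous.append(idx)
        elif "|" in gt and ps is not None:
            # rules 1 and 3: phased variants group by phase ID
            blocks.setdefault(("PS", ps), []).append(idx)
        else:
            # assumption: an unphased heterozygous variant is its own block
            blocks[("unphased", idx)] = [idx]
    return homozygous, list(blocks.values())


example = [("0|1", 100), ("1/1", None), ("1|0", 100), ("0/1", None), ("0|1", 200)]
homozygous, blocks = group_variants(example)
```

In this example the two variants with PS 100 share a block, the variant with PS 200 and the unphased heterozygous variant each form their own block, and the homozygous variant is kept apart because flipping it changes nothing.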

To produce all possible combinations of grouped variants, haploblock-shuffler uses a counter to produce a binary pattern that determines which calls should be modified. To modify a variant, it inverts the order of the alleles in the GT field, so that 0/1 becomes 1/0, or vice versa.
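
As a rough sketch of the counter idea (not the tool's real implementation), the snippet below treats bit i of an integer counter as the "flip" switch for block i. The helper names `flip_gt` and `apply_pattern` are hypothetical, and the code assumes diploid genotypes.

```python
def flip_gt(gt):
    """Swap the two alleles of a diploid GT string, keeping its separator."""
    sep = "|" if "|" in gt else "/"
    left, right = gt.split(sep)
    return f"{right}{sep}{left}"


def apply_pattern(block_gts, counter):
    """Flip every genotype in block i when bit i of `counter` is set.

    `block_gts` is a list of blocks, each a list of GT strings.
    """
    result = []
    for i, gts in enumerate(block_gts):
        flip = (counter >> i) & 1
        result.append([flip_gt(gt) if flip else gt for gt in gts])
    return result


blocks = [["0|1", "1|0"], ["0/1"]]
shuffled = apply_pattern(blocks, 2)  # binary 10: only the second block flips
```

All variants in a block are flipped together, which is what keeps each output consistent with the phasing in the input.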

Since there are two alleles for every variant, only half of the possible VCF files have to be produced: the other half are mirror images (e.g. 0101 and 1010).
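
One way to see why half the patterns suffice: if the counter only runs up to 2^(n-1), the highest bit is always 0, so a pattern and its mirror image can never both be generated. The sketch below illustrates this under that assumption; `patterns` and `mirror` are hypothetical names, and the tool may pick its representatives differently.

```python
def patterns(n_blocks):
    """Return one bit pattern per mirror pair for n_blocks >= 1 blocks."""
    if n_blocks < 1:
        raise ValueError("need at least one block")
    # Stopping at 2**(n_blocks - 1) keeps the leading bit at 0.
    return [format(c, f"0{n_blocks}b") for c in range(2 ** (n_blocks - 1))]


def mirror(pattern):
    """Return the complementary pattern, e.g. 0101 -> 1010."""
    return "".join("1" if bit == "0" else "0" for bit in pattern)


kept = patterns(4)
skipped = [mirror(p) for p in kept]
```

Here `kept` and `skipped` do not overlap, and together they cover all 16 patterns for four blocks.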

Usage

haploblock-shuffler test.vcf output

To generate consensus FASTA files from the output VCF files, first bgzip and index them:

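cd output
for i in out_*.vcf; do
    # bgzip replaces each file with a compressed ${i}.gz
    bgzip "$i"
    # tabix writes the index next to the compressed file
    tabix -p vcf "${i}.gz"
done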

Then, generate the consensus for each haplotype using:

samtools faidx "$REFERENCE" "$REGION" | bcftools consensus -H 1 out_0.vcf.gz > out_0_1.fa
samtools faidx "$REFERENCE" "$REGION" | bcftools consensus -H 2 out_0.vcf.gz > out_0_2.fa

Limitations

This tool generates 2^(n-1) VCF files in the specified output folder, where n is the number of phase blocks in the input VCF (see above). By default, n is limited to 11 blocks, which means that at most 1024 files are created. This limit can be increased with --max-blocks, but use it with caution: every additional block doubles the number of output files.
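
To get a feel for how quickly the output grows before raising the limit, the short calculation below evaluates the 2^(n-1) formula for a few block counts. The function name `n_output_files` is hypothetical and is not part of the tool.

```python
def n_output_files(n_blocks):
    """Number of VCF files for n_blocks >= 1 phase blocks: 2^(n-1)."""
    if n_blocks < 1:
        raise ValueError("need at least one block")
    return 2 ** (n_blocks - 1)


for n in (1, 5, 11, 16, 21):
    print(f"{n:>2} blocks -> {n_output_files(n):>9,} files")
```

At the default limit of 11 blocks this gives 1,024 files, while 21 blocks already means more than a million.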
