Tools for converting VCF to IGD files and processing them.

igdtools

igdtools can convert from .vcf(.gz) to IGD, and once you have an IGD file it can perform various operations such as filtering, computing basic statistics, and generally transforming IGD files.

Run igdtools --help for more information on commands.

For more general reading and modification of IGD files, see pyigd.

Installation

igdtools is a C++ binary, with a small Python wrapper to make installation easier. You can install via:

pip install igdtools

This installs prebuilt binaries on most Linux systems and builds from the source distribution on other systems (such as macOS). The source distribution requires CMake 3.10 or newer, the zlib development headers, and a version of clang or GCC that supports C++11.

Usage Examples

Convert .vcf(.gz) to IGD

Conversion copies the variant identifiers (the "ID" column in the VCF) and the individual identifiers (the sample column names in the VCF) to the IGD file, unless the --no-var-ids or --no-indiv-ids flag is specified, respectively.

igdtools input.vcf.gz -o output.igd
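When conversion is driven from a script, the same command line can be assembled in Python. The following is a minimal sketch: build_convert_command is a hypothetical helper (not part of igdtools), it uses only the flags named in this README, and it assumes the igdtools binary is on your PATH.

```python
import os
import shutil
import subprocess


def build_convert_command(vcf_path, igd_path, keep_var_ids=True, keep_indiv_ids=True):
    """Build the argument list for a VCF -> IGD conversion (hypothetical helper)."""
    cmd = ["igdtools", vcf_path, "-o", igd_path]
    if not keep_var_ids:
        cmd.append("--no-var-ids")    # do not copy the VCF "ID" column
    if not keep_indiv_ids:
        cmd.append("--no-indiv-ids")  # do not copy the sample column names
    return cmd


cmd = build_convert_command("input.vcf.gz", "output.igd", keep_var_ids=False)
print(" ".join(cmd))

# Only attempt the conversion when the binary and the input file both exist.
if shutil.which("igdtools") and os.path.exists("input.vcf.gz"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when file names contain spaces.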

Convert .vcf(.gz) to IGD and export metadata

igdtools can export metadata fields as simple text files, each of which can be loaded by numpy.loadtxt(). You can use --export-metadata to export this metadata during conversion:

igdtools input.vcf.gz -o output.igd --export-metadata qual,filter,info

The metadata types you can export are qual, filter, chrom, and info. You can also specify all to export every type without listing them. By default, no metadata is exported during VCF to IGD conversion.
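Loading an exported file with numpy.loadtxt() might look like the sketch below. The file name output.meta.qual.txt and the one-value-per-line layout are assumptions made for illustration, so the example writes its own stand-in file rather than relying on real igdtools output.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an exported file; name and layout are assumptions.
    qual_path = os.path.join(tmp, "output.meta.qual.txt")
    with open(qual_path, "w") as f:
        f.write("50.0\n30.5\n99.0\n")

    qual = np.loadtxt(qual_path)                # numeric values as floats
    as_text = np.loadtxt(qual_path, dtype=str)  # same file read as strings

print(qual.shape, qual.mean())
```

Text-valued fields (for example filter or info) would likely need dtype=str, since the default dtype of numpy.loadtxt() is float.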

Just export .vcf(.gz) metadata

If you already have an IGD file, and want to go back and export the metadata from the corresponding VCF, you can just do the export:

igdtools input.vcf.gz --export-metadata qual,filter,info

Note that the file naming differs. When only exporting metadata, the files are named input.meta.*.txt, whereas the previous example (converting to IGD and exporting metadata) names them output.meta.*.txt.
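To collect whichever metadata files were written, you can glob on the prefix. This is a small sketch: find_metadata_files is a hypothetical helper, and the specific file names created here (such as input.meta.qual.txt) are placeholders that only follow the *.meta.*.txt pattern described above.

```python
import glob
import os
import tempfile


def find_metadata_files(prefix):
    """Return files matching <prefix>.meta.*.txt, sorted by name."""
    return sorted(glob.glob(glob.escape(prefix) + ".meta.*.txt"))


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("input.meta.qual.txt", "output.meta.qual.txt", "output.meta.info.txt"):
        open(os.path.join(tmp, name), "w").close()

    export_only = [os.path.basename(p)
                   for p in find_metadata_files(os.path.join(tmp, "input"))]
    converted = [os.path.basename(p)
                 for p in find_metadata_files(os.path.join(tmp, "output"))]

print(export_only)
print(converted)
```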

IGD file header info

To examine the header information in the IGD file:

igdtools -i test.igd

which will print out something like

  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No

Similarly, some simple statistics can be emitted by specifying -s. This scans the entire IGD file, so it will be slower than -i.
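If you want the header fields in a script, the printed text can be parsed into a dictionary. The sketch below works on the sample listing shown above; parse_header is a hypothetical helper, and because the real output is only "something like" that listing, treat the parsing rules as assumptions.

```python
SAMPLE = """\
  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No
"""


def parse_header(text):
    """Parse 'Key: value' and 'Question? value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if ": " in line:
            key, value = line.split(": ", 1)
        elif "? " in line:
            key, value = line.split("? ", 1)
        else:
            continue  # skip blank or unrecognized lines
        info[key.rstrip("?")] = value.strip()
    return info


header = parse_header(SAMPLE)
start, end = (int(x) for x in header["Genome range"].split("-"))
print(int(header["Variants"]), header["Phased"], start, end)
```

In practice the text would come from running igdtools -i test.igd, for example by capturing the command's output with subprocess.run().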

Copy variants within range

Create a copy of an IGD file, but only keep variants in a particular base-pair range. Here we show the range [50000, 150000] (50 kb to 150 kb, inclusive at both ends):

igdtools input.igd -o output.igd -r 50000-150000
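The inclusive bounds matter at the edges of the range. As an illustration of the interval semantics only (not of how igdtools is implemented), with in_bp_range as a hypothetical helper:

```python
def in_bp_range(position, start, end):
    """True when position lies in the closed interval [start, end]."""
    return start <= position <= end


positions = [49999, 50000, 100000, 150000, 150001]
kept = [p for p in positions if in_bp_range(p, 50000, 150000)]
print(kept)
```

Both 50000 and 150000 are kept, while 49999 and 150001 are dropped.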

Copy variants with frequency

Create a copy of an IGD file, but only keep variants whose allele frequency falls within a particular range. Here we show the range [0.1, 0.4) (inclusive lower bound, exclusive upper bound).

igdtools input.igd -o output.igd -f 0.1-0.4
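The half-open interval means a variant at exactly 0.1 is kept and one at exactly 0.4 is dropped. The sketch below illustrates this, assuming frequency is the number of haplotypes carrying the allele divided by the total number of haplotypes; that definition and the helper in_freq_range are assumptions for illustration, not taken from igdtools.

```python
def in_freq_range(carriers, total_haplotypes, low, high):
    """True when carriers / total_haplotypes lies in [low, high)."""
    freq = carriers / total_haplotypes
    return low <= freq < high


total = 2000  # e.g. 1000 diploid individuals
carrier_counts = [199, 200, 500, 799, 800]
kept = [c for c in carrier_counts if in_freq_range(c, total, 0.1, 0.4)]
print(kept)
```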

Copy to unphased data

Sometimes it is useful to perform "unphased" calculations. For example, when computing runs of homozygosity (ROH) it is easier to work with unphased diploid data that tracks the number of copies each individual has of an allele (0, 1, or 2). Create a copy of an IGD file, but store it unphased:

igdtools test.igd -o test.unphased.igd --force-unphased
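Conceptually, unphased diploid data replaces each individual's pair of haplotype alleles with a single copy count. The sketch below shows that idea for one variant; it assumes consecutive haplotypes belong to the same individual and is not a description of how IGD stores data. The helper to_copy_counts is hypothetical.

```python
def to_copy_counts(haplotype_alleles, ploidy=2):
    """Collapse a flat list of 0/1 haplotype alleles into per-individual counts."""
    if len(haplotype_alleles) % ploidy != 0:
        raise ValueError("number of haplotypes must be a multiple of ploidy")
    return [
        sum(haplotype_alleles[i:i + ploidy])
        for i in range(0, len(haplotype_alleles), ploidy)
    ]


# Four diploid individuals: (0,1), (1,1), (0,0), (1,0)
phased = [0, 1, 1, 1, 0, 0, 1, 0]
counts = to_copy_counts(phased)
print(counts)
```

Note that (0,1) and (1,0) both become 1: the count keeps the number of copies but discards which haplotype carried the allele.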
