Tools for converting VCF to IGD files and processing them.

igdtools

igdtools converts .vcf(.gz) files to IGD. Once you have an IGD file, it can filter variants, compute basic statistics, and otherwise transform IGD files.

Run igdtools --help for more information on commands.

For more general reading and modification of IGD files, see pyigd.

Installation

igdtools is a C++ binary, with a small Python wrapper to make installation easier. You can install via:

pip install igdtools

This installs prebuilt binaries on most Linux systems and builds from a source distribution on other systems (such as macOS). The source build requires CMake 3.10 or newer, the zlib development headers, and a version of clang or GCC that supports C++11.

Usage Examples

Convert .vcf(.gz) to IGD

Conversion will copy the variant identifiers ("ID" column in VCF) and individual identifiers (the sample column names in VCF) to the IGD file, unless --no-var-ids and --no-indiv-ids flags are specified (respectively).

igdtools input.vcf.gz -o output.igd
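If the conversion is driven from a Python script, the command above can be assembled and run with the standard library. The following is a minimal sketch, not part of igdtools: the helper names `build_convert_command` and `convert` are hypothetical, and only the `-o`, `--no-var-ids`, and `--no-indiv-ids` flags described above are used.

```python
import subprocess


def build_convert_command(vcf_path, igd_path, keep_var_ids=True, keep_indiv_ids=True):
    """Build the igdtools argument list for a VCF-to-IGD conversion."""
    cmd = ["igdtools", vcf_path, "-o", igd_path]
    if not keep_var_ids:
        cmd.append("--no-var-ids")    # skip copying the VCF "ID" column
    if not keep_indiv_ids:
        cmd.append("--no-indiv-ids")  # skip copying the sample column names
    return cmd


def convert(vcf_path, igd_path, **kwargs):
    """Run the conversion; raises CalledProcessError if igdtools exits non-zero."""
    subprocess.run(build_convert_command(vcf_path, igd_path, **kwargs), check=True)
```

Calling `convert("input.vcf.gz", "output.igd")` would run the same command as shown above, assuming igdtools is on the PATH.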

Convert .vcf(.gz) to IGD and export metadata

igdtools can export metadata fields as simple text files, each of which can be loaded by numpy.loadtxt(). You can use --export-metadata to export this metadata during conversion:

igdtools input.vcf.gz -o output.igd --export-metadata qual,filter,info

The metadata types you can export are qual, filter, chrom, and info. You can also specify all to export every type without listing them. By default, no metadata is exported during VCF-to-IGD conversion.

Just export .vcf(.gz) metadata

If you already have an IGD file and want to go back and export the metadata from the corresponding VCF, you can run the export on its own:

igdtools input.vcf.gz --export-metadata qual,filter,info

Note that the output file names differ. When only exporting metadata, the files are named input.meta.*.txt, whereas the previous example, which converts to IGD and exports metadata, names them output.meta.*.txt.
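Since the exported files are plain text that numpy.loadtxt() can read, they can be collected into a dictionary in a few lines. This is a rough sketch under some assumptions: the helper name `load_exported_metadata` is hypothetical, the part of the file name matched by `*` is assumed to identify the metadata type, and every value is read as a string because the exact contents of each file (for example, missing values) are not specified here.

```python
import glob

import numpy as np


def load_exported_metadata(prefix):
    """Load every <prefix>.meta.<name>.txt file into a dict keyed by <name>."""
    head = prefix + ".meta."
    result = {}
    for path in sorted(glob.glob(glob.escape(head) + "*.txt")):
        # Assumption: the text between ".meta." and ".txt" names the metadata type.
        name = path[len(head):-len(".txt")]
        # dtype=str keeps every field as text; convert numeric fields afterwards.
        result[name] = np.loadtxt(path, dtype=str, ndmin=1)
    return result
```

For example, `load_exported_metadata("output")` would pick up the files written by the conversion example, and a numeric field could then be converted with `.astype(float)` if it contains no missing values.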

IGD file header info

To examine the header information in the IGD file:

igdtools -i test.igd

which will print out something like

  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No

Similarly, specifying -s emits some simple statistics. This scans the entire IGD file, so it is slower than -i.
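If the header values are needed in a script, the printed text can be turned into a dictionary. The sketch below parses the sample output shown above; `parse_header` is a hypothetical helper, and it assumes each value is a single token with no spaces, which holds for the sample but may not hold for every Source path.

```python
def parse_header(text):
    """Parse 'Key: value' style lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # The value is the last whitespace-separated token; the rest is the key.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or single-token lines
        key, value = parts
        info[key.rstrip(":?")] = value
    return info


sample = """
  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No
"""
header = parse_header(sample)
print(int(header["Variants"]), header["Genome range"])
```

In practice the text would come from running `igdtools -i test.igd` and capturing its output, for instance with `subprocess.run(..., capture_output=True, text=True)`.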

Copy variants within range

Create a copy of an IGD file, but only keep variants in a particular base-pair range. Here we show range [50000, 150000] (50 kb to 150 kb, inclusive):

igdtools input.igd -o output.igd -r 50000-150000
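To make the inclusive bounds concrete, here is a small illustration of the range test in Python. It only mirrors the interval semantics described above and is not igdtools' implementation; `in_range` and the example positions are made up for illustration.

```python
def in_range(position, start, end):
    """True when position lies in the closed interval [start, end]."""
    return start <= position <= end


positions = [49999, 50000, 100000, 150000, 150001]
kept = [p for p in positions if in_range(p, 50000, 150000)]
print(kept)  # [50000, 100000, 150000]
```

Both endpoints are kept, so a variant at position 150000 is included while one at 150001 is not.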

Copy variants with frequency

Create a copy of an IGD file, but only keep variants whose allele frequency falls within a particular range. Here we show range [0.1, 0.4) (inclusive, exclusive):

igdtools input.igd -o output.igd -f 0.1-0.4
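The half-open interval can be illustrated the same way. This sketch assumes allele frequency is the number of samples carrying the allele divided by the total number of samples; that definition, the helper names, and the example counts are assumptions for illustration, not a description of how igdtools computes it.

```python
def allele_frequency(num_carriers, num_samples):
    """Fraction of samples carrying the allele (assumed definition)."""
    return num_carriers / num_samples


def in_frequency_range(freq, low, high):
    """True when freq lies in the half-open interval [low, high)."""
    return low <= freq < high


num_samples = 2000  # e.g. 1000 diploid individuals stored as phased haplotypes
carrier_counts = [199, 200, 500, 799, 800]
kept = [
    c for c in carrier_counts
    if in_frequency_range(allele_frequency(c, num_samples), 0.1, 0.4)
]
print(kept)  # [200, 500, 799]
```

A frequency of exactly 0.1 is kept, while a frequency of exactly 0.4 is dropped.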

Copy to unphased data

Sometimes it is useful to perform "unphased" calculations. For example, when computing runs of homozygosity (ROH) it is easier to work with unphased diploid data that tracks the number of copies each individual has of an allele (0, 1, or 2). Create a copy of an IGD file, but store it unphased:

igdtools test.igd -o test.unphased.igd --force-unphased
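As an illustration of what "number of copies per individual" means, the sketch below collapses a list of haplotype sample indexes into per-individual copy counts. It assumes haplotype index h belongs to individual h // ploidy; the function name and input are hypothetical, and this is a conceptual sketch rather than what --force-unphased does internally.

```python
from collections import Counter


def copies_per_individual(haplotype_samples, ploidy=2):
    """Count how many copies of an allele each individual carries."""
    # Assumption: haplotype index h belongs to individual h // ploidy.
    return Counter(sample // ploidy for sample in haplotype_samples)


carriers = [0, 1, 4, 7]  # haplotype indexes that carry the alternate allele
counts = copies_per_individual(carriers)
print(dict(counts))  # {0: 2, 2: 1, 3: 1}
```

Individual 0 is homozygous for the allele (2 copies), individuals 2 and 3 are heterozygous (1 copy), and individuals not listed carry 0 copies.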
