Tools for converting VCF to IGD files and processing them.

igdtools

igdtools converts .vcf(.gz) files to IGD. Once you have an IGD file, it can also filter variants, compute basic statistics, and otherwise transform the file.

Run igdtools --help for more information on commands.

For more general reading and modification of IGD files, see pyigd.

Installation

igdtools is a C++ binary, with a small Python wrapper to make installation easier. You can install via:

pip install igdtools

which installs prebuilt binaries on most Linux systems and builds from the source distribution on other systems (such as macOS). Building from source requires CMake 3.10 or newer, the zlib development headers, and a version of clang or GCC that supports C++11.

Usage Examples

Convert .vcf(.gz) to IGD

Conversion will copy the variant identifiers ("ID" column in VCF) and individual identifiers (the sample column names in VCF) to the IGD file, unless --no-var-ids and --no-indiv-ids flags are specified (respectively).

igdtools input.vcf.gz -o output.igd
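When conversion is driven from a script, it can help to assemble the command line in one place. The following is a minimal sketch, not part of igdtools: build_convert_command is a hypothetical helper, and it assumes the two flags named above may simply be appended after the output argument.

```python
import shlex


def build_convert_command(vcf_path, igd_path, keep_var_ids=True, keep_indiv_ids=True):
    """Return the argv list for a VCF -> IGD conversion (hypothetical helper)."""
    cmd = ["igdtools", vcf_path, "-o", igd_path]
    # Both flags come from the text above; their position on the command
    # line is an assumption of this sketch.
    if not keep_var_ids:
        cmd.append("--no-var-ids")
    if not keep_indiv_ids:
        cmd.append("--no-indiv-ids")
    return cmd


# Show the command that would be run, without executing it.
print(shlex.join(build_convert_command("input.vcf.gz", "output.igd", keep_var_ids=False)))
```

The returned list can be passed to subprocess.run(cmd, check=True) once igdtools is installed and on the PATH.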

Convert .vcf(.gz) to IGD and export metadata

igdtools can export metadata fields as simple text files, each of which can be loaded by numpy.loadtxt(). You can use --export-metadata to export this metadata during conversion:

igdtools input.vcf.gz -o output.igd --export-metadata qual,filter,info

The metadata types you can export are qual, filter, chrom, and info. You can also specify all to export every type without listing them. By default, no metadata is exported during VCF to IGD conversion.

Just export .vcf(.gz) metadata

If you already have an IGD file and want to go back and export the metadata from the corresponding VCF, you can run the export on its own:

igdtools input.vcf.gz --export-metadata qual,filter,info

Note that the naming will differ. When only exporting metadata, the files are named input.meta.*.txt, whereas the previous example (converting to IGD and exporting metadata) names them output.meta.*.txt.
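The naming rule can be sketched in a few lines of Python for locating the exported files afterwards. This is an illustration under assumptions: metadata_prefix and find_metadata_files are hypothetical helpers, the prefix is taken to be the file name with its .vcf.gz, .vcf, or .igd extension removed (as the two examples suggest), and the directory the files are written to is left as a parameter because it is not stated here.

```python
from pathlib import Path

# Extensions stripped to obtain the prefix (assumed from the examples above).
KNOWN_SUFFIXES = (".vcf.gz", ".vcf", ".igd")


def metadata_prefix(path):
    """Return the file name without its VCF/IGD extension."""
    name = Path(path).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def find_metadata_files(path, directory="."):
    """List files matching <prefix>.meta.*.txt in the given directory."""
    prefix = metadata_prefix(path)
    return sorted(Path(directory).glob(f"{prefix}.meta.*.txt"))
```

For example, find_metadata_files("input.vcf.gz") looks for input.meta.*.txt, and find_metadata_files("output.igd") looks for output.meta.*.txt. Each file found can then be passed to numpy.loadtxt().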

IGD file header info

To examine the header information in the IGD file:

igdtools -i test.igd

which will print out something like

  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No

Similarly, some simple statistics can be emitted by specifying -s. This scans the entire IGD file, so it is slower than -i.
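If the header fields printed by -i are needed in a script, the text can be split into key/value pairs. The sketch below parses only the sample output shown above; parse_header is a hypothetical helper, and the exact output format of igdtools may differ between versions.

```python
SAMPLE = """\
  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No
"""


def parse_header(text):
    """Turn 'Key: value' and 'Question? value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if ": " in line:
            key, value = line.split(": ", 1)
        elif "? " in line:
            key, value = line.split("? ", 1)
        else:
            continue  # skip blank or unrecognized lines
        # Drop any trailing '?' or ':' left on the key, e.g. "Phased?".
        info[key.rstrip("?:").strip()] = value.strip()
    return info


header = parse_header(SAMPLE)
print(header["Variants"], header["Phased"], header["Has variant IDs"])
```

All values are kept as strings; converting them (for example int(header["Variants"])) is left to the caller.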

Copy variants within range

Create a copy of an IGD file, but only keep variants in a particular base-pair range. Here we show the range [50000, 150000] (50 kbp to 150 kbp, inclusive at both ends):

igdtools input.igd -o output.igd -r 50000-150000
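To make the inclusive bounds concrete, here is a small Python illustration of the same selection rule applied to a list of positions. It is only a sketch of the semantics described above, not how igdtools is implemented; parse_range and in_range are hypothetical names.

```python
def parse_range(spec):
    """Split a 'START-END' string into two integers."""
    start, _, end = spec.partition("-")
    return int(start), int(end)


def in_range(position, start, end):
    """Inclusive at both ends, matching [start, end] in the text."""
    return start <= position <= end


start, end = parse_range("50000-150000")
positions = [49999, 50000, 100000, 150000, 150001]
kept = [p for p in positions if in_range(p, start, end)]
print(kept)  # [50000, 100000, 150000]
```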

Copy variants with frequency

Create a copy of an IGD file, but only keep variants whose allele frequency falls in a particular range. Here we show the half-open range [0.1, 0.4), which includes 0.1 and excludes 0.4.

igdtools input.igd -o output.igd -f 0.1-0.4
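The half-open interval matters at the boundaries. The sketch below illustrates it in Python, assuming allele frequency is the number of samples carrying the allele divided by the total number of samples (2000 haplotypes for 1000 diploid individuals). That definition and the helper names are assumptions of this example, not taken from igdtools.

```python
def allele_frequency(carrier_count, num_samples):
    """Fraction of samples carrying the allele (assumed definition)."""
    return carrier_count / num_samples


def keep_by_frequency(freq, lo, hi):
    """Lower bound inclusive, upper bound exclusive: [lo, hi)."""
    return lo <= freq < hi


num_samples = 2000  # e.g. 1000 phased diploid individuals
for carriers in (199, 200, 799, 800):
    freq = allele_frequency(carriers, num_samples)
    print(carriers, freq, keep_by_frequency(freq, 0.1, 0.4))
```

Under these assumptions, a variant carried by 200 samples (frequency 0.1) is kept, while one carried by 800 samples (frequency 0.4) is dropped.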

Copy to unphased data

Sometimes it is useful to perform "unphased" calculations. For example, when computing runs of homozygosity (ROH) it is easier to work with unphased diploid data that tracks the number of copies each individual has of an allele (0, 1, or 2). Create a copy of an IGD file, but store it unphased:

igdtools test.igd -o test.unphased.igd --force-unphased
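The idea behind the unphased representation can be illustrated in Python. The sketch assumes that in phased diploid data the haplotype samples of individual i are numbered 2*i and 2*i + 1, so dividing a sample index by the ploidy gives the individual. This numbering and the helper name unphased_counts are assumptions of the example; it does not show how igdtools stores the result.

```python
from collections import Counter


def unphased_counts(haplotype_samples, ploidy=2):
    """Map each individual to the number of allele copies it carries."""
    # Integer division groups haplotype indices by individual (assumed layout).
    return dict(Counter(h // ploidy for h in haplotype_samples))


# Haplotypes 0 and 1 belong to individual 0, haplotype 4 to individual 2,
# and haplotype 7 to individual 3.
print(unphased_counts([0, 1, 4, 7]))  # {0: 2, 2: 1, 3: 1}
```

Individuals that do not appear in the result carry 0 copies of the allele.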

Download files

Source Distribution

  • igdtools-2.3.tar.gz (195.1 kB)

Built Distributions

  • igdtools-2.3-cp313-cp313-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.13, manylinux (glibc 2.24+), x86-64
  • igdtools-2.3-cp312-cp312-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.12, manylinux (glibc 2.24+), x86-64
  • igdtools-2.3-cp311-cp311-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.11, manylinux (glibc 2.24+), x86-64
  • igdtools-2.3-cp310-cp310-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.10, manylinux (glibc 2.24+), x86-64
  • igdtools-2.3-cp39-cp39-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.9, manylinux (glibc 2.24+), x86-64
  • igdtools-2.3-cp38-cp38-manylinux_2_24_x86_64.whl (171.7 kB): CPython 3.8, manylinux (glibc 2.24+), x86-64
