Tools for converting VCF to IGD files and processing them.

igdtools

igdtools converts .vcf(.gz) files to IGD. Once you have an IGD file, it can also filter variants, compute basic statistics, and otherwise transform IGD files.

Run igdtools --help for more information on commands.

For more general reading and modification of IGD files, see pyigd.

Installation

igdtools is a C++ binary, with a small Python wrapper to make installation easier. You can install via:

pip install igdtools

which installs prebuilt binaries for most Linux systems, and builds from the source distribution on other systems (such as macOS). Building from the source distribution requires CMake 3.10 or newer, the zlib development headers, and a version of clang or GCC that supports C++11.

Usage Examples

Convert .vcf(.gz) to IGD

Conversion will copy the variant identifiers ("ID" column in VCF) and individual identifiers (the sample column names in VCF) to the IGD file, unless --no-var-ids and --no-indiv-ids flags are specified (respectively).

igdtools input.vcf.gz -o output.igd
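If you run the conversion from a script, the command line can be assembled programmatically. The following is a minimal sketch, not part of igdtools: the helper names build_convert_command and convert are hypothetical, igdtools is assumed to be on your PATH, and only the flags described above are used.

```python
import subprocess


def build_convert_command(vcf_path, igd_path, keep_var_ids=True, keep_indiv_ids=True):
    """Build the argument list for a VCF-to-IGD conversion (hypothetical helper)."""
    cmd = ["igdtools", vcf_path, "-o", igd_path]
    if not keep_var_ids:
        # Do not copy the VCF "ID" column into the IGD file.
        cmd.append("--no-var-ids")
    if not keep_indiv_ids:
        # Do not copy the VCF sample column names into the IGD file.
        cmd.append("--no-indiv-ids")
    return cmd


def convert(vcf_path, igd_path, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_convert_command(vcf_path, igd_path, **kwargs), check=True)
```

Calling convert("input.vcf.gz", "output.igd") runs the same command as the example above.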

Convert .vcf(.gz) to IGD and export metadata

igdtools can export metadata fields as simple text files, each of which can be loaded by numpy.loadtxt(). You can use --export-metadata to export this metadata during conversion:

igdtools input.vcf.gz -o output.igd --export-metadata qual,filter,info

The metadata types you can export are qual, filter, chrom, and info. You can also specify all to export every type without listing them. By default, no metadata is exported during VCF to IGD conversion.
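When a script builds the --export-metadata argument, it can help to check the list before calling igdtools. The sketch below is illustrative only: normalize_metadata_arg is a hypothetical helper, and it assumes the accepted names are exactly those listed above, plus all.

```python
ALLOWED_METADATA = ("qual", "filter", "chrom", "info")


def normalize_metadata_arg(value):
    """Validate a comma-separated metadata list and return the names as a list.

    The special name "all" expands to every known metadata type.
    """
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        raise ValueError("no metadata types given")
    if "all" in names:
        return list(ALLOWED_METADATA)
    unknown = [name for name in names if name not in ALLOWED_METADATA]
    if unknown:
        raise ValueError("unknown metadata types: " + ", ".join(unknown))
    return names
```

The result can be joined with commas again and passed as the value of --export-metadata.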

Just export .vcf(.gz) metadata

If you already have an IGD file and want to go back and export the metadata from the corresponding VCF, you can run the export on its own:

igdtools input.vcf.gz --export-metadata qual,filter,info

Note that the naming will differ. When only exporting metadata, the files are named input.meta.*.txt, whereas the previous example, which converts to IGD and exports metadata, names them output.meta.*.txt.
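To locate the exported files from a script, you can derive the glob pattern from the file name. This is a rough sketch under an assumption: the prefix is taken to be the file name with a trailing .gz, .vcf, or .igd removed, which matches the two examples above but is not otherwise confirmed here. The helper name metadata_glob is hypothetical.

```python
from pathlib import Path


def metadata_glob(path):
    """Return the glob pattern for metadata files derived from `path`.

    Assumption: the prefix is the file name with a trailing
    .gz, .vcf, or .igd extension removed.
    """
    p = Path(path)
    name = p.name
    for suffix in (".gz", ".vcf", ".igd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return str(p.with_name(name + ".meta.*.txt"))
```

Passing the result to glob.glob() lists the matching files, each of which can then be read with numpy.loadtxt().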

IGD file header info

To examine the header information in the IGD file:

igdtools -i test.igd

which will print out something like

  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No

Similarly, specifying -s emits some simple statistics. This scans the entire IGD file, so it is slower than -i.
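If you want to consume the header information printed by -i in a script, the text can be split into key/value pairs. The sketch below assumes the output looks like the sample shown above; the exact format may differ between versions, and parse_header is a hypothetical helper rather than part of igdtools.

```python
def parse_header(text):
    """Parse `igdtools -i` style output into a dict of strings (format assumed)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ": " in line:
            # Lines such as "Variants: 329556" or "Phased?: true".
            key, value = line.split(": ", 1)
        elif "? " in line:
            # Lines such as "Has variant IDs? No".
            key, value = line.split("? ", 1)
        else:
            continue
        info[key.rstrip("?").strip()] = value.strip()
    return info
```

All values are kept as strings, so convert fields such as Variants or Ploidy with int() as needed.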

Copy variants within range

Create a copy of an IGD file, but only keep variants in a particular base-pair range. Here we show range [50000, 150000] (50 kbp to 150 kbp, inclusive):

igdtools input.igd -o output.igd -r 50000-150000
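The inclusive range semantics can be sketched as follows. This illustrates the closed interval described above and is not igdtools' implementation; parse_range and in_range are hypothetical names.

```python
def parse_range(spec):
    """Parse a "START-END" string into a (start, end) tuple of integers."""
    start, end = spec.split("-", 1)
    lo, hi = int(start), int(end)
    if lo > hi:
        raise ValueError("range start must not exceed range end")
    return lo, hi


def in_range(position, bounds):
    """Return True if position lies in the closed interval [lo, hi]."""
    lo, hi = bounds
    return lo <= position <= hi
```

With bounds = parse_range("50000-150000"), positions 50000 and 150000 are both kept, while 150001 is dropped.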

Copy variants with frequency

Create a copy of an IGD file, but only keep variants whose allele frequency falls within a particular range. Here we show range [0.1, 0.4) (lower bound inclusive, upper bound exclusive):

igdtools input.igd -o output.igd -f 0.1-0.4
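The half-open frequency interval can be sketched like this. As an assumption for illustration, the allele frequency is computed as the number of copies of the allele divided by the total number of haplotypes (individuals times ploidy); how igdtools computes it internally is not stated here, and both function names are hypothetical.

```python
def allele_frequency(allele_copies, num_individuals, ploidy=2):
    """Frequency of an allele among all haplotypes (assumed definition)."""
    return allele_copies / (num_individuals * ploidy)


def keep_by_frequency(freq, lower, upper):
    """Return True if freq lies in the half-open interval [lower, upper)."""
    return lower <= freq < upper
```

For 1000 diploid individuals, an allele with 200 copies (frequency 0.1) is kept by the range 0.1-0.4, while one with 800 copies (frequency 0.4) is not.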

Copy to unphased data

Sometimes it is useful to perform "unphased" calculations. For example, when computing runs of homozygosity (ROH) it is easier to work with unphased diploid data that tracks the number of copies each individual has of an allele (0, 1, or 2). Create a copy of an IGD file, but store it unphased:

igdtools test.igd -o test.unphased.igd --force-unphased
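Conceptually, unphasing replaces each individual's separate haplotypes with a single copy count. The sketch below shows the idea on a plain Python list. It assumes, purely for illustration, that consecutive entries belong to the same individual; it does not describe how IGD stores data, and to_copy_counts is a hypothetical name.

```python
def to_copy_counts(haplotypes, ploidy=2):
    """Collapse per-haplotype 0/1 allele indicators into per-individual copy counts.

    Assumption: haplotypes are ordered so that each run of `ploidy`
    consecutive entries belongs to one individual.
    """
    if len(haplotypes) % ploidy != 0:
        raise ValueError("number of haplotypes must be a multiple of ploidy")
    return [
        sum(haplotypes[i:i + ploidy])
        for i in range(0, len(haplotypes), ploidy)
    ]
```

For three diploid individuals with haplotypes [0, 1, 1, 1, 0, 0], the copy counts are [1, 2, 0].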

Source Distribution

igdtools-2.6.tar.gz (207.1 kB)

Built Distributions

igdtools-2.6-cp313-cp313-manylinux_2_24_x86_64.whl (210.0 kB): CPython 3.13, manylinux (glibc 2.24+), x86-64

igdtools-2.6-cp312-cp312-manylinux_2_24_x86_64.whl (210.0 kB): CPython 3.12, manylinux (glibc 2.24+), x86-64

igdtools-2.6-cp311-cp311-manylinux_2_24_x86_64.whl (210.0 kB): CPython 3.11, manylinux (glibc 2.24+), x86-64

igdtools-2.6-cp310-cp310-manylinux_2_24_x86_64.whl (210.0 kB): CPython 3.10, manylinux (glibc 2.24+), x86-64

igdtools-2.6-cp39-cp39-manylinux_2_24_x86_64.whl (210.0 kB): CPython 3.9, manylinux (glibc 2.24+), x86-64

igdtools-2.6-cp38-cp38-manylinux_2_24_x86_64.whl (209.7 kB): CPython 3.8, manylinux (glibc 2.24+), x86-64
