Tools for converting VCF to IGD files and processing them.

igdtools

igdtools can convert from .vcf(.gz) to IGD, and once you have an IGD file it can perform various operations such as filtering, computing basic statistics, and generally transforming IGD files.

Run igdtools --help for more information on commands.

For more general reading and modification of IGD files, see pyigd.

Installation

igdtools is a C++ binary, with a small Python wrapper to make installation easier. You can install via:

pip install igdtools

This installs prebuilt binaries on most Linux systems and falls back to a source distribution on other systems (such as macOS). Building from the source distribution requires CMake 3.10 or newer, the zlib development headers, and a version of clang or GCC that supports C++11.

Usage Examples

Convert .vcf(.gz) to IGD

Conversion will copy the variant identifiers ("ID" column in VCF) and individual identifiers (the sample column names in VCF) to the IGD file, unless --no-var-ids and --no-indiv-ids flags are specified (respectively).

igdtools input.vcf.gz -o output.igd
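If you drive the conversion from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of igdtools: the helper name build_convert_command and its keyword arguments are hypothetical, and it only uses the flags mentioned in this section.

```python
import shlex


def build_convert_command(vcf_path, igd_path, keep_var_ids=True, keep_indiv_ids=True):
    """Return the argv list for a VCF-to-IGD conversion (hypothetical helper)."""
    argv = ["igdtools", vcf_path, "-o", igd_path]
    # Identifiers are copied by default; these flags turn that off.
    if not keep_var_ids:
        argv.append("--no-var-ids")
    if not keep_indiv_ids:
        argv.append("--no-indiv-ids")
    return argv


cmd = build_convert_command("input.vcf.gz", "output.igd", keep_var_ids=False)
# Print the command for inspection; to execute it, pass the list to
# subprocess.run(cmd, check=True) on a system where igdtools is installed.
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems when file names contain spaces.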

Convert .vcf(.gz) to IGD and export metadata

igdtools can export metadata fields as simple text files, each of which can be loaded by numpy.loadtxt(). You can use --export-metadata to export this metadata during conversion:

igdtools input.vcf.gz -o output.igd --export-metadata qual,filter,info

The metadata types you can export are qual, filter, chrom, and info. You can also specify all to export every type without listing them. By default, no metadata is exported during VCF to IGD conversion.

Just export .vcf(.gz) metadata

If you already have an IGD file, and want to go back and export the metadata from the corresponding VCF, you can just do the export:

igdtools input.vcf.gz --export-metadata qual,filter,info

Note that the naming will differ. When only exporting metadata, the files are named input.meta.*.txt, whereas the previous example (converting to IGD and exporting metadata) would have named them output.meta.*.txt.
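To collect the exported files afterwards, you can match on that naming pattern. This is a rough sketch under assumptions: the helper names are hypothetical, the files are assumed to sit in the directory you pass in, and stripping a bare .vcf extension the same way as .vcf.gz is a guess based on the examples above.

```python
import glob
from pathlib import Path

# Extensions removed to obtain the prefix. ".vcf.gz" and ".igd" follow the
# examples in this README; plain ".vcf" is an assumption.
_KNOWN_SUFFIXES = (".vcf.gz", ".vcf", ".igd")


def metadata_prefix(filename):
    """Return the file name with its VCF/IGD extension stripped."""
    name = Path(filename).name
    for suffix in _KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def find_metadata_files(filename, directory="."):
    """List files matching <prefix>.meta.*.txt in the given directory."""
    prefix = glob.escape(metadata_prefix(filename))
    return sorted(Path(directory).glob(f"{prefix}.meta.*.txt"))


for path in find_metadata_files("output.igd"):
    print(path)
```

Each path found this way could then be handed to numpy.loadtxt(), as described above.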

IGD file header info

To examine the header information in the IGD file:

igdtools -i test.igd

which will print out something like

  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Source: true_data/simulation-source-1000-100mb.vcf
  Genome range: 115-99999629
  Has individual IDs? Yes
  Has variant IDs? No

Similarly, some simple statistics can be emitted by specifying -s, which causes the entire IGD file to be scanned (so it will be slower than -i).
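If you want to use the header fields in a script, the -i printout shown above can be parsed into a dictionary. This is only a sketch: the parse_header function is hypothetical, and it assumes the output keeps the "Key: value" and "Has ... IDs? Yes" shapes from the sample, which may change between versions.

```python
def parse_header(text):
    """Parse 'igdtools -i' style output into a dict of strings (sketch)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Most lines look like "Key: value"; the "Has ... IDs? Yes" lines
        # separate key and value with "? " instead.
        if ": " in line:
            key, value = line.split(": ", 1)
        elif "? " in line:
            key, value = line.split("? ", 1)
        else:
            continue
        # Drop a trailing "?" so "Phased?" becomes "Phased".
        info[key.rstrip("?").strip()] = value.strip()
    return info


sample = """
  Variants: 329556
  Individuals: 1000
  Ploidy: 2
  Phased?: true
  Has individual IDs? Yes
  Has variant IDs? No
"""
header = parse_header(sample)
print(header["Variants"], header["Phased"], header["Has variant IDs"])
```

All values are kept as strings; convert them (for example with int()) only for the fields you need.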

Copy variants within range

Create a copy of an IGD file, but only keep variants in a particular base-pair range. Here we show the range [50000, 150000] (50 kb to 150 kb, inclusive on both ends):

igdtools input.igd -o output.igd -r 50000-150000
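To make the inclusive bounds concrete, here is a small illustration of which positions such a range keeps. It is not igdtools code; parse_range and in_range are hypothetical names, and the positions are made up for the example.

```python
def parse_range(spec):
    """Split a 'LOW-HIGH' string into two integers."""
    low, high = spec.split("-", 1)
    return int(low), int(high)


def in_range(position, low, high):
    """Both ends are inclusive: low <= position <= high."""
    return low <= position <= high


low, high = parse_range("50000-150000")
positions = [49999, 50000, 100000, 150000, 150001]
kept = [p for p in positions if in_range(p, low, high)]
print(kept)  # [50000, 100000, 150000]
```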

Copy variants with frequency

Create a copy of an IGD file, but only keep variants whose allele frequency falls in a particular range. Here we show the range [0.1, 0.4) (inclusive lower bound, exclusive upper bound):

igdtools input.igd -o output.igd -f 0.1-0.4
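The half-open interval can be illustrated with a few made-up allele counts. This sketch assumes the frequency of a variant is its number of allele copies divided by the total number of haplotypes (individuals times ploidy); that definition and the function names are assumptions for illustration, not taken from the igdtools source.

```python
def allele_frequency(copies, individuals, ploidy=2):
    """Fraction of haplotypes carrying the allele (assumed definition)."""
    return copies / (individuals * ploidy)


def in_frequency_range(freq, low, high):
    """Lower bound inclusive, upper bound exclusive: low <= freq < high."""
    return low <= freq < high


individuals = 1000  # 2000 haplotypes for diploid data
copy_counts = [199, 200, 500, 799, 800]
kept = [
    c for c in copy_counts
    if in_frequency_range(allele_frequency(c, individuals), 0.1, 0.4)
]
print(kept)  # [200, 500, 799]
```

A count of 200 (frequency 0.1) is kept, while 800 (frequency 0.4) is dropped because the upper bound is exclusive.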

Copy to unphased data

Sometimes it is useful to perform "unphased" calculations. For example, when computing runs of homozygosity (ROH) it is easier to work with unphased diploid data that tracks the number of copies each individual has of an allele (0, 1, or 2). Create a copy of an IGD file, but store it unphased:

igdtools test.igd -o test.unphased.igd --force-unphased
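Conceptually, going from phased to unphased data means replacing each individual's per-haplotype alleles with a single copy count. The sketch below illustrates that idea for one variant. It is not how igdtools stores or converts data, and it assumes that consecutive haplotypes in the list belong to the same individual.

```python
def to_unphased(haplotypes, ploidy=2):
    """Collapse per-haplotype 0/1 alleles into per-individual copy counts."""
    if len(haplotypes) % ploidy != 0:
        raise ValueError("haplotype count must be a multiple of ploidy")
    # Sum each consecutive group of `ploidy` haplotypes (assumed layout).
    return [
        sum(haplotypes[i:i + ploidy])
        for i in range(0, len(haplotypes), ploidy)
    ]


# Four diploid individuals: (0,1), (1,1), (0,0), (1,0)
phased = [0, 1, 1, 1, 0, 0, 1, 0]
print(to_unphased(phased))  # [1, 2, 0, 1]
```

Note that (0,1) and (1,0) both become 1: the copy count is preserved, but which haplotype carried the allele is lost.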
