
Convert SNP array to VCF

Project description

Array As VCF

array-as-vcf is a small library and tool to convert common SNP array formats to VCF format.

There are five currently supported array formats:

  • Affymetrix (TSV export)
  • Cytoscan HD Array (TSV export)
  • Lumi 317k array (TSV export)
  • Lumi 370k array (TSV export)
  • Multi-sample OpenArray (TSV export)

Binary formats are not (yet) supported.

Requirements

  • Python 3.6
  • requests

CLI usage

The array-as-vcf tool converts array files to VCF format. It auto-detects the type of array file and throws an error if it cannot determine it.

The generated VCF file is printed to stdout.

A sample name to be used in the VCF file must be supplied.
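As a rough illustration of the output, the following minimal Python sketch writes a VCF with a single sample column named after the supplied sample name, with an optional chromosome prefix (compare `--chr-prefix` below). It is not the library's implementation. The `write_vcf` and `call_to_gt` helpers, the record tuple layout, and the assumption that array calls are two-letter strings such as `AG` are all hypothetical.

```python
import sys
from typing import Iterable, List, Optional, TextIO, Tuple

# Hypothetical record layout: (chrom, pos, rsid, ref, alts, call)
Record = Tuple[str, int, str, str, List[str], str]


def call_to_gt(call: str, ref: str, alts: List[str]) -> str:
    """Map a two-letter array call (e.g. "AG") to an unphased VCF GT string."""
    alleles = [ref] + alts
    if len(call) != 2 or any(base not in alleles for base in call):
        return "./."  # no-call, or an allele that is neither REF nor ALT
    indices = sorted(alleles.index(base) for base in call)
    return "/".join(str(i) for i in indices)


def write_vcf(records: Iterable[Record], sample_name: str,
              chr_prefix: str = "", out: Optional[TextIO] = None) -> None:
    """Write a minimal single-sample VCF; defaults to stdout."""
    out = out if out is not None else sys.stdout
    out.write("##fileformat=VCFv4.2\n")
    out.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
              "FILTER", "INFO", "FORMAT", sample_name]
    out.write("\t".join(header) + "\n")
    for chrom, pos, rsid, ref, alts, call in records:
        fields = [
            chr_prefix + chrom,      # e.g. "chr" + "1"
            str(pos),
            rsid,
            ref,
            ",".join(alts) or ".",   # "." when there is no ALT allele
            ".", ".", ".",           # QUAL, FILTER, INFO left missing
            "GT",
            call_to_gt(call, ref, alts),
        ]
        out.write("\t".join(fields) + "\n")
```

For example, a made-up record `("1", 12345, "rs0", "A", ["G"], "AG")` with `chr_prefix="chr"` produces a `chr1` line whose genotype is `0/1`.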

If no lookup table is supplied, the REF and ALT alleles are queried from Ensembl. This requires a working internet connection and can be quite slow because of the number of HTTP requests required.
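A minimal sketch of a single per-rsID lookup follows. It assumes the public Ensembl REST `variation/human` endpoint (with the separate `grch37` server for GRCh37) and its `mappings[].allele_string` and `minor_allele` response fields; array-as-vcf's own query logic may differ. It uses only the standard library, so it runs without `requests`.

```python
import json
import urllib.request

# Assumed Ensembl REST hosts: the default server serves GRCh38,
# a separate server serves GRCh37.
ENSEMBL_HOSTS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def variation_url(rsid: str, build: str) -> str:
    return f"{ENSEMBL_HOSTS[build]}/variation/human/{rsid}?content-type=application/json"


def fetch_variation(rsid: str, build: str) -> dict:
    # One HTTP request per rsID, which is why lookups without a table are slow.
    with urllib.request.urlopen(variation_url(rsid, build), timeout=30) as resp:
        return json.load(resp)


def parse_alleles(variation: dict):
    """Return (ref, alts, ref_is_minor) from an Ensembl variation response."""
    # Simplification: use the first mapping only; a variant may have several.
    allele_string = variation["mappings"][0]["allele_string"]  # e.g. "A/G"
    ref, *alts = allele_string.split("/")
    ref_is_minor = variation.get("minor_allele") == ref
    return ref, alts, ref_is_minor
```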

When a lookup table is supplied, no requests are made for rsIDs that exist in the lookup table. The lookup table is a JSON file containing a single large object of the shape:

{
  "rs0": "{ref_allele}:{alt_alleles}:{ref_is_minor_allele}"
}

E.g.

{
  "rs1000003": "A:G:F"
}
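Assuming `T`/`F` stand for true/false and that multiple ALT alleles are comma-separated (neither is stated above), an entry could be parsed and re-serialized as follows. `LookupEntry`, `parse_entry` and `format_entry` are hypothetical names, not part of array-as-vcf.

```python
from typing import List, NamedTuple


class LookupEntry(NamedTuple):
    ref: str
    alts: List[str]
    ref_is_minor: bool


def parse_entry(value: str) -> LookupEntry:
    """Parse "{ref_allele}:{alt_alleles}:{ref_is_minor_allele}"."""
    ref, alts, minor_flag = value.split(":")
    # Assumption: several ALT alleles are comma-separated, as in VCF.
    return LookupEntry(ref, alts.split(","), minor_flag == "T")


def format_entry(entry: LookupEntry) -> str:
    flag = "T" if entry.ref_is_minor else "F"
    return f"{entry.ref}:{','.join(entry.alts)}:{flag}"
```

With these assumptions, `parse_entry("A:G:F")` gives `LookupEntry(ref='A', alts=['G'], ref_is_minor=False)`, and `format_entry` turns it back into `"A:G:F"`.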

If you have never run array-as-vcf before, you can run it without a lookup table and use `--dump` to write the generated internal lookup table to a file for subsequent runs.
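The caching behaviour described above (check the table first, query only on a miss, then dump the result for the next run) can be sketched in Python. `resolve`, `load_table` and `dump_table` are hypothetical helpers, and `fetch` stands in for whatever performs the Ensembl query.

```python
import json
from typing import Callable, Dict


def resolve(rsid: str, table: Dict[str, str], fetch: Callable[[str], str]) -> str:
    """Return the "ref:alts:minor" string for rsid, querying only on a miss."""
    if rsid not in table:
        table[rsid] = fetch(rsid)  # e.g. a network lookup; the result is cached
    return table[rsid]


def load_table(path: str) -> Dict[str, str]:
    with open(path) as handle:
        return json.load(handle)


def dump_table(table: Dict[str, str], path: str) -> None:
    with open(path, "w") as handle:
        json.dump(table, handle)
```

Because `resolve` mutates the table in place, dumping it after a run yields a table that covers every rsID seen so far, so a later run with that table makes no requests for those rsIDs.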

Usage: array-as-vcf [OPTIONS]

Options:
  -p, --path PATH              Path to array file  [required]
  -b, --build [GRCh37|GRCh38]
  -s, --sample-name TEXT       Name of sample in VCF file
  -c, --chr-prefix TEXT        Optional prefix to chromosome names
  -l, --lookup-table PATH      Optional path to existing lookup table for
                               rsIDs.
  -d, --dump PATH              Optional path to write generated lookup table
  --help                       Show this message and exit.


Download files


Source Distribution

array-as-vcf-1.0.1.tar.gz (12.1 kB)

Uploaded Source

Built Distribution

array_as_vcf-1.0.1-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file array-as-vcf-1.0.1.tar.gz.

File metadata

  • Download URL: array-as-vcf-1.0.1.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.7.3

File hashes

Hashes for array-as-vcf-1.0.1.tar.gz
Algorithm Hash digest
SHA256 8e9dec42d7b8bec5d5c3305237166aecbc1f1ba78d00e3ce28614537bfdf07c6
MD5 a8c55bb86e6bfd97bfa7b4133bac037d
BLAKE2b-256 58ccfb7801df2c5c6a427f240db0153ae176fb805bd2eee10f1506f367be6f9f


File details

Details for the file array_as_vcf-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: array_as_vcf-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.7.3

File hashes

Hashes for array_as_vcf-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 61548ee8d599a84fa9811c7592fbfeb56dd01660d8de7b72590a891ce3c426ba
MD5 0187402f5d68a656378561955aa56627
BLAKE2b-256 a0fd10c62cef477039806558a6a238e7241d15745aba6ff54ca6f40583e3fc3a

