Convert SNP array to VCF
Array As VCF
array-as-vcf is a small library and tool to convert common SNP array formats to VCF format.
There are currently five supported array formats:
- Affymetrix (TSV export)
- Cytoscan HD Array (TSV export)
- Lumi 317k array (TSV export)
- Lumi 370k array (TSV export)
- Multi-sample OpenArray (TSV export)
Binary formats are not (yet) supported.
Requirements
- Python 3.6
- requests
CLI usage
The array-as-vcf tool will convert array files to VCF format.
It will auto-detect the type of array file, and throw an error if it can't
determine it.
The generated VCF file is printed to stdout.
A sample name to be used in the VCF file must be supplied.
The REF and ALT alleles will be queried from Ensembl if no --lookup-table is
supplied. This requires a working internet connection, and can be quite slow
due to the number of HTTP requests that are necessary.
When supplied with --lookup-table, no requests are made for the rsIDs
that exist within the lookup table. The lookup table is a JSON file
containing a single large object of shape:
{
"rs0": "{ref_allele}:{alt_alleles}:{ref_is_minor_allele}"
}
E.g.
{
"rs1000003": "A:G:F"
}
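A minimal sketch of how one entry of this lookup table could be read in Python. The names `LookupEntry` and `parse_entry` are hypothetical and not part of array-as-vcf. The sketch assumes the third field is either `T` or `F`, which the example suggests but the text does not state, and it keeps the ALT field as a raw string because the separator for multiple alleles is not documented here.

```python
import json
from typing import NamedTuple


class LookupEntry(NamedTuple):
    ref: str
    alt: str  # raw ALT field; the separator for multiple alleles is not documented
    ref_is_minor: bool


def parse_entry(value: str) -> LookupEntry:
    """Split a "{ref}:{alt}:{flag}" string into its three fields."""
    ref, alt, flag = value.split(":")
    # Assumption: "T" means true and anything else (e.g. "F") means false.
    return LookupEntry(ref, alt, flag == "T")


table = json.loads('{"rs1000003": "A:G:F"}')
entry = parse_entry(table["rs1000003"])
print(entry.ref, entry.alt, entry.ref_is_minor)  # A G False
```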
If you have never run array-as-vcf before, you can run it without a lookup
table and use --dump to write the generated internal lookup table to a file
for subsequent runs.
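The idea of checking the lookup table first and only querying remotely on a miss can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: `resolve`, `dump_table` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real Ensembl request so the example runs offline.

```python
import json
from typing import Callable, Dict


def resolve(rsid: str, table: Dict[str, str], fetch: Callable[[str], str]) -> str:
    """Return the entry for rsid, calling fetch only when it is not in the table."""
    if rsid not in table:
        # Hypothetical remote lookup; the result is stored for later calls.
        table[rsid] = fetch(rsid)
    return table[rsid]


def dump_table(table: Dict[str, str], path: str) -> None:
    """Write the table to path as a single JSON object."""
    with open(path, "w") as handle:
        json.dump(table, handle, indent=2)


calls = []


def fake_fetch(rsid: str) -> str:
    # Stand-in for a real request; records the call and returns a fixed entry.
    calls.append(rsid)
    return "A:G:F"


table = {"rs1000003": "A:G:F"}
resolve("rs1000003", table, fake_fetch)  # found locally, no fetch
resolve("rs0", table, fake_fetch)        # missing, fetched once and stored
resolve("rs0", table, fake_fetch)        # now served from the table
print(calls)  # ['rs0']
```

Calling `dump_table(table, "lookup.json")` afterwards would write the enlarged table to disk; the file name is a placeholder.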
Usage: array-as-vcf [OPTIONS]

Options:
  -p, --path PATH              Path to array file  [required]
  -b, --build [GRCh37|GRCh38]
  -s, --sample-name TEXT       Name of sample in VCF file
  -c, --chr-prefix TEXT        Optional prefix to chromosome names
  -l, --lookup-table PATH      Optional path to existing lookup table for
                               rsIDs.
  -d, --dump PATH              Optional path to write generated lookup table
  --help                       Show this message and exit.
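As a rough sketch of how these options combine, the helper below assembles an argument list from Python using only the flags listed in the usage text. `build_command` is a hypothetical helper, not part of array-as-vcf, and the file and sample names are placeholders.

```python
from typing import List, Optional


def build_command(
    path: str,
    sample_name: str,
    build: Optional[str] = None,
    chr_prefix: Optional[str] = None,
    lookup_table: Optional[str] = None,
    dump: Optional[str] = None,
) -> List[str]:
    """Assemble an array-as-vcf argument list, skipping options left unset."""
    cmd = ["array-as-vcf", "--path", path, "--sample-name", sample_name]
    optional = (
        ("--build", build),
        ("--chr-prefix", chr_prefix),
        ("--lookup-table", lookup_table),
        ("--dump", dump),
    )
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


# First run: no lookup table yet, so dump the generated one for reuse.
first = build_command("array.tsv", "sample1", build="GRCh37", dump="lookup.json")
# Later runs: reuse the dumped table to avoid repeated requests.
later = build_command("array.tsv", "sample1", build="GRCh37", lookup_table="lookup.json")
print(" ".join(first))
print(" ".join(later))
```

Because the VCF is printed to stdout, a list like this could be passed to `subprocess.run(cmd, stdout=handle, check=True)` with `handle` being an open output file, assuming the tool is installed and on the PATH.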
Hashes for array_as_vcf-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7979f84ad23a86567263d90ba3a6ec97ae3145cdac1fd4857d177016a2812b19
MD5 | a4de353629f114c6758d791f455ef4a1
BLAKE2b-256 | 3d91d0297acf04df4231b00178e5df889d60e7d999c2d861591e967447c2a177