
Python tool to merge cross-species Illumina iScan genotype data with a reference set of genotypes from a pre-existing source.


iScanVCFMerge

iScanVCFMerge is a Python tool to facilitate the cross-species application of Illumina iScan system microarrays. The tool merges VCF genotypes exported from GenomeStudio with a second VCF, comprising genotypes derived from other samples and sources. Merging is based on matches of chromosome, position and certain conditions of major and minor alleles, with matched rows from each VCF concatenated into a single row (comprising all individuals) in the output files. The full algorithm is explained in the accompanying manuscript, where we reported use of the human Infinium Multi-Ethnic Global and Infinium Omni 2.5 arrays (hg19) to genotype great apes, and merged those with the genotypes of conspecifics previously published elsewhere.
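The row-matching idea can be pictured with a small, deliberately simplified sketch. It assumes the simplest possible allele condition (identical REF and ALT in both files) and that both files share the same FORMAT column; the names `parse_vcf_line` and `merge_rows` are hypothetical. The tool's real allele rules are those described in the manuscript, not these.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, int]  # (chromosome, 1-based position)


def parse_vcf_line(line: str) -> Tuple[Key, str, str, List[str], List[str]]:
    """Split one VCF data line into key, REF, ALT, fixed columns and genotypes."""
    fields = line.rstrip("\n").split("\t")
    key = (fields[0], int(fields[1]))
    # Columns 0-8 are CHROM..FORMAT; columns 9 onwards hold one genotype per sample.
    return key, fields[3], fields[4], fields[:9], fields[9:]


def merge_rows(iscan_lines: Iterable[str], reference_lines: Iterable[str]) -> List[str]:
    """Concatenate genotypes of rows that agree on chromosome, position, REF and ALT."""
    reference = {}
    for line in reference_lines:
        key, ref, alt, _, genotypes = parse_vcf_line(line)
        reference[key] = (ref, alt, genotypes)

    merged = []
    for line in iscan_lines:
        key, ref, alt, fixed, genotypes = parse_vcf_line(line)
        match = reference.get(key)
        # Simplified condition (an assumption): alleles must be identical.
        if match is not None and (match[0], match[1]) == (ref, alt):
            merged.append("\t".join(fixed + genotypes + match[2]))
    return merged


# Toy data lines (header lines omitted); the second site differs in ALT and is dropped.
iscan = [
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/1",
    "1\t200\trs2\tC\tT\t.\t.\t.\tGT\t1/1",
]
reference = [
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/1",
    "1\t200\t.\tC\tA\t.\t.\t.\tGT\t0/0\t0/0",
]
merged_rows = merge_rows(iscan, reference)
```

Each merged row keeps the fixed columns of the iScan record and appends the reference genotypes after the iScan genotypes, so a single row covers all individuals.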

What's new in version 1.1?

  • Bugs fixed to properly handle some multi-allelic sites.
  • The reference population VCF file must now be bgzipped and indexed with tabix. This requirement does not apply to the iScan VCF file, which can either be uncompressed or gzip compressed.
  • In the prior version, the complete reference population VCF file was read into memory before the relevant records were pulled. This caused issues for some users handling enormous reference VCF files. In this version, we use the Pysam library's lightweight wrapper of the htslib C-API to pull only the relevant records in the first place. The script should now run near-instantaneously, irrespective of input file size.
  • Console output is now handled by the Python logging module and is written to a .log file in the output directory.
  • Version numbering now follows 1.x vs 0.x format for improved compatibility with PyPI.
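
The indexed lookup described in the list above might look roughly like the following sketch. It is not the tool's source: `fetch_matching_records` and its arguments are hypothetical, and the only pysam behaviour assumed is that `VariantFile.fetch(contig, start, stop)` takes 0-based, half-open coordinates and uses the tabix index to avoid reading the whole file.

```python
from typing import Dict, Iterable, Tuple


def fetch_matching_records(reference, sites: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], object]:
    """Look up each (chromosome, 1-based position) in an indexed reference VCF.

    `reference` is expected to behave like an open pysam.VariantFile, e.g.
    pysam.VariantFile("reference.vcf.gz") (hypothetical file name).
    """
    found = {}
    for chrom, pos in sites:
        # fetch() uses 0-based, half-open intervals, so the 1-based site
        # `pos` corresponds to the interval [pos - 1, pos).
        for record in reference.fetch(chrom, pos - 1, pos):
            # Records that start upstream (e.g. deletions) can overlap the
            # interval, so keep only records that start exactly at `pos`.
            if record.pos == pos:
                found[(chrom, pos)] = record
                break
    return found
```

Because only the requested intervals are decompressed, the cost of this approach depends on the number of iScan sites rather than on the size of the reference file.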

Installation

iScanVCFMerge 1.1 requires Python 3.9. It has been tested successfully on macOS Big Sur 11.4 and on Ubuntu 21.04.

Option 1: Clone from GitHub and run with Python 3

git clone "https://github.com/baneslab/iScanVCFMerge.git"
cd iScanVCFMerge
python3 iScanVCFMerge.py

If running the script directly with Python, you may also need to install the required packages, e.g.:

python3 -m pip install pandas pysam

Option 2: Install with pip

pip install iScanVCFMerge

or

python3 -m pip install iScanVCFMerge

Usage

iScanVCFMerge [-h] -I <iScan_vcf> -R <reference_vcf> -O <output_directory>

Arguments (-I, -R and -O are required):

-h, --help                 Show the help message
-I, --iScanVCF             Path to your iScan VCF file (.vcf or .vcf.gz)
-R, --ReferenceVCF         Path to your reference VCF file, with which the iScan file will be merged. This must be bgzip compressed and be indexed with tabix
-O, --output_directory     Name of the output directory (will be created if it doesn't exist)
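
For readers scripting around the tool, the interface above can be mirrored with `argparse`. This is an illustrative sketch built only from the flags listed here, not the tool's actual source; `build_parser` and `prepare_output_dir` are hypothetical names.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three required options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="iScanVCFMerge")
    parser.add_argument("-I", "--iScanVCF", required=True, metavar="<iScan_vcf>",
                        help="Path to the iScan VCF file (.vcf or .vcf.gz)")
    parser.add_argument("-R", "--ReferenceVCF", required=True, metavar="<reference_vcf>",
                        help="Path to the bgzipped, tabix-indexed reference VCF file")
    parser.add_argument("-O", "--output_directory", required=True, metavar="<output_directory>",
                        help="Name of the output directory")
    return parser


def prepare_output_dir(path: str) -> str:
    """Create the output directory if it does not exist and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


# Example with hypothetical file names.
args = build_parser().parse_args(["-I", "iscan.vcf", "-R", "reference.vcf.gz", "-O", "merged"])
```

The parsed values are then available as `args.iScanVCF`, `args.ReferenceVCF` and `args.output_directory`.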

Citation

Please cite the use of this software as follows:

Fountain, E. D., Zhou, L-C., Karklus, A., Liu, Q-X., Meyers, J., Fontanilla, I. K., Rafael, E. F., Yu, J-Y., Zhang, Q., Zhu, X-L., Pei, E-L., Yuan, Y-H. and Banes, G. L. (2021). Cross-species application of Illumina iScan microarrays for cost-effective, high-throughput SNP discovery. Frontiers in Ecology and Evolution, 9:629252, doi: 10.3389/fevo.2021.629252.

The Research Resource Identifier (https://www.force11.org/group/resource-identification-initiative) for iScanVCFMerge is RRID:SCR_021193 (https://scicrunch.org/resolver/RRID:SCR_021193).
