A collection of tools for genotype quality control and analysis

GenoTools

Getting Started

GenoTools is a suite of automated genotype data processing steps written in Python. The core pipeline was built for quality control and ancestry estimation of data in the Global Parkinson's Genetics Program (GP2).

You can pull the most current references by running:

genotools-download

By default, the reference panel is downloaded to ~/.genotools/ref, but it can be downloaded to a location of your choice with --destination.

To download specific references/models, you can run the download with the following options:

genotools-download --ref 1kg_30x_hgdp_ashk_ref_panel --model nba_v1 --destination /path/to/download_directory/

Currently, 1kg_30x_hgdp_ashk_ref_panel is the only available reference panel. Available models are nba_v1 for the NeuroBooster array and neurochip_v1 for the NeuroChip array. If you are using a different array, we suggest training a new model by running the standard command below.
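If you call the downloader from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of GenoTools: build_download_cmd is a hypothetical helper, and it assumes that --ref, --model, and --destination can each be given independently. It only builds the argument list; pass it to subprocess.run yourself if you want to execute it.

```python
import shlex
from typing import List, Optional


def build_download_cmd(ref: Optional[str] = None,
                       model: Optional[str] = None,
                       destination: Optional[str] = None) -> List[str]:
    """Assemble a genotools-download argument list (hypothetical helper).

    Only the flags shown in this README are used. Omitted options are
    left out so the tool falls back to its own defaults.
    """
    cmd = ["genotools-download"]
    if ref is not None:
        cmd += ["--ref", ref]
    if model is not None:
        cmd += ["--model", model]
    if destination is not None:
        cmd += ["--destination", destination]
    return cmd


if __name__ == "__main__":
    cmd = build_download_cmd(
        ref="1kg_30x_hgdp_ashk_ref_panel",
        model="nba_v1",
        destination="/path/to/download_directory/",
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```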

Modify the paths in the following command to run the standard GP2 pipeline:

genotools \
  --pfile /path/to/genotypes/for/qc \
  --out /path/to/qc/output \
  --ancestry \
  --ref_panel /path/to/reference/panel \
  --ref_labels /path/to/reference/ancestry/labels \
  --all_sample \
  --all_variant

If you'd like to run the pipeline using an existing model, you can do that like so (take note of the --model option):

genotools \
  --pfile /path/to/genotypes/for/qc \
  --out /path/to/qc/output \
  --ancestry \
  --ref_panel /path/to/reference/panel \
  --ref_labels /path/to/reference/ancestry/labels \
  --all_sample \
  --all_variant \
  --model /path/to/nba_v1/model

If you'd like to run the pipeline using the default nba_v1 model in a Docker container, you can do that like so:

genotools \
  --pfile /path/to/genotypes/for/qc \
  --out /path/to/qc/output \
  --ancestry \
  --ref_panel /path/to/reference/panel \
  --ref_labels /path/to/reference/ancestry/labels \
  --container \
  --all_sample \
  --all_variant

Note: add the --singularity flag to run containerized ancestry predictions on HPC.
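The three invocations above differ only in a few optional flags, so a wrapper script can build them from one function. This is a sketch under stated assumptions: build_genotools_cmd is a hypothetical helper, not a GenoTools API, and it assumes the order of options on the command line does not matter. It uses only the flags shown in this README.

```python
from typing import List, Optional


def build_genotools_cmd(pfile: str,
                        out: str,
                        ref_panel: str,
                        ref_labels: str,
                        model: Optional[str] = None,
                        container: bool = False,
                        singularity: bool = False) -> List[str]:
    """Assemble a genotools argument list (hypothetical helper)."""
    cmd = [
        "genotools",
        "--pfile", pfile,
        "--out", out,
        "--ancestry",
        "--ref_panel", ref_panel,
        "--ref_labels", ref_labels,
        "--all_sample",
        "--all_variant",
    ]
    if model is not None:
        # Reuse an existing model instead of training a new one.
        cmd += ["--model", model]
    if container:
        # Run ancestry prediction with the default model in a container.
        cmd.append("--container")
    if singularity:
        # The README describes adding this flag for containerized runs on HPC.
        cmd.append("--singularity")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_genotools_cmd(
        "/path/to/genotypes/for/qc",
        "/path/to/qc/output",
        "/path/to/reference/panel",
        "/path/to/reference/ancestry/labels",
        model="/path/to/nba_v1/model",
    )))
```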

The standard command (without --model) finds common SNPs between your genotype data and the reference panel, runs PCA, UMAP-transforms the PCs, and trains a new XGBoost classifier specific to your data and reference panel.
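To illustrate the first of those steps, here is a minimal sketch of intersecting variant IDs between a dataset and a reference panel. It is not the GenoTools implementation: common_variant_ids and the chr:pos:ref:alt ID format are assumptions for illustration, and real variant matching usually also has to harmonize alleles and strands, which this sketch ignores.

```python
from typing import Iterable, List


def common_variant_ids(geno_ids: Iterable[str], ref_ids: Iterable[str]) -> List[str]:
    """Return IDs present in both inputs, in the order they appear in geno_ids.

    Hypothetical helper: IDs are compared as exact strings, and duplicates
    in geno_ids are reported only once.
    """
    ref_set = set(ref_ids)
    seen = set()
    common = []
    for vid in geno_ids:
        if vid in ref_set and vid not in seen:
            seen.add(vid)
            common.append(vid)
    return common


if __name__ == "__main__":
    geno = ["chr1:1000:A:G", "chr1:2000:C:T", "chr2:500:G:A"]
    ref = ["chr2:500:G:A", "chr1:1000:A:G", "chr3:42:T:C"]
    print(common_variant_ids(geno, ref))
```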

genotools accepts --pfile, --bfile, or --vcf. Any bfile or VCF input is converted to a pfile before any steps are run.
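A wrapper script can pick the input flag from the file name. The sketch below relies on the usual PLINK conventions (a pfile is a .pgen/.pvar/.psam set and a bfile is a .bed/.bim/.fam set, each referred to by a shared prefix). input_flag is a hypothetical helper, and it assumes that --vcf takes the full file path while --pfile and --bfile take the prefix; check the GenoTools documentation before relying on that.

```python
from pathlib import Path
from typing import Tuple


def input_flag(path: str) -> Tuple[str, str]:
    """Map a genotype file path to a (flag, value) pair (hypothetical helper)."""
    name = Path(path).name.lower()
    if name.endswith((".vcf", ".vcf.gz")):
        # Assumption: VCF input is passed as the full file path.
        return ("--vcf", path)
    if name.endswith(".pgen"):
        # PLINK 2 file sets are addressed by their shared prefix.
        return ("--pfile", path[: -len(".pgen")])
    if name.endswith(".bed"):
        # PLINK 1 binary file sets are addressed by their shared prefix.
        return ("--bfile", path[: -len(".bed")])
    raise ValueError(f"Unrecognized genotype file type: {path}")


if __name__ == "__main__":
    for example in ("/data/cohort.pgen", "/data/cohort.bed", "/data/cohort.vcf.gz"):
        print(example, "->", input_flag(example))
```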

Documentation

Acknowledgements

GenoTools was developed as the core genotype and WGS processing pipeline for the Global Parkinson's Genetics Program (GP2) at the Center for Alzheimer's and Related Dementias (CARD) at the National Institutes of Health.

This tool relies on PLINK, a whole genome association analysis toolset, for various genetic data processing functionalities. We gratefully acknowledge the developers of PLINK for their foundational contributions to the field of genetics. More about PLINK can be found at their website.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

the_real_genotools-1.0.0.tar.gz (1.0 MB)

Uploaded Source

Built Distribution

the_real_genotools-1.0.0-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file the_real_genotools-1.0.0.tar.gz.

File metadata

  • Download URL: the_real_genotools-1.0.0.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for the_real_genotools-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        307c6355318eb1ea6d31e229b2c4da8c10527c24ad40638c4c156b3848a7a729
MD5           0d5b567374d4c6edf6b1a15842678b8a
BLAKE2b-256   7799801250af90f62be2206be98edc140af1736ac968f8dcadd507091d698e45
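
A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. This is a minimal sketch; sha256_of and matches_published are hypothetical helper names, and the local file path in the example is an assumption about where you saved the download.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the table above; the file path is an assumption.
    expected = "307c6355318eb1ea6d31e229b2c4da8c10527c24ad40638c4c156b3848a7a729"
    print(matches_published("the_real_genotools-1.0.0.tar.gz", expected))
```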

File details

Details for the file the_real_genotools-1.0.0-py3-none-any.whl.

File hashes

Hashes for the_real_genotools-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        294a67d9b0d3e235b02d846900640c3071a2561bac0da308f7ed67c71807ab01
MD5           bfa1b75146c255603345f90465e1e666
BLAKE2b-256   ad3f1d6049ac131ca6f0b2202274917b1cceec8d41908b0d1874a17dc25f6064
