A collection of tools for genotype quality control and analysis
GenoTools
Getting Started
GenoTools is a suite of automated genotype data processing steps written in Python. The core pipeline was built for quality control and ancestry estimation of data in the Global Parkinson's Genetics Program (GP2).
To install the most current version with pip:
pip install the_real_genotools
Alternatively, to install from GitHub:
git clone https://github.com/dvitale199/GenoTools.git
cd GenoTools
pip install .
You can pull the most current references by running:
genotools-download
By default, the reference panel is downloaded to ~/.genotools/ref, but it can be downloaded to a location of your choice with --destination.
To download specific references/models, you can run the download with the following options:
genotools-download --ref 1kg_30x_hgdp_ashk_ref_panel --model nba_v1 --destination /path/to/download_directory/
Currently, 1kg_30x_hgdp_ashk_ref_panel is the only available reference panel. Available models are nba_v1 for the NeuroBooster array and neurochip_v1 for the NeuroChip array. If you are using a different array, we suggest training a new model by running the standard command below.
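For scripted setups, the download step can be wrapped in Python. The sketch below only assembles the genotools-download argument list from the options shown above (--ref, --model, --destination). The helper name build_download_cmd is hypothetical and not part of GenoTools.

```python
from typing import List, Optional


def build_download_cmd(
    ref: Optional[str] = None,
    model: Optional[str] = None,
    destination: Optional[str] = None,
) -> List[str]:
    """Assemble a genotools-download argument list (hypothetical helper)."""
    cmd = ["genotools-download"]
    # Each option is added only when a value is given, so calling with no
    # arguments yields the bare command with its default behavior.
    if ref:
        cmd += ["--ref", ref]
    if model:
        cmd += ["--model", model]
    if destination:
        cmd += ["--destination", destination]
    return cmd


if __name__ == "__main__":
    cmd = build_download_cmd(
        ref="1kg_30x_hgdp_ashk_ref_panel",
        model="nba_v1",
        destination="/path/to/download_directory/",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where GenoTools is installed.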
Modify the paths in the following command to run the standard GP2 pipeline:
genotools \
--pfile /path/to/genotypes/for/qc \
--out /path/to/qc/output \
--ancestry \
--ref_panel /path/to/reference/panel \
--ref_labels /path/to/reference/ancestry/labels \
--all_sample \
--all_variant
If you'd like to run the pipeline using an existing model, you can do so as follows (take note of the --model option):
genotools \
--pfile /path/to/genotypes/for/qc \
--out /path/to/qc/output \
--ancestry \
--ref_panel /path/to/reference/panel \
--ref_labels /path/to/reference/ancestry/labels \
--all_sample \
--all_variant \
--model /path/to/nba_v1/model
If you'd like to run the pipeline using the default nba_v1 model in a Docker container, you can do so as follows:
genotools \
--pfile /path/to/genotypes/for/qc \
--out /path/to/qc/output \
--ancestry \
--ref_panel /path/to/reference/panel \
--ref_labels /path/to/reference/ancestry/labels \
--container \
--all_sample \
--all_variant
Note: add the --singularity flag to run containerized ancestry predictions on HPC.
The standard command finds the SNPs common to your genotype data and the reference panel, runs PCA, UMAP-transforms the PCs, and trains a new XGBoost classifier specific to your data and reference panel.
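As a rough illustration of the first step only (finding common SNPs), the sketch below intersects variant IDs from two variant lists. It assumes variants can be matched on the ID column of tab-separated, .pvar-style lines with a #CHROM header. How GenoTools itself matches variants (for example, its handling of alleles or strand) is not described here, and read_variant_ids and common_variants are hypothetical names. The variant rows are toy data.

```python
from typing import Iterable, List, Set


def read_variant_ids(lines: Iterable[str]) -> List[str]:
    """Return the ID column from .pvar-style lines.

    Assumption: tab-separated rows preceded by a '#CHROM ... ID ...' header.
    """
    id_col = None
    ids: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            id_col = fields.index("ID")  # locate the ID column from the header
            continue
        if id_col is None:
            raise ValueError("expected a '#CHROM' header line before variant rows")
        ids.append(fields[id_col])
    return ids


def common_variants(study_ids: Iterable[str], ref_ids: Iterable[str]) -> List[str]:
    """Keep study variants that also appear in the reference, in study order."""
    ref_set: Set[str] = set(ref_ids)
    return [v for v in study_ids if v in ref_set]


if __name__ == "__main__":
    # Toy data, not real variants.
    study = ["#CHROM\tPOS\tID\tREF\tALT", "1\t100\trs1\tA\tG", "1\t200\trs2\tC\tT"]
    ref = ["#CHROM\tPOS\tID\tREF\tALT", "1\t200\trs2\tC\tT", "1\t300\trs3\tG\tA"]
    print(common_variants(read_variant_ids(study), read_variant_ids(ref)))
```

Matching on ID alone is the simplest possible choice; a real pipeline would likely also need to reconcile alleles between the two datasets.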
genotools accepts --pfile, --bfile, or --vcf input. Any bfile or VCF input is converted to a pfile before any steps are run.
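A wrapper script can make the choice of input flag explicit. The following is a minimal sketch that assembles the genotools argument list using only the flags shown in this README, assuming a single input flag per run. build_genotools_cmd is a hypothetical helper, not part of the GenoTools API, and it does not run anything itself.

```python
from typing import List, Optional

# Input flags named in this README.
INPUT_FLAGS = ("pfile", "bfile", "vcf")


def build_genotools_cmd(
    geno_path: str,
    out: str,
    ref_panel: str,
    ref_labels: str,
    input_format: str = "pfile",
    model: Optional[str] = None,
    container: bool = False,
    singularity: bool = False,
) -> List[str]:
    """Assemble a genotools argument list (hypothetical helper)."""
    if input_format not in INPUT_FLAGS:
        raise ValueError(f"input_format must be one of {INPUT_FLAGS}")
    cmd = [
        "genotools",
        f"--{input_format}", geno_path,
        "--out", out,
        "--ancestry",
        "--ref_panel", ref_panel,
        "--ref_labels", ref_labels,
    ]
    if container:
        cmd.append("--container")
    if singularity:
        cmd.append("--singularity")
    cmd += ["--all_sample", "--all_variant"]
    if model:
        cmd += ["--model", model]
    return cmd


if __name__ == "__main__":
    cmd = build_genotools_cmd(
        "/path/to/genotypes/for/qc",
        "/path/to/qc/output",
        "/path/to/reference/panel",
        "/path/to/reference/ancestry/labels",
        input_format="bfile",
        model="/path/to/nba_v1/model",
    )
    print(" ".join(cmd))
```

Building the command as a list avoids the line-continuation mistakes that are easy to make in a multi-line shell command. The list can be handed to subprocess.run(cmd, check=True) where GenoTools is installed.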
Documentation
Acknowledgements
GenoTools was developed as the core genotype and WGS processing pipeline for the Global Parkinson's Genetics Program (GP2) at the Center for Alzheimer's and Related Dementias (CARD) at the National Institutes of Health.
This tool relies on PLINK, a whole genome association analysis toolset, for various genetic data processing functionalities. We gratefully acknowledge the developers of PLINK for their foundational contributions to the field of genetics. More about PLINK can be found at their website.