
package to run a genotype quality control pipeline

IDEAL-GENOM

IDEAL-GENOM is a comprehensive Python package for automated, reproducible analysis of human genotype data. It currently implements three pipelines: genomic quality control (QC) for case/control studies, processing of VCF files after imputation, and genome-wide association studies (GWAS). It builds on years of research at CGE Tübingen, wraps PLINK 1.9/2.0, GCTA and bcftools, and provides rich reporting and visualizations.

Key Features

  • Sample QC: Automated sample-level filtering.
  • Ancestry QC: Detection of ancestry outliers, tailored to homogeneous populations and based on 1000 Genomes (1KG) data.
  • Variant QC: Automated variant-level (SNPs) filtering.
  • VCF Processing: Filtering and harmonization of post-imputation VCF files, and conversion to PLINK 1.9 binaries.
  • GWAS: Association testing with a Generalized Linear Model (GLM) or a Generalized Linear Mixed Model (GLMM), plus top-hit identification.
  • Visualization: QC steps are complemented with high-quality plots for reporting. Population structure is visualized with dimensionality reduction algorithms such as Uniform Manifold Approximation and Projection (UMAP) and t-SNE. GWAS summary statistics can also be plotted for reporting.
  • Flexible configuration: Modular pipeline steps whose configuration is based on YAML files.
  • CLI, Jupyter, and Docker: Run as a command-line tool, in notebooks, or containerized.
  • Reproducible: All steps, parameters, and outputs are logged.
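To make the idea of modular, configuration-driven steps with logged parameters concrete, here is a minimal sketch. Everything in it is hypothetical: the step functions, the configuration keys, the thresholds and the toy data are invented for illustration and are not IDEAL-GENOM's actual API or YAML schema. The `config` dictionary stands in for the mapping a YAML loader would return.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline_sketch")


# Hypothetical steps: each takes the current state plus its parameters
# and returns a new state. These are NOT IDEAL-GENOM functions.
def sample_qc(state: dict, call_rate: float) -> dict:
    kept = [s for s in state["samples"] if s["call_rate"] >= call_rate]
    return {**state, "samples": kept}


def variant_qc(state: dict, maf: float) -> dict:
    kept = [v for v in state["variants"] if v["maf"] >= maf]
    return {**state, "variants": kept}


# Registry that maps step names used in the configuration to functions.
STEPS: dict[str, Callable[..., dict]] = {
    "sample_qc": sample_qc,
    "variant_qc": variant_qc,
}

# Stand-in for a parsed YAML file; the keys and values are made up.
config = {
    "steps": [
        {"name": "sample_qc", "params": {"call_rate": 0.98}},
        {"name": "variant_qc", "params": {"maf": 0.01}},
    ]
}


def run_pipeline(state: dict, config: dict) -> dict:
    """Run the configured steps in order, logging parameters and counts."""
    for step in config["steps"]:
        log.info("running %s with %s", step["name"], step["params"])
        state = STEPS[step["name"]](state, **step["params"])
        log.info(
            "after %s: %d samples, %d variants",
            step["name"], len(state["samples"]), len(state["variants"]),
        )
    return state


# Toy data, invented for the example.
toy_state = {
    "samples": [{"id": "S1", "call_rate": 0.99}, {"id": "S2", "call_rate": 0.95}],
    "variants": [{"id": "rs1", "maf": 0.2}, {"id": "rs2", "maf": 0.005}],
}
result = run_pipeline(toy_state, config)
```

Keeping the step order and parameters in one configuration object means a run can be repeated from that object alone, and the log records what was applied at each stage.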

Installation

You can install IDEAL-GENOM using pip:

pip install ideal-genom

Or clone the repository and install locally:

git clone https://github.com/cge-tuebingen/ideal-genom-qc.git
cd ideal-genom-qc
pip install .
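After either installation route, a quick way to confirm that the distribution is visible to the current Python environment is to query its metadata. The sketch below uses only the standard library. The helper name `installed_version` is made up for this example, and it assumes the distribution is registered under the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "ideal-genom" is the distribution name from the pip command above.
found = installed_version("ideal-genom")
print(f"ideal-genom: {found or 'not installed'}")
```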

For Docker usage:

docker build -t ideal-genom-qc .
docker run -it ideal-genom-qc

Installed Genomic Tools in Docker

The IDEAL-GENOM Docker image comes pre-installed with the following genomic analysis tools:

  • PLINK 1.9: Version 20231211
  • PLINK 2.0: Version 20240105 (AVX2 build)
  • GCTA: Version 1.95.0 (Linux x86_64)
  • BCFtools: Version 1.23

These tools are available in the container's PATH and can be used directly in your pipeline steps or custom scripts. Example usage inside the container:

plink --help
plink2 --help
gcta64 --help
bcftools --help

You can run these commands interactively by starting a shell in the container:

docker run -it ideal-genom-qc /bin/bash

This ensures reproducible and ready-to-use genomic analysis workflows.
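A custom script can check that the four executables are reachable before starting a long run. The following is a minimal sketch using the standard library's `shutil.which`. The executable names are taken from the commands listed above, while the helper `locate_tools` is a hypothetical name and not part of IDEAL-GENOM.

```python
import shutil

# Executable names as invoked in the commands above.
TOOLS = ("plink", "plink2", "gcta64", "bcftools")


def locate_tools(names=TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


tool_paths = locate_tools()
for name, path in tool_paths.items():
    print(f"{name:10s} {path or 'NOT FOUND'}")

# Collect missing tools so a caller can decide whether to abort.
missing = [name for name, path in tool_paths.items() if path is None]
if missing:
    print("Missing tools:", ", ".join(missing))
```

Inside the container all four should resolve. Outside it, the same script shows which tools still need to be installed.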
