A SNP processing package

SNProcess

by Thomas Dokas

dokastho@umich.edu

SNProcess is a Single Nucleotide Polymorphism (SNP) quality control (QC) pipeline written in Python. The procedure was developed by Shing Wan Choi at Mount Sinai, NYC. For more info, check out this tutorial.

How to Install

SNProcess is easy to install. It's listed on PyPI, which means you can install it using pip:

pip install snprocess

and you can upgrade it by running:

pip install snprocess --upgrade

Results

SNProcess runs the QC pipeline and produces the output files qcplink.xyz.

Evaluation

SNProcess compiles the relevant information about the QC process into an HTML webpage. You can find it in your specified output folder, in the file index.html. SNProcess also provides an easy way to view this file. Just run:

snprocess -i

and go to localhost:8008. This serves the generated webpage using your own computer as the host. Pretty cool, right?
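If you want to look at the report without going through `snprocess -i`, the same idea can be sketched with Python's standard library. This is not SNProcess's own implementation; the helper name `make_server` is made up for illustration, and the port default of 8008 is only chosen to match the one mentioned above.

```python
import functools
import http.server


def make_server(directory, port=8008):
    """Build a local HTTP server that serves the files in `directory`.

    Hypothetical helper, not part of SNProcess. `directory` should be the
    output folder that contains index.html.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so the server does not depend on the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Bind to 127.0.0.1 only, so the report is visible on this machine alone.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call `make_server("path/to/output").serve_forever()`, open localhost:8008/index.html in a browser, and stop the server with Ctrl+C.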

FAQ

  • What should my input JSON look like? Run snprocess -e to print an example JSON to your console, or snprocess -g to generate one in your output directory.
  • SNProcess won't run for me! What's wrong? SNProcess is a somewhat unusual program in that it uses plink as a driver/helper. Make sure plink is installed. If the issue persists, please file a bug report at the snprocess repo.
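A quick way to confirm that plink is reachable before running the pipeline is to look it up on your PATH. The snippet below is a minimal sketch; `find_plink` is a hypothetical helper, and the executable name `plink` is an assumption, since the name SNProcess actually invokes is not stated here.

```python
import shutil


def find_plink(candidates=("plink",)):
    """Return the path of the first candidate executable on PATH, or None.

    Hypothetical helper; the default name "plink" is an assumption.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not found
        if path is not None:
            return path
    return None


plink_path = find_plink()
if plink_path is None:
    print("plink was not found on PATH; install it before running snprocess")
else:
    print(f"plink found at {plink_path}")
```

If your installation uses a different executable name, pass it in, for example `find_plink(("plink1.9",))`.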

More About SNProcess

QC1

Remove individuals with outlying gender SNPs

Steps for QC:

  1. Check missingness and generate plots
  2. Remove individuals with high missingness
  3. Select autosomal SNPs only and filter out SNPs with low minor allele frequency (MAF)
  4. Remove SNPs that deviate from Hardy-Weinberg equilibrium (HWE)
  5. Heterozygosity and LD Pruning
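To make the steps above more concrete, here is a rough sketch of how they could map onto plink command lines. It only builds the argument lists and runs nothing. The function name, the file prefixes, and every threshold are illustrative assumptions, not the commands or settings SNProcess actually uses, and the plotting part of step 1 is left out.

```python
def build_qc_commands(bfile, out, mind=0.01, maf=0.01, hwe=1e-6,
                      ld_window=200, ld_step=50, ld_r2=0.25):
    """Return plink argument lists for the five QC steps, in order.

    Hypothetical sketch: thresholds and output prefixes are assumptions.
    """
    def plink(infile, *flags, suffix):
        return ["plink", "--bfile", infile, *flags, "--out", f"{out}.{suffix}"]

    return [
        # 1. Report per-individual and per-SNP missingness
        plink(bfile, "--missing", suffix="miss"),
        # 2. Drop individuals whose missing-genotype rate exceeds `mind`
        plink(bfile, "--mind", str(mind), "--make-bed", suffix="mind"),
        # 3. Keep autosomal SNPs and drop SNPs with MAF below `maf`
        plink(f"{out}.mind", "--autosome", "--maf", str(maf),
              "--make-bed", suffix="maf"),
        # 4. Drop SNPs whose HWE test p-value is below `hwe`
        plink(f"{out}.maf", "--hwe", str(hwe), "--make-bed", suffix="hwe"),
        # 5a. LD pruning: writes <out>.ld.prune.in with the SNPs to keep
        plink(f"{out}.hwe", "--indep-pairwise", str(ld_window),
              str(ld_step), str(ld_r2), suffix="ld"),
        # 5b. Heterozygosity computed on the pruned SNP set
        plink(f"{out}.hwe", "--extract", f"{out}.ld.prune.in",
              "--het", suffix="het"),
    ]


for command in build_qc_commands("mydata", "qc"):
    print(" ".join(command))
```

Each list could be handed to `subprocess.run(command, check=True)` once you have checked the thresholds against your own study design.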

QC2

This portion of the pipeline compares the user data with data from the 1000 Genomes Project and produces graphs that show the population stratification based on race & ethnicity.

TODO

  • move final R scripts to Python. shouldn't there be a Python-like ggplot?
  • use tmp folder for files that aren't used in index.html and qcplink.xyz
  • 1kg: check if it exists, otherwise download. use a settings input flag to do QC on 1kg or not
  • remove snprocess.log
  • try to redirect warnings to the log
  • save the plink file to the output dir BEFORE LD pruning (that step is just for QC2) but keep this in for QC2
  • also save MDS_merged.mds