SNPSnip - An interactive VCF Filtering Tool

SNPSnip is a command-line tool with an interactive web interface for filtering VCF files in multiple stages.

Prerequisites

  • Python 3.8 or higher
  • bcftools 1.18 or higher, installed and available in your PATH (check with bcftools --version; a statically compiled version is available here)
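
If you would rather check the bcftools prerequisite from a script, the following is a minimal sketch. It assumes the first line of `bcftools --version` has the form `bcftools <major>.<minor>...`; the helper names are illustrative and are not part of SNPSnip.

```python
import re
import shutil
import subprocess

MIN_BCFTOOLS = (1, 18)  # minimum version stated in the prerequisites


def parse_bcftools_version(version_output):
    """Extract (major, minor) from the first line of `bcftools --version`."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.match(r"bcftools\s+(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def bcftools_is_recent_enough(minimum=MIN_BCFTOOLS):
    """Return True if bcftools is on PATH and reports at least `minimum`."""
    if shutil.which("bcftools") is None:
        return False
    result = subprocess.run(
        ["bcftools", "--version"], capture_output=True, text=True, check=True
    )
    # Tuples compare element-wise, so (1, 9) < (1, 18) as intended.
    return parse_bcftools_version(result.stdout) >= minimum
```

Calling `bcftools_is_recent_enough()` before launching a long job gives an early failure instead of an error part-way through filtering.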

Install

pip install snpsnip

Or from the latest source:

pip install git+https://github.com/gekkonid/snpsnip.git

I recommend using pipx to install tools like this, to isolate tool dependencies.

Usage

Online mode

snpsnip --output-dir filtered_results --vcf input.vcf.gz \
    --maf 0.05 --max-missing 0.1 --min-qual 30

This runs three phases of analysis, with a web UI for selecting thresholds between the steps.

  1. Initial Processing: SNPSnip extracts a random subset of SNPs passing basic filters, then calculates per-sample statistics and a sample PCA. You will then be presented with a web UI where you can set sample filtering thresholds to exclude poor samples and (optionally) create subsets of samples. You can also specify absolute minimum thresholds on MAF, missingness, and variant quality (with --maf, --max-missing, and --min-qual) to restrict the variants that are considered, which is useful if you have a very large number of singleton or poor-quality variants.
  2. Variant Filtering: For each sample group, SNPSnip will then calculate variant-level statistics from the random subset of SNPs (from step 1). You can then set your thresholds per-group to exclude poor SNPs.
  3. Final Filtering: SNPSnip applies your sample and variant filters to the full VCF file to generate filtered outputs for each group of samples.
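
To make the MAF and missingness thresholds above concrete, here is a small illustration of how the two statistics can be computed for a single biallelic site. This is a sketch of the general definitions, not SNPSnip's implementation (SNPSnip relies on bcftools), and it assumes missingness means the fraction of samples without a genotype call.

```python
def site_stats(genotypes):
    """Return (missing_rate, maf) for one biallelic site.

    `genotypes` is a list of diploid GT strings such as "0/1", "1|1" or "./.".
    """
    called_alleles = []
    missing = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing += 1  # partially missing calls are counted as missing
            continue
        called_alleles.extend(int(a) for a in alleles)

    missing_rate = missing / len(genotypes) if genotypes else 0.0
    if not called_alleles:
        return missing_rate, 0.0

    # Frequency of the non-reference allele among called alleles.
    alt_freq = sum(1 for a in called_alleles if a != 0) / len(called_alleles)
    # The minor allele is whichever of ref/alt is rarer.
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


missing_rate, maf = site_stats(["0/0", "0/0", "0/1", "./.", "0/0"])
# One of five samples is uncalled (missing_rate 0.2); one of eight called
# alleles is the alternate allele (maf 0.125).
```

Under these definitions, the example site would pass a MAF threshold of 0.05 but fail a maximum missingness of 0.1.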

Optionally, some predefined groups (e.g. populations, species, etc) can be provided with the --groups-file, --group-column and --sample-column arguments. These predefined groups can then be refined based on the sample PCA or simply used verbatim to define subsets of samples.
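
As an illustration of predefined groups, the sketch below writes a small sample-to-group table and builds a matching command line. The file format is an assumption: a tab-delimited table with a header row is used here, and the column names `sample` and `population` are hypothetical. Check the SNPSnip tutorial for the formats that are actually accepted.

```python
import csv
import shlex


def write_groups_file(path, sample_to_group,
                      sample_column="sample", group_column="population"):
    """Write a two-column table mapping samples to groups.

    ASSUMPTION: tab-delimited with a header row; column names are illustrative.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([sample_column, group_column])
        for sample, group in sorted(sample_to_group.items()):
            writer.writerow([sample, group])


def snpsnip_command(vcf, groups_file,
                    sample_column="sample", group_column="population"):
    """Build a shell-quoted snpsnip command using the group-related options."""
    argv = [
        "snpsnip", "--vcf", vcf,
        "--groups-file", groups_file,
        "--sample-column", sample_column,
        "--group-column", group_column,
    ]
    return shlex.join(argv)  # shlex.join requires Python 3.8+
```

For example, `write_groups_file("groups.tsv", {"s1": "popA", "s2": "popB"})` followed by `print(snpsnip_command("input.vcf.gz", "groups.tsv"))` produces a command you can paste into a shell.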

Offline mode

You can also run SNPSnip in an "offline" mode, which is useful on clusters, for example:

# First, make a subset and calculate the PCA and sample stats
snpsnip --vcf input.vcf.gz --offline

# This generates a static HTML file you can download and explore to set your
# thresholds. Save the resulting .json file to your PC, copy it back to
# wherever you're running SNPSnip, then:

snpsnip --vcf input.vcf.gz --offline --next snpsnip_sample_filters.json

# This makes the subsets and calculates the SNP stats for each group of
# samples you selected. It again generates a static HTML file you can use to
# interactively select your SNP filtering thresholds. Again, save the
# output, copy it back to where you're running SNPSnip, then:

snpsnip --vcf input.vcf.gz --offline --next snpsnip_variant_filters.json

# This will generate the final files.
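
If you script the offline workflow on a cluster, a small helper can decide which of the three commands to run next based on which filter files have been copied back. This is only a sketch: it assumes the JSON files keep the names used in the commands above and sit in a single working directory, and the helper name is hypothetical.

```python
from pathlib import Path


def next_offline_command(vcf, workdir="."):
    """Return the argv list for the next offline-mode stage.

    ASSUMPTION: the filter files are saved under the names used in the
    offline-mode commands and are placed in `workdir`.
    """
    workdir = Path(workdir)
    base = ["snpsnip", "--vcf", vcf, "--offline"]
    variant_json = workdir / "snpsnip_variant_filters.json"
    sample_json = workdir / "snpsnip_sample_filters.json"

    if variant_json.exists():
        # Variant thresholds chosen: run the final filtering stage.
        return base + ["--next", str(variant_json)]
    if sample_json.exists():
        # Sample thresholds chosen: compute per-group SNP stats.
        return base + ["--next", str(sample_json)]
    # Nothing chosen yet: run the initial subset, PCA and sample stats.
    return base
```

The returned list can be passed to `subprocess.run(..., check=True)`. Note that the helper does not detect whether the final stage has already completed, so a real pipeline would need its own check for that.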

For more details, see the SNPSnip tutorial.

License

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
