SNPSnip - An interactive VCF Filtering Tool
SNPSnip is a command-line tool with an interactive web interface for filtering VCF files in multiple stages.
Prerequisites
- Python 3.8 or higher
- bcftools 1.18 or higher, installed and available in your PATH (check with
bcftools --version; a statically compiled version is available here)
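The two prerequisites can be checked in one go with a small shell sketch like the one below. The helper name version_ge is hypothetical, and the script assumes a `sort` that supports `-V` (GNU coreutils and recent macOS do) and that the first line of `bcftools --version` has the form "bcftools X.Y".

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on `sort -V` (version sort) being available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Python must be 3.8 or higher.
if command -v python3 >/dev/null 2>&1; then
    py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$py_ver" 3.8; then
        echo "python3 $py_ver: OK"
    else
        echo "python3 $py_ver: too old, need 3.8 or higher"
    fi
else
    echo "python3 not found in PATH"
fi

# bcftools must be 1.18 or higher and on the PATH.
if command -v bcftools >/dev/null 2>&1; then
    # Assumption: the first output line looks like "bcftools 1.18".
    bcf_ver=$(bcftools --version | head -n 1 | awk '{print $2}')
    if version_ge "$bcf_ver" 1.18; then
        echo "bcftools $bcf_ver: OK"
    else
        echo "bcftools $bcf_ver: too old, need 1.18 or higher"
    fi
else
    echo "bcftools not found in PATH"
fi
```

Version sort matters here because a plain string or numeric comparison would rank 1.9 above 1.18.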
Install
pip install snpsnip
Or from the latest source:
pip install git+https://github.com/gekkonid/snpsnip.git
I recommend using pipx to install tools like this, to isolate tool dependencies.
Usage
Online mode
snpsnip --output-dir filtered_results --vcf input.vcf.gz \
--maf 0.05 --max-missing 0.1 --min-qual 30
This will conduct three phases of analysis, with a web UI to select thresholds etc. between these steps.
- Initial Processing: SNPSnip extracts a random subset of SNPs passing
basic filters, and calculates per-sample stats and a sample PCA. You will
then be presented with a web UI where you can set your sample filtering
thresholds to exclude poor samples and (optionally) create subsets of
samples. You can also specify absolute minimum thresholds on MAF,
missingness, and variant quality to subset the variants that are considered,
which is useful if you have a very large number of singleton or poor-quality
variants (use --maf, --max-missing, and --min-qual for this).
- Variant Filtering: For each sample group, SNPSnip will then calculate variant-level statistics from the random subset of SNPs (from step 1). You can then set your thresholds per group to exclude poor SNPs.
- Final Filtering: The tool applies your sample and variant filters to the full VCF file to generate filtered outputs for each group of samples.
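To make the basic thresholds from the first phase concrete, the sketch below builds a roughly comparable bcftools filter expression. It only illustrates what values like --maf 0.05, --max-missing 0.1 and --min-qual 30 mean; it is not necessarily how SNPSnip applies them internally. It assumes --max-missing is the maximum fraction of missing genotypes per variant, and build_filter_expr is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: turn the three thresholds into a bcftools expression.
#   $1 = minimum minor allele frequency
#   $2 = maximum fraction of missing genotypes (assumed meaning of --max-missing)
#   $3 = minimum variant quality
build_filter_expr() {
    printf 'QUAL>=%s && MAF>=%s && F_MISSING<=%s' "$3" "$1" "$2"
}

filter_expr=$(build_filter_expr 0.05 0.1 30)
echo "$filter_expr"

# If bcftools and the example VCF are present, count the variants that pass.
# MAF and F_MISSING are computed on the fly by bcftools when absent from INFO.
if command -v bcftools >/dev/null 2>&1 && [ -f input.vcf.gz ]; then
    bcftools view -H -i "$filter_expr" input.vcf.gz | wc -l
fi
```

Counting how many variants survive at a few candidate thresholds is a cheap way to pick starting values before launching the interactive steps.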
Optionally, some predefined groups (e.g. populations, species, etc) can be
provided with the --groups-file, --group-column and --sample-column
arguments. These predefined groups can then be refined based on the sample PCA
or simply used verbatim to define subsets of samples.
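As an illustration of predefined groups, the sketch below writes a small metadata table and prints a matching invocation. The file format SNPSnip expects is not described here, so the tab-separated layout with a header row, the column names sample and population, the sample IDs, and the idea that the column arguments take header names are all assumptions. group_counts is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical metadata table: tab-separated, one header row (assumed format).
printf 'sample\tpopulation\n'  > groups.tsv
printf 'ind01\tnorth\n'       >> groups.tsv
printf 'ind02\tnorth\n'       >> groups.tsv
printf 'ind03\tsouth\n'       >> groups.tsv

# Hypothetical helper: count samples per group as a sanity check.
#   $1 = table path; column 2 is assumed to hold the group label
group_counts() {
    awk -F '\t' 'NR > 1 { n[$2]++ } END { for (g in n) print g "\t" n[g] }' "$1" | sort
}
group_counts groups.tsv

# The invocation, using only the arguments named above (printed, not run).
echo snpsnip --vcf input.vcf.gz --output-dir filtered_results \
    --groups-file groups.tsv --group-column population --sample-column sample
```

Checking the per-group counts first helps catch typos in group labels, which would otherwise show up as unexpected extra groups.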
Offline mode
You can also run this in an "offline" mode, which is useful, for example, on clusters:
# First, make a subset and calculate PCA & sample stats
snpsnip --vcf input.vcf.gz --offline
# This generates a static HTML file you can download & play with to set your
# thresholds. Then, you save a .json file to your PC and then copy that file
# back to wherever you're running SNPSnip, then:
snpsnip --vcf input.vcf.gz --offline --next snpsnip_sample_filters.json
# This makes the subsets, and calculates the SNP stats for each group of
# samples you selected. This again generates a static HTML file you can use to
# interactively make your SNP filtering threshold selections. Again save the
# output, copy it back to where you're running SNPSnip, then:
snpsnip --vcf input.vcf.gz --offline --next snpsnip_variant_filters.json
# This will generate the final files.
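Because the offline workflow is resumed by hand after each round trip, it can help to have a small helper that reports which command comes next. The sketch below is one way to do that. It assumes the JSON files keep the names used above and sit in the current directory, and next_snpsnip_cmd is a hypothetical helper that only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the next offline-mode command for a VCF,
# based on which filter files have been copied back so far.
#   $1 = path to the input VCF
next_snpsnip_cmd() {
    if [ -f snpsnip_variant_filters.json ]; then
        # Variant thresholds chosen: the last step writes the final files.
        printf '%s\n' "snpsnip --vcf $1 --offline --next snpsnip_variant_filters.json"
    elif [ -f snpsnip_sample_filters.json ]; then
        # Sample thresholds chosen: compute per-group variant stats next.
        printf '%s\n' "snpsnip --vcf $1 --offline --next snpsnip_sample_filters.json"
    else
        # Nothing chosen yet: start with the subset, PCA and sample stats.
        printf '%s\n' "snpsnip --vcf $1 --offline"
    fi
}

next_snpsnip_cmd input.vcf.gz
```

Printing instead of executing keeps the helper safe to call from a job script while you confirm the right filter file is in place.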
For more details, see the SNPSnip tutorial.
License
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.