SNPfilter

A handy little tool for filtering SNPs

Installation

conda install -c bioconda samtools bwa
pip install snpfilter

Usage

prepare: use BWA and SAMtools to map the sequencing data and generate the BAM and BCF files

usage: SNPfilter prepare [-h] [-t THREADS] [-q MIN_MQ] [-Q MIN_BQ] sample_id reference R1 R2

Prepare work environment

positional arguments:
  sample_id             sample id
  reference             reference genome in fasta format, should be indexed by bwa
  R1                    fastq file 1
  R2                    fastq file 2

optional arguments:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
                        num of threads
  -q MIN_MQ, --min-MQ MIN_MQ
                        skip alignments with mapQ smaller than INT
  -Q MIN_BQ, --min-BQ MIN_BQ
                        skip bases with baseQ/BAQ smaller than INT
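
For orientation, the sketch below builds a command pipeline roughly equivalent to what the description of prepare implies: map paired reads with BWA, sort and index with SAMtools, then pile up and call variants into a BCF. It is an assumption, not SNPfilter's actual implementation. In particular, the use of bcftools (which the install line above does not list), the output name sample_id.bam, and the function name prepare_commands are illustrative only.

```python
import shlex


def prepare_commands(sample_id, reference, r1, r2, *, threads, min_mq, min_bq):
    """Return shell command strings for a map -> sort -> call pipeline.

    Assumption: this mirrors the *description* of `SNPfilter prepare`;
    the tool's real commands and output names may differ.
    """
    bam = f"{sample_id}.bam"  # hypothetical output name
    bcf = f"{sample_id}.bcf"  # same name the Example section feeds to qcfilter
    q = shlex.quote

    # Map paired-end reads and coordinate-sort the alignments.
    mapping = (
        f"bwa mem -t {threads} {q(reference)} {q(r1)} {q(r2)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -"
    )
    # Index the sorted BAM so it can be accessed by region.
    index = f"samtools index {q(bam)}"
    # Pile up with mapping-quality (-q) and base-quality (-Q) cutoffs,
    # then call variants and write a compressed BCF.
    calling = (
        f"bcftools mpileup -Ou -f {q(reference)} -q {min_mq} -Q {min_bq} {q(bam)}"
        f" | bcftools call -mv -Ob -o {q(bcf)}"
    )
    return [mapping, index, calling]
```

Each returned string could be run with subprocess.run(cmd, shell=True, check=True) once the reference has been indexed with bwa index.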

qcfilter: filter SNPs from a BCF file by minimum depth, maximum depth and minimum variant frequency, optionally removing loci found in background BCF files

usage: SNPfilter qcfilter [-h] [-d MIN_DEPTH] [-D MAX_DEPTH] [-v MIN_VARIANT_FREQUENCY] [-b BACKGROUND_SITE] sample_id input_bcf

filtering snp from bcf file with min depth and min variant frequency

positional arguments:
  sample_id             sample id
  input_bcf             input bcf file

optional arguments:
  -h, --help            show this help message and exit
  -d MIN_DEPTH, --min_depth MIN_DEPTH
                        min depth
  -D MAX_DEPTH, --max_depth MAX_DEPTH
                        max depth
  -v MIN_VARIANT_FREQUENCY, --min_variant_frequency MIN_VARIANT_FREQUENCY
                        min variant frequency
  -b BACKGROUND_SITE, --background_site BACKGROUND_SITE
                        A comma-separated list of bcf files, loci that appear in these bcf files will be filtered out
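
The four options combine into a simple per-site test. The sketch below shows one plausible reading of it on in-memory records. It is an assumption, not SNPfilter's code: the Site fields, the inclusive thresholds, and the definition of variant frequency as alt_reads / depth are all hypothetical, and the real tool reads these values from the BCF.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    depth: int      # total reads covering the site (assumed definition)
    alt_reads: int  # reads supporting the variant allele (assumed definition)


def qc_filter(sites, *, min_depth, min_variant_frequency,
              max_depth=None, background=frozenset()):
    """Keep sites that pass the depth, frequency and background checks.

    `background` is a set of (chrom, pos) pairs, standing in for the loci
    collected from the BCF files given with -b.
    """
    kept = []
    for site in sites:
        if site.depth < min_depth:
            continue  # too little coverage to trust the call
        if max_depth is not None and site.depth > max_depth:
            continue  # unusually deep coverage, e.g. collapsed repeats
        if site.depth == 0 or site.alt_reads / site.depth < min_variant_frequency:
            continue  # variant allele is not frequent enough
        if (site.chrom, site.pos) in background:
            continue  # locus also appears in a background sample
        kept.append(site)
    return kept
```

With min_depth=2, max_depth=40 and min_variant_frequency=0.3, a site with 10 reads of which 10 support the variant is kept, while a site with 50 reads or with only 2 of 10 supporting reads is dropped.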

codefilter: filter SNPs based on whether they change the encoded amino acid

usage: SNPfilter codefilter [-h] sample_id input_bcf reference gff_file

filtering SNPs based on whether they cause changes in coding amino acids

positional arguments:
  sample_id   sample id
  input_bcf   input bcf file
  reference   reference genome in fasta format
  gff_file    gff file for reference genome

optional arguments:
  -h, --help  show this help message and exit
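
The core question codefilter asks, whether a substitution changes the encoded amino acid, can be illustrated on a single coding sequence. The sketch below is a minimal version under stated assumptions: it uses the standard genetic code, takes the coding sequence already extracted and oriented 5' to 3', and uses a 0-based position within that sequence. How SNPfilter itself maps BCF positions onto GFF features is not described in this README, and the function name changes_amino_acid is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def changes_amino_acid(cds, cds_pos, alt_base):
    """Return True if substituting alt_base at cds_pos alters the protein.

    cds      coding sequence on the coding strand (introns removed)
    cds_pos  0-based offset of the SNP within cds
    alt_base variant base, given on the coding strand
    """
    offset = cds_pos % 3
    start = cds_pos - offset
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] != CODON_TABLE[alt_codon]
```

For the coding sequence ATGGCTTAA, changing the third base of the GCT codon to C gives GCC, which still encodes alanine, so the SNP is synonymous. Changing the first base of that codon to A gives ACT (threonine), so the SNP changes the protein.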

Example

  1. Index the reference genome using BWA
bwa index reference.fasta
  2. Trim the sequencing data (this drops the first 15 bases of each read; requires bioawk)
zcat sample1_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample1_R1.trimmed.fastq.gz
zcat sample1_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample1_R2.trimmed.fastq.gz
zcat sample2_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample2_R1.trimmed.fastq.gz
zcat sample2_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample2_R2.trimmed.fastq.gz
zcat sample3_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample3_R1.trimmed.fastq.gz
zcat sample3_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample3_R2.trimmed.fastq.gz
  3. Use prepare to generate BAM files and BCF files
SNPfilter prepare -t 10 sample1 reference.fasta sample1_R1.trimmed.fastq.gz sample1_R2.trimmed.fastq.gz
SNPfilter prepare -t 10 sample2 reference.fasta sample2_R1.trimmed.fastq.gz sample2_R2.trimmed.fastq.gz
SNPfilter prepare -t 10 sample3 reference.fasta sample3_R1.trimmed.fastq.gz sample3_R2.trimmed.fastq.gz
  4. Use qcfilter to filter SNPs with a relaxed threshold (d=2, D=40, v=0.3)
SNPfilter qcfilter -d 2 -D 40 -v 0.3 sample1 sample1.bcf
SNPfilter qcfilter -d 2 -D 40 -v 0.3 sample2 sample2.bcf
SNPfilter qcfilter -d 2 -D 40 -v 0.3 sample3 sample3.bcf
  5. Use qcfilter to filter SNPs with a strict threshold (d=5, D=40, v=0.9, with the other samples' relaxed BCF files as background)
SNPfilter qcfilter -d 5 -D 40 -v 0.9 -b sample2.d2.D40.v0.30.bcf,sample3.d2.D40.v0.30.bcf sample1 sample1.d2.D40.v0.30.bcf
SNPfilter qcfilter -d 5 -D 40 -v 0.9 -b sample1.d2.D40.v0.30.bcf,sample3.d2.D40.v0.30.bcf sample2 sample2.d2.D40.v0.30.bcf
SNPfilter qcfilter -d 5 -D 40 -v 0.9 -b sample1.d2.D40.v0.30.bcf,sample2.d2.D40.v0.30.bcf sample3 sample3.d2.D40.v0.30.bcf
  6. Use codefilter to filter out SNPs that do not cause changes in coding amino acids
SNPfilter codefilter sample1 sample1.d5.D40.v0.90.bcf reference.fasta reference.gff
SNPfilter codefilter sample2 sample2.d5.D40.v0.90.bcf reference.fasta reference.gff
SNPfilter codefilter sample3 sample3.d5.D40.v0.90.bcf reference.fasta reference.gff
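
With more than a few samples, the strict qcfilter commands become tedious to write by hand, because each sample needs every other sample's relaxed BCF as background. The sketch below generates those argument lists. The helper names are hypothetical, and the output naming pattern sample.dN.DN.vN.NN.bcf is an assumption inferred from input names such as sample1.d2.D40.v0.30.bcf in the example; check the files qcfilter actually writes before relying on it.

```python
def filtered_bcf_name(sample, min_depth, max_depth, min_freq):
    """Guess the qcfilter output name (assumed pattern, see lead-in)."""
    return f"{sample}.d{min_depth}.D{max_depth}.v{min_freq:.2f}.bcf"


def strict_qcfilter_cmd(sample, all_samples, *,
                        relaxed=(2, 40, 0.3), strict=(5, 40, 0.9)):
    """Build the argv for the strict qcfilter pass of one sample.

    Every other sample's relaxed BCF is passed via -b as background,
    so loci shared between samples are removed.
    """
    background = [
        filtered_bcf_name(other, *relaxed)
        for other in all_samples
        if other != sample
    ]
    min_depth, max_depth, min_freq = strict
    return [
        "SNPfilter", "qcfilter",
        "-d", str(min_depth),
        "-D", str(max_depth),
        "-v", str(min_freq),
        "-b", ",".join(background),
        sample,
        filtered_bcf_name(sample, *relaxed),  # input: this sample's relaxed BCF
    ]


if __name__ == "__main__":
    samples = ["sample1", "sample2", "sample3"]
    for name in samples:
        print(" ".join(strict_qcfilter_cmd(name, samples)))
```

Each list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with the comma-separated background argument.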
