A handy little tool for filtering SNPs.
Project description
SNPfilter
A handy little tool for filtering SNPs
Installation
conda install -c bioconda samtools bwa
pip install snpfilter
Usage
prepare: Use BWA and SAMTools to map the sequencing data, generate BAM files and BCF files, etc.
usage: SNPfilter prepare [-h] [-t THREADS] [-q MIN_MQ] [-Q MIN_BQ] sample_id reference R1 R2
Prepare work environment
positional arguments:
sample_id sample id
reference reference genome in fasta format, should be indexed by bwa
R1 fastq file 1
R2 fastq file 2
optional arguments:
-h, --help show this help message and exit
-t THREADS, --threads THREADS
num of threads
-q MIN_MQ, --min-MQ MIN_MQ
skip alignments with mapQ smaller than INT
-Q MIN_BQ, --min-BQ MIN_BQ
skip bases with baseQ/BAQ smaller than INT
qcfilter: filtering SNPs from BCF file with min depth and min variant frequency
usage: SNPfilter qcfilter [-h] [-d MIN_DEPTH] [-v MIN_VARIANT_FREQUENCY] [-b BACKGROUND_SITE] sample_id input_bcf
filtering snp from bcf file with min depth and min variant frequency
positional arguments:
sample_id sample id
input_bcf input bcf file
optional arguments:
-h, --help show this help message and exit
-d MIN_DEPTH, --min_depth MIN_DEPTH
min depth
-v MIN_VARIANT_FREQUENCY, --min_variant_frequency MIN_VARIANT_FREQUENCY
min variant frequency
-b BACKGROUND_SITE, --background_site BACKGROUND_SITE
A comma-separated list of bcf files, loci that appear in these bcf files will be filtered out
codefilter: filtering SNPs based on whether they cause changes in coding amino acids
usage: SNPfilter codefilter [-h] sample_id input_bcf reference gff_file
filtering SNPs based on whether they cause changes in coding amino acids
positional arguments:
sample_id sample id
input_bcf input bcf file
reference reference genome in fasta format
gff_file gff file for reference genome
optional arguments:
-h, --help show this help message and exit
Example
- Indexing of the reference genome using BWA
bwa index reference.fasta
- Trim the sequencing data
zcat sample1_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample1_R1.trimmed.fastq.gz
zcat sample1_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample1_R2.trimmed.fastq.gz
zcat sample2_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample2_R1.trimmed.fastq.gz
zcat sample2_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample2_R2.trimmed.fastq.gz
zcat sample3_R1.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample3_R1.trimmed.fastq.gz
zcat sample3_R2.fastq.gz | bioawk -cfastx '{print "@"$name"\n"substr($seq, 16)"\n+"$name"\n"substr($qual, 16)}' | gzip > sample3_R2.trimmed.fastq.gz
- Use prepare to generate bam files and BCF files
SNPfilter prepare -t 10 sample1 reference.fasta sample1_R1.trimmed.fastq.gz sample1_R2.trimmed.fastq.gz
SNPfilter prepare -t 10 sample2 reference.fasta sample2_R1.trimmed.fastq.gz sample2_R2.trimmed.fastq.gz
SNPfilter prepare -t 10 sample3 reference.fasta sample3_R1.trimmed.fastq.gz sample3_R2.trimmed.fastq.gz
- Use qcfilter to filter SNPs with a relaxed threshold(d=2, v=0.3)
SNPfilter qcfilter -d 2 -v 0.3 sample1 sample1.bcf
SNPfilter qcfilter -d 2 -v 0.3 sample2 sample2.bcf
SNPfilter qcfilter -d 2 -v 0.3 sample3 sample3.bcf
- Use qcfilter to filter SNPs with a strict threshold(d=5, v=0.9, other sample BCF file as background)
SNPfilter qcfilter -d 5 -v 0.9 -b sample2.d2.v0.30.bcf,sample3.d2.v0.30.bcf sample1 sample1.d2.v0.30.bcf
SNPfilter qcfilter -d 5 -v 0.9 -b sample1.d2.v0.30.bcf,sample3.d2.v0.30.bcf sample2 sample2.d2.v0.30.bcf
SNPfilter qcfilter -d 5 -v 0.9 -b sample1.d2.v0.30.bcf,sample2.d2.v0.30.bcf sample3 sample3.d2.v0.30.bcf
- Use codefilter to filter SNPs that do not cause changes in coding amino acids
SNPfilter codefilter sample1 sample1.d5.v0.90.bcf reference.fasta reference.gff
SNPfilter codefilter sample2 sample2.d5.v0.90.bcf reference.fasta reference.gff
SNPfilter codefilter sample3 sample3.d5.v0.90.bcf reference.fasta reference.gff
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
SNPfilter-0.0.4.tar.gz
(6.7 kB
view details)
Built Distribution
File details
Details for the file SNPfilter-0.0.4.tar.gz
.
File metadata
- Download URL: SNPfilter-0.0.4.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a9161ae176039e6da4ff612a7512ac471e314221e1ec3ac768e14140701906e5 |
|
MD5 | 5048ae559e966f53d9f28fd850a216b1 |
|
BLAKE2b-256 | 167f54951e66903dac6f7953f2686b01bdd27b3a3bfe3374cd94386bf767b2cd |
File details
Details for the file SNPfilter-0.0.4-py3-none-any.whl
.
File metadata
- Download URL: SNPfilter-0.0.4-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b5fd1df98472740386964fc61dc9180586d544c4fb536c777d83a2a93308c78f |
|
MD5 | 8e418bc55717c9dcd4010ae2511c5df3 |
|
BLAKE2b-256 | 10056dd7c4703a795054c45735a5eae4cf83dbc9b8599910412a5601f9e853fb |