SAMsift - sift your alignments
SAMsift is a program for advanced filtering and tagging of SAM/BAM alignments using Python expressions.
Getting started
git clone http://github.com/karel-brinda/samsift
cd samsift
# keep only alignments with alignment score >94
samsift/samsift -i tests/test.bam -o filtered.sam -f 'AS>94'
# add tags 'ln' with sequence length and 'ab' with average base quality
samsift/samsift -i tests/test.bam -o with_ln_ab.sam -c 'ln=len(SEQ);ab=1.0*sum(QUAL)/ln'
Installation
Using Bioconda:
# add all necessary Bioconda channels
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
# install samsift
conda install samsift
Using PIP from PyPI:
pip install --upgrade samsift
Using PIP from GitHub:
pip install --upgrade git+https://github.com/karel-brinda/samsift
Command-line parameters
usage: samsift.py [-h] [-v] [-i file] [-o file] [-f py_expr] [-c py_code]
                  [-d py_expr] [-t py_expr]

Program: samsift (advanced filtering and tagging of SAM/BAM alignments using Python expressions)
Version: 0.1.0
Author:  Karel Brinda <kbrinda@hsph.harvard.edu>

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
  -i file        input SAM/BAM file [-]
  -o file        output SAM/BAM file [-]
  -f py_expr     filter [True]
  -c py_code     code to be executed (e.g., assigning new tags) [None]
  -d py_expr     debugging expression to print [None]
  -t py_expr     debugging trigger [True]
Algorithm
for ALIGNMENT in ALIGNMENTS:
    if eval(DEBUG_TRIGGER):
        print(eval(DEBUG_EXPR))
    if eval(FILTER):
        exec(CODE)
        print(ALIGNMENT)
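The loop above can be sketched in plain Python to show how the four options interact. This is only an illustration under stated assumptions, not SAMsift's actual implementation: alignments are modelled as plain dictionaries rather than pysam objects, and the function name sift and its parameter names are hypothetical.

```python
def sift(alignments, filter_expr="True", code=None,
         debug_expr=None, debug_trigger="True"):
    """Return the alignments that pass filter_expr, after running code on them.

    Hypothetical helper: each alignment is a dict mapping variable names
    (SAM fields and tags) to values.
    """
    kept = []
    for alignment in alignments:
        env = dict(alignment)  # fresh namespace for every alignment
        # Print the debugging expression only when the trigger evaluates to true.
        if debug_expr is not None and eval(debug_trigger, env):
            print(eval(debug_expr, env))
        # Only alignments passing the filter are modified and kept.
        if eval(filter_expr, env):
            if code is not None:
                exec(code, env)
            env.pop("__builtins__", None)  # added by eval/exec, not a tag
            kept.append(env)
    return kept


# Toy records standing in for parsed alignments (values are made up).
records = [
    {"QNAME": "r1", "POS": 100, "SEQ": "ACGT", "QUAL": [30, 30, 20, 40], "AS": 95},
    {"QNAME": "r2", "POS": 20000, "SEQ": "ACGTAC", "QUAL": [10] * 6, "AS": 90},
]
result = sift(records, filter_expr="AS>94", code="ln=len(SEQ);ab=1.0*sum(QUAL)/ln")
```

With these toy records, only r1 passes the filter, and the code then adds ln and ab to it. Because the expressions go through eval and exec, they should come only from trusted sources.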
All Python expressions can access variables mirroring the fields from the alignment section of the SAM specification, i.e., QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. For instance, keeping only reads with POS smaller than 10000 can be done by
samsift -i tests/test.bam -f 'POS<10000'
The PySAM representation of the current alignment (class pysam.AlignedSegment) is available through the variable a. Since reference_start is 0-based, the previous example is equivalent to
samsift -i tests/test.bam -f 'a.reference_start+1<10000'
All SAM tags are translated to variables with the same name. For instance, if the alignment score is provided through the AS tag (as defined in the Sequence Alignment/Map Optional Fields Specification), then alignments with a score smaller than or equal to the sequence length can be removed using
samsift -i tests/test.bam -f 'AS>len(SEQ)'
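To make the mapping from fields and tags to variables concrete, the following sketch parses one text SAM record into a dictionary and evaluates a filter against it. It is an illustration only: SAMsift itself works through PySAM, the helper sam_line_to_variables is hypothetical, only the i and f tag types are converted, and QUAL is left as the raw string.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def sam_line_to_variables(line):
    """Turn one tab-separated SAM record into a name -> value dict (hypothetical helper)."""
    columns = line.rstrip("\n").split("\t")
    variables = {}
    # The first 11 columns are the mandatory fields; POS is 1-based in SAM text.
    for name, value in zip(SAM_FIELDS, columns[:11]):
        variables[name] = int(value) if name in INT_FIELDS else value
    # Remaining columns are optional fields in TAG:TYPE:VALUE form.
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        variables[tag] = value  # other types are kept as strings in this sketch
    return variables


# A made-up record with two tags, AS and NM.
line = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:3\tNM:i:0"
variables = sam_line_to_variables(line)
keep = eval("AS>len(SEQ)", dict(variables))
```

Here AS is 3 and the sequence has length 4, so the filter evaluates to False and the record would be dropped.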
If CODE is provided, all two-letter variables are back-translated to tags. For instance, a tag ab carrying the average base quality can be added by
samsift -i tests/test.bam -c 'ab=1.0*sum(QUAL)/len(QUAL)'
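One way to picture the back-translation is to run the code in a namespace and then collect every variable whose name looks like a SAM tag. The sketch below assumes that selection rule (two characters, a letter followed by a letter or digit, as in the SAM specification); the function run_code_and_collect_tags is hypothetical and is not taken from SAMsift's source.

```python
import re

# SAM tag names are two characters: a letter followed by a letter or digit.
TAG_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]$")


def run_code_and_collect_tags(code, variables):
    """Execute code on a copy of variables and return the tag-like names (hypothetical helper)."""
    env = dict(variables)
    exec(code, env)
    # Longer names such as SEQ, QUAL, or __builtins__ do not match and are skipped.
    return {name: value for name, value in env.items() if TAG_NAME.match(name)}


# Made-up values for a single alignment.
tags = run_code_and_collect_tags(
    "ab=1.0*sum(QUAL)/len(QUAL)",
    {"SEQ": "ACGT", "QUAL": [30, 30, 20, 40], "AS": 95},
)
```

In this toy case the result contains the existing AS tag and the newly assigned ab tag, while SEQ and QUAL are ignored because their names are longer than two characters.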
Similar programs
samtools view can filter alignments based on FLAGS, read group tags, and CIGAR strings.
sambamba view supports, in addition to the SAMtools features, filtering using simple Perl expressions. However, it is not possible to compare different tags.