SAMsift - sift your alignments

SAMsift is a program for advanced filtering and tagging of SAM/BAM alignments using Python expressions.

Getting started

git clone http://github.com/karel-brinda/samsift
cd samsift
# keep only alignments with alignment score >94
samsift/samsift -i tests/test.bam -o filtered.sam -f 'AS>94'
# add tags 'ln' with sequence length and 'ab' with average base quality
samsift/samsift -i tests/test.bam -o with_ln_ab.sam -c 'ln=len(SEQ);ab=1.0*sum(QUAL)/ln'

Installation

Using Bioconda:

# add all necessary Bioconda channels
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# install samsift
conda install samsift

Using pip from PyPI:

pip install --upgrade samsift

Using pip from GitHub:

pip install --upgrade git+https://github.com/karel-brinda/samsift

Command-line parameters

usage: samsift.py [-h] [-v] [-i file] [-o file] [-f py_expr] [-c py_code]
                  [-d py_expr] [-t py_expr]

Program: samsift (advanced filtering and tagging of SAM/BAM alignments using Python expressions)
Version: 0.1.0
Author:  Karel Brinda <kbrinda@hsph.harvard.edu>

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
  -i file        input SAM/BAM file [-]
  -o file        output SAM/BAM file [-]
  -f py_expr     filter [True]
  -c py_code     code to be executed (e.g., assigning new tags) [None]
  -d py_expr     debugging expression to print [None]
  -t py_expr     debugging trigger [True]

Algorithm

for ALIGNMENT in ALIGNMENTS:
        if eval(DEBUG_TRIGGER):
                print(eval(DEBUG_EXPR))
        if eval(FILTER):
                exec(CODE)
                print(ALIGNMENT)
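
The loop above can be sketched in plain Python. This is only an illustration under stated assumptions, not SAMsift's actual implementation: alignments are modelled as plain dictionaries instead of pysam objects, and the function name sift and its parameter names are hypothetical.

```python
def sift(alignments, filter_expr="True", code=None,
         debug_expr=None, debug_trigger="True"):
    """Return the alignments passing filter_expr, after running code on each.

    Hypothetical helper: each alignment is a dict whose keys (e.g. POS, SEQ,
    AS) are exposed as variables to the expressions.
    """
    kept = []
    for alignment in alignments:
        # Copy so that assignments made by `code` do not alter the input.
        variables = dict(alignment)
        # Debugging output is printed only when the trigger evaluates to true.
        if debug_expr is not None and eval(debug_trigger, {}, variables):
            print(eval(debug_expr, {}, variables))
        # CODE runs only for alignments that pass the filter.
        if eval(filter_expr, {}, variables):
            if code is not None:
                exec(code, {}, variables)
            kept.append(variables)
    return kept


reads = [
    {"QNAME": "r1", "POS": 5, "SEQ": "ACGT", "AS": 95},
    {"QNAME": "r2", "POS": 20000, "SEQ": "ACG", "AS": 90},
]
passed = sift(reads, filter_expr="AS>94", code="ln=len(SEQ)")
```

With these two toy records, only r1 passes the filter, and it gains a variable ln holding its sequence length. Because the expressions are passed to eval and exec, they should come only from trusted sources.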

All Python expressions can access variables mirroring the fields from the alignment section of the SAM specification, i.e., QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. For instance, to keep only reads with POS smaller than 10000, run

samsift -i tests/test.bam -f 'POS<10000'

The pysam representation of the current alignment (class pysam.AlignedSegment) is available through the variable a. Because pysam coordinates are 0-based, the previous example is equivalent to

samsift -i tests/test.bam -f 'a.reference_start+1<10000'
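
The relation between the two forms can be checked with a small stand-in object. This is a sketch only: SimpleNamespace replaces pysam.AlignedSegment so that the example runs without pysam, and the helper variables_for is hypothetical. It assumes only that reference_start is 0-based while POS is 1-based.

```python
from types import SimpleNamespace


def variables_for(a):
    """Hypothetical helper: expose the alignment object and its 1-based POS."""
    return {"a": a, "POS": a.reference_start + 1}


# Stand-in for an alignment starting at 0-based position 9998 (POS 9999).
inside = variables_for(SimpleNamespace(reference_start=9998))
# Stand-in for an alignment starting at 0-based position 9999 (POS 10000).
outside = variables_for(SimpleNamespace(reference_start=9999))

# Both filter forms agree on either side of the threshold.
results = [
    (eval("POS<10000", {}, v), eval("a.reference_start+1<10000", {}, v))
    for v in (inside, outside)
]
```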

All SAM tags are translated to variables of the same name. For instance, if the alignment score is provided through the AS tag (as defined in the Sequence Alignment/Map Optional Fields Specification), then alignments with a score smaller than or equal to the sequence length can be removed using

samsift -i tests/test.bam -f 'AS>len(SEQ)'

If CODE is provided, all two-letter variables are back-translated to tags. For instance, a tag ab carrying the average base quality can be added by

samsift -i tests/test.bam -c 'ab=1.0*sum(QUAL)/len(SEQ)'
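
The back-translation can be pictured as follows. This is a minimal sketch, not SAMsift's code: the helper run_code is hypothetical, QUAL is assumed to be a list of integer base qualities, and the rule for what counts as a tag name (a letter followed by a letter or digit, as in the SAM specification) is an assumption about how two-letter variables are selected.

```python
import re

# SAM tag names consist of a letter followed by a letter or a digit.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]")


def run_code(code, fields, tags):
    """Hypothetical helper: run `code` and return the resulting tags.

    Fields and existing tags are exposed as variables; afterwards every
    variable whose name looks like a SAM tag is collected.
    """
    variables = {**fields, **tags}
    exec(code, {}, variables)
    return {name: value for name, value in variables.items()
            if TAG_NAME.fullmatch(name)}


fields = {"SEQ": "ACGT", "QUAL": [30, 30, 40, 40]}
tags = {"AS": 95}
new_tags = run_code("ln=len(SEQ);ab=1.0*sum(QUAL)/ln", fields, tags)
# new_tags == {"AS": 95, "ln": 4, "ab": 35.0}
```

Field names such as SEQ and QUAL are longer than two characters, so they are never mistaken for tags under this rule.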

Similar programs

Author

Karel Brinda <kbrinda@hsph.harvard.edu>
