A light-weight python package for summarizing sequence coverage from SAM and BAM files

samsum

Installation

Samsum currently supports macOS and Linux and has been tested primarily on Ubuntu (bionic and trusty distributions). It is a Python package on the Python Package Index (PyPI) and can be installed using pip:

pip install samsum

Samsum can also be installed using conda with the command:

conda install -c bioconda samsum

You can also install samsum from source by cloning the repository from its GitHub page or by downloading a GitHub release.

git clone https://github.com/hallamlab/samsum.git
cd samsum
python3 setup.py sdist
pip install dist/samsum*tar.gz

Usage

samsum stats reads a SAM or BAM file (BAM support will be implemented soon) and rapidly counts the reads mapped to each reference sequence (e.g. contigs, scaffolds), while also keeping track of the reads that remain unmapped. All of this happens within the C++ Python extension. It then reads the reference FASTA file to get the length of each reference sequence. Combining the read counts and sequence lengths, it calculates:

  • fragments per kilobase per million (FPKM)
  • transcripts per million (TPM)
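
For readers unfamiliar with these two metrics, here is a minimal sketch of the textbook FPKM and TPM formulas, computed from per-reference fragment counts and sequence lengths. The function name fpkm_and_tpm and the example numbers are made up for illustration. This is not samsum's own code, and details such as whether unmapped reads count toward the "per million" denominator may differ in samsum.

```python
def fpkm_and_tpm(fragments, lengths):
    """Return {name: (fpkm, tpm)} from fragment counts and lengths in bp.

    Illustrative only: the denominator here is the sum of the fragments
    passed in, which is an assumption rather than samsum's documented choice.
    """
    total_fragments = sum(fragments.values())
    # Fragments per kilobase of reference: the length-normalised rate.
    rates = {name: fragments[name] / (lengths[name] / 1e3) for name in fragments}
    rate_sum = sum(rates.values())
    result = {}
    for name, rate in rates.items():
        # FPKM: rate per million fragments in the whole sample.
        fpkm = rate / (total_fragments / 1e6) if total_fragments else 0.0
        # TPM: each rate as a share of all rates, scaled to one million.
        tpm = rate / rate_sum * 1e6 if rate_sum else 0.0
        result[name] = (fpkm, tpm)
    return result


# Made-up counts and lengths for two reference sequences.
stats = fpkm_and_tpm(
    {"contig_1": 300, "contig_2": 100},
    {"contig_1": 3000, "contig_2": 500},
)
for name, (fpkm, tpm) in stats.items():
    print(name, round(fpkm, 1), round(tpm, 1))
```

Unlike FPKM, TPM values always sum to one million within a sample, which makes them easier to compare between samples.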

Command-line options

By default, reads with multiple identical alignments (i.e. a mapping quality of 0) are excluded from these calculations. To include them, use the --multireads flag. You can also drop counts for reference sequences that are only partially covered: the -p argument sets the minimum proportion of a reference sequence that must be covered for its read counts to be included in the output; otherwise, all of its stats are set to 0.
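
To make the default multiread behaviour concrete, here is a rough pure-Python sketch of counting alignments per reference from SAM text while skipping records with a mapping quality of 0. The function count_mapped_reads and its include_multireads parameter are hypothetical names for illustration. samsum does this work in its C++ extension, and its handling of secondary or paired alignments is not reproduced here.

```python
from collections import Counter


def count_mapped_reads(sam_lines, include_multireads=False):
    """Count alignment records per reference sequence in SAM-formatted lines.

    Returns (counts, unmapped). Every alignment record is counted once,
    which is a simplification for illustration.
    """
    counts = Counter()
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
        flag, ref_name, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or ref_name == "*":
            unmapped += 1  # FLAG bit 0x4 marks an unmapped read
        elif mapq == 0 and not include_multireads:
            continue  # ambiguous placement, excluded by default
        else:
            counts[ref_name] += 1
    return counts, unmapped


# Made-up records: one unique alignment, one MAPQ 0 alignment, one unmapped read.
sam = [
    "@SQ\tSN:contig_1\tLN:3000",
    "r1\t0\tcontig_1\t10\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tcontig_1\t200\t0\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
default_counts, unmapped = count_mapped_reads(sam)
all_counts, _ = count_mapped_reads(sam, include_multireads=True)
print(default_counts["contig_1"], all_counts["contig_1"], unmapped)
```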

An example command is:

samsum stats -f ref.fasta -a alignments.sam --multireads -p 0.5 -o output_dir/samsum_table.tsv

This will include all alignments, regardless of their mapping quality, but only report values for reference sequences that were covered across at least 50% of their length.
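
The coverage threshold can be pictured with a small sketch. It assumes alignments are given as 0-based, half-open (start, end) intervals on one reference sequence. The helper names proportion_covered and apply_min_coverage are hypothetical, and samsum's internal calculation may differ.

```python
def proportion_covered(intervals, ref_length):
    """Fraction of a reference covered by at least one alignment.

    intervals: iterable of 0-based, half-open (start, end) pairs.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / ref_length


def apply_min_coverage(value, proportion, min_proportion):
    """Keep a statistic only if the reference is covered well enough."""
    return value if proportion >= min_proportion else 0.0


# Made-up alignments on a 500 bp reference; two of them overlap.
prop = proportion_covered([(0, 100), (50, 150), (300, 400)], 500)
print(prop, apply_min_coverage(12.0, prop, 0.5))
```

Overlapping alignments are merged before counting, so a deeply covered region cannot push the proportion above 1.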

API

Because samsum is a Python package, it can also be imported into Python code and used through its API.

The function you will most likely want is ref_sequence_abundances. For example:

from samsum import commands

sam = "/home/user/reads_to_genome.sam"
fasta = "/home/user/genome.fasta"
ref_seq_abunds = commands.ref_sequence_abundances(aln_file=sam, seq_file=fasta, min_aln=10, p_cov=0, map_qual=0)

The ref_seq_abunds object is a dictionary of RefSequence instances indexed by their header/sequence names. RefSequence objects have several attributes of interest:

  • self.name is the name of the (reference) sequence or header
  • self.length is the length (in base-pairs) of the sequence
  • self.reads_mapped is the number of reads that were mapped
  • self.weight_total is the number of fragments (float) that were mapped to the sequence
  • self.fpkm is Fragments Per Kilobase per Million mapped reads
  • self.tpm is Transcripts Per Million mapped reads
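
As a sketch of how the returned dictionary might be used, the snippet below ranks reference sequences by TPM. So that it runs without samsum installed, it uses a hypothetical stand-in (FakeRefSequence) that only mimics the attributes listed above. The helper top_by_tpm and all values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the attributes listed above. With samsum
# installed, pass the dictionary returned by ref_sequence_abundances instead.
FakeRefSequence = namedtuple(
    "FakeRefSequence", "name length reads_mapped weight_total fpkm tpm"
)


def top_by_tpm(ref_seq_abunds, n=10):
    """Return (name, tpm) pairs for the n references with the highest TPM."""
    ranked = sorted(ref_seq_abunds.values(), key=lambda ref: ref.tpm, reverse=True)
    return [(ref.name, ref.tpm) for ref in ranked[:n]]


# Made-up values, keyed by sequence name like the real dictionary.
demo = {
    "contig_1": FakeRefSequence("contig_1", 3000, 300, 300.0, 250000.0, 333333.3),
    "contig_2": FakeRefSequence("contig_2", 500, 100, 100.0, 500000.0, 666666.7),
}
print(top_by_tpm(demo, n=1))
```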

Outputs

When samsum stats is run, a "samsum_log.txt" file is written to the current working directory (i.e. where samsum was executed from). A comma-separated value (CSV) file with the fields "QueryName", "RefSequence", "ProportionCovered", "Coverage", "Fragments", "FPKM" and "TPM" is written to the file path specified on the command line, or to "samsum_table.csv" by default. A TSV file is written instead if the sep argument is set to 'tab'.
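
The output table can be loaded with Python's standard csv module. This sketch assumes the file starts with a header row using the field names above and that the last five columns are numeric. Neither is confirmed here, so check a real output file first. The function read_samsum_table and the example row are made up for illustration.

```python
import csv
import io

FIELDS = ["QueryName", "RefSequence", "ProportionCovered",
          "Coverage", "Fragments", "FPKM", "TPM"]


def read_samsum_table(handle, delimiter=","):
    """Read a samsum-style table into a list of dictionaries.

    Assumes a header row with the names in FIELDS; numeric columns
    are converted to float.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        for field in FIELDS[2:]:
            row[field] = float(row[field])
        rows.append(row)
    return rows


# Made-up table content standing in for a real samsum_table.csv.
example = io.StringIO(
    "QueryName,RefSequence,ProportionCovered,Coverage,Fragments,FPKM,TPM\n"
    "sample_1,contig_1,0.8,5.0,300.0,250000.0,333333.3\n"
)
rows = read_samsum_table(example)
print(rows[0]["RefSequence"], rows[0]["TPM"])
```

For a real file, open it with open("samsum_table.csv", newline="") and pass the handle in. For a tab-separated table, pass delimiter="\t".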
