Utilities for calculating sequence metrics.

seqm

Python utilities for sequence comparison, quantification, and feature extraction.

Installation

~$ pip install seqm

Documentation

Documentation for the package can be found here.

Usage

The seqm module contains functions for calculating sequence-related distance and complexity metrics commonly used in language processing and next-generation sequencing. It has a simple, consistent API that can be used for investigating sequence characteristics:

>>> import seqm
>>> seqm.hamming('ATTATT', 'ATTAGT')
1
>>> seqm.edit('ATTATT', 'ATAGT')
2
>>> seqm.polydict('AAAACCGT')
{'A': 4, 'C': 2, 'G': 1, 'T': 1}
>>> seqm.polylength('AAAACCGT')
4
>>> seqm.entropy('AGGATAAG')
1.40
>>> seqm.gc_percent('AGGATAAG')
0.375
>>> seqm.gc_skew('AGGATAAG')
3.0
>>> seqm.gc_shift('AGGATAAG')
1.67
>>> seqm.dna_weight('AGGATAAG')
3968.59
>>> seqm.rna_weight('AGGATAAG')
4082.59
>>> seqm.aa_weight('AGGATAAG')
700.8
>>> seqm.tm('AGGATAAGAGATAGATTT')
39.31
>>> seqm.zipsize('AGGATAAGAGATAGATTT')
22

It also has a Sequence object for object-based access to these properties:

>>> import seqm
>>> seq = seqm.Sequence('AAAACCGT')
>>> seq.hamming('AAAAGCGT')
1
>>> seq.gc_percent
0.375
>>> seq.revcomplement
ACGGTTTT
>>> seq.dna_weight
3895.59
>>> # ... and so on ...

All of the metrics available in the repository are listed below, and can also be found in the API section of the documentation.

Finally, all functions from the seqm module can be used at the command line:

~$ # calculate distance between sequences
~$ seqm edit AAAACCGT AAAAGCGT
1

~$ # calculate gc percent of sequence
~$ seqm gc_percent AAAACCGT
0.375

~$ # generate random sequence and pipe to `wrap` command
~$ seqm random --length 10 | seqm wrap --bases 5 -
ATGGA
TATTA

Sequence Quantification

Function         Metric
---------------  ---------------------------------------------------------
seqm.polydict    Length of longest homopolymer for each base in sequence.
seqm.polylength  Length of longest homopolymer in sequence.
seqm.entropy     Shannon entropy for bases in sequence.
seqm.gc_percent  Percentage of GC bases in sequence relative to all bases.
seqm.gc_skew     GC skew for sequence: (#G - #C)/(#G + #C).
seqm.gc_shift    GC shift for sequence: (#A + #T)/(#G + #C).
seqm.dna_weight  Molecular weight for sequence with DNA backbone.
seqm.rna_weight  Molecular weight for sequence with RNA backbone.
seqm.aa_weight   Molecular weight for amino acid sequence.
seqm.tm          Melting temperature of sequence.
seqm.zipsize     Compressibility of sequence.
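
The composition metrics in the table above follow directly from base counts, so they can be sketched in a few lines of plain Python. This is only an illustration of the formulas as written in the table, not seqm's implementation: the function bodies are assumptions, the entropy is assumed to use log base 2, and rounding or edge-case handling in seqm may differ.

```python
import math
from collections import Counter
from itertools import groupby


def gc_percent(seq):
    """Fraction of bases that are G or C."""
    counts = Counter(seq)
    return (counts["G"] + counts["C"]) / len(seq)


def gc_skew(seq):
    """(#G - #C) / (#G + #C); undefined when the sequence has no G or C."""
    counts = Counter(seq)
    return (counts["G"] - counts["C"]) / (counts["G"] + counts["C"])


def gc_shift(seq):
    """(#A + #T) / (#G + #C); undefined when the sequence has no G or C."""
    counts = Counter(seq)
    return (counts["A"] + counts["T"]) / (counts["G"] + counts["C"])


def entropy(seq):
    """Shannon entropy of the base frequencies (log base 2 is an assumption)."""
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in Counter(seq).values())


def polydict(seq):
    """Longest homopolymer run for each base that occurs in the sequence."""
    longest = {}
    for base, run in groupby(seq):
        longest[base] = max(longest.get(base, 0), len(list(run)))
    return longest


def polylength(seq):
    """Longest homopolymer run over all bases."""
    return max(polydict(seq).values())
```

Each helper takes a plain string, so `polydict('AAAACCGT')` and `gc_percent('AGGATAAG')` can be compared against the library's output for the same inputs.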

Domain Conversion

Function            Conversion
------------------  --------------------------------------
seqm.revcomplement  Reverse complement of sequence.
seqm.complement     Complement of sequence.
seqm.aa             Conversion to amino acid sequence.
seqm.wrap           Newline-wrap sequence.
seqm.likelihood     See the API section of the documentation.
seqm.qscore         See the API section of the documentation.
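
For readers unfamiliar with the terms, complement and reverse complement can be sketched with a translation table. This is a minimal illustration assuming an uppercase DNA alphabet of A, C, G and T; the helper names mirror the table entries but the bodies are assumptions, not seqm's code.

```python
# Map each DNA base to its Watson-Crick partner (uppercase only; an assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Swap each base for its partner, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def revcomplement(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]
```

With this sketch, `revcomplement('AAAACCGT')` gives `'ACGGTTTT'`.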

Distance Metrics

Function      Distance Metric
------------  ----------------------------------------------
seqm.hamming  Hamming distance between sequences.
seqm.edit     Edit (Levenshtein) distance between sequences.
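
The two distances differ in what they allow: Hamming distance counts mismatched positions between equal-length sequences, while Levenshtein distance also permits insertions and deletions. The sketch below shows the textbook definitions; it is not taken from seqm, and raising `ValueError` on unequal lengths is an assumption about how such input should be handled.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or free match
            ))
        previous = current
    return previous[-1]
```

For the inputs used in the Usage section, this sketch returns 1 for `hamming('ATTATT', 'ATTAGT')` and 2 for `edit('ATTATT', 'ATAGT')`.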

Utilities

Function              Utility
--------------------  -------------------------
seqm.random_sequence  Generate random sequence.
seqm.wrap             Newline-wrap sequence.
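
Both utilities are simple enough to sketch with the standard library. The parameter names below (`length`, `alphabet`, `seed`, `bases`) and their defaults are hypothetical and chosen for illustration; seqm's own signatures may differ.

```python
import random


def random_sequence(length, alphabet="ACGT", seed=None):
    """Draw `length` bases uniformly from `alphabet` (hypothetical signature)."""
    rng = random.Random(seed)  # a local generator keeps results reproducible
    return "".join(rng.choice(alphabet) for _ in range(length))


def wrap(seq, bases=60):
    """Break a sequence into lines of at most `bases` characters."""
    return "\n".join(seq[i:i + bases] for i in range(0, len(seq), bases))
```

Wrapping a ten-base sequence with `bases=5` yields two lines of five bases each, matching the shape of the command-line example in the Usage section.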

Questions/Feedback

File an issue in the GitHub issue tracker.
