Utilities for calculating sequence metrics.
seqm
Python utilities for sequence comparison, quantification, and feature extraction.
Installation
~$ pip install seqm
Documentation
Documentation for the package can be found here.
Usage
The seqm module contains functions for calculating sequence-related distance and complexity metrics that are commonly used in language processing and next-generation sequencing. It has a simple and consistent API that can be used for investigating sequence characteristics:
>>> import seqm
>>> seqm.hamming('ATTATT', 'ATTAGT')
1
>>> seqm.edit('ATTATT', 'ATAGT')
2
>>> seqm.polydict('AAAACCGT')
{'A': 4, 'C': 2, 'G': 1, 'T': 1}
>>> seqm.polylength('AAAACCGT')
4
>>> seqm.entropy('AGGATAAG')
1.40
>>> seqm.gc_percent('AGGATAAG')
0.375
>>> seqm.gc_skew('AGGATAAG')
3.0
>>> seqm.gc_shift('AGGATAAG')
1.67
>>> seqm.dna_weight('AGGATAAG')
3968.59
>>> seqm.rna_weight('AGGATAAG')
4082.59
>>> seqm.aa_weight('AGGATAAG')
700.8
>>> seqm.tm('AGGATAAGAGATAGATTT')
39.31
>>> seqm.zipsize('AGGATAAGAGATAGATTT')
22
It also has a Sequence object for object-based access to these properties:
>>> import seqm
>>> seq = seqm.Sequence('AAAACCGT')
>>> seq.hamming('AAAAGCGT')
1
>>> seq.gc_percent
0.375
>>> seq.revcomplement
ACGGTTTT
>>> seq.dna_weight
3895.59
>>> # ... and so on ...
All of the metrics available in the repository are listed below, and can also be found in the API section of the documentation.
Finally, all functions from the seqm module can be used at the command line:
~$ # calculate distance between sequences
~$ seqm edit AAAACCGT AAAAGCGT
1
~$ # calculate gc percent of sequence
~$ seqm gc_percent AAAACCGT
0.375
~$ # generate random sequence and pipe to `wrap` command
~$ seqm random --length 10 | seqm wrap --bases 5 -
ATGGA
TATTA
Sequence Quantification
Function | Metric |
---|---|
seqm.polydict | Length of longest homopolymer for all bases in sequence. |
seqm.polylength | Length of longest homopolymer in sequence. |
seqm.entropy | Shannon entropy for bases in sequence. |
seqm.gc_percent | Percentage of GC bases in sequence relative to all bases. |
seqm.gc_skew | GC skew for sequence: (#G - #C)/(#G + #C). |
seqm.gc_shift | GC shift for sequence: (#A + #T)/(#G + #C) |
seqm.dna_weight | Molecular weight for sequence with DNA backbone. |
seqm.rna_weight | Molecular weight for sequence with RNA backbone. |
seqm.aa_weight | Molecular weight for amino acid sequence. |
seqm.tm | Melting temperature of sequence. |
seqm.zipsize | Compressibility of sequence. |
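To make the formulas in the table above concrete, here is a minimal pure-Python sketch of a few of the metrics. These helpers are illustrative stand-ins and are not seqm's implementation: the function names are hypothetical, the entropy is assumed to be in bits (log base 2), and the library may round, scale, or handle edge cases differently, so its output need not match these values exactly.

```python
import math
from collections import Counter
from itertools import groupby


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def gc_skew(seq):
    """(#G - #C) / (#G + #C); undefined (ZeroDivisionError) without G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c)


def gc_shift(seq):
    """(#A + #T) / (#G + #C); undefined (ZeroDivisionError) without G or C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at / gc


def shannon_entropy(seq):
    """Shannon entropy of the base frequencies, assuming log base 2."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_homopolymers(seq):
    """Longest uninterrupted run of each base present in the sequence."""
    runs = {}
    for base, group in groupby(seq):
        runs[base] = max(runs.get(base, 0), len(list(group)))
    return runs


print(gc_fraction("AGGATAAG"))
print(longest_homopolymers("AAAACCGT"))
```

The longest single run across all bases (the quantity described for seqm.polylength) would then be `max(longest_homopolymers(seq).values())`.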
Domain Conversion
Function | Conversion |
---|---|
seqm.revcomplement | Reverse complement of sequence. |
seqm.complement | Complement of sequence. |
seqm.aa | |
seqm.wrap | Newline-wrap sequence. |
seqm.likelihood | |
seqm.qscore | |
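As an illustration of what complementing and reverse-complementing a DNA sequence involve, here is a small standalone sketch. It is not taken from seqm: the helper names mirror the function names listed above only for readability, and the handling of lowercase bases is an assumption.

```python
# Translation table mapping each base to its Watson-Crick partner.
# Lowercase handling is an assumption made for this sketch.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Swap A<->T and C<->G, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def revcomplement(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]


print(complement("AAAACCGT"))     # TTTTGGCA
print(revcomplement("AAAACCGT"))  # ACGGTTTT
```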
Distance Metrics
Function | Distance Metric |
---|---|
seqm.hamming | Hamming distance between sequences. |
seqm.edit | Edit (Levenshtein) distance between sequences. |
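For readers unfamiliar with the two distances, the following sketch shows the textbook definitions in plain Python. It is a reference illustration under the usual unit-cost assumptions, not the code seqm runs, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of unit-cost insertions, deletions, and substitutions."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # (before the current character) and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(hamming("ATTATT", "ATTAGT"))     # 1
print(levenshtein("ATTATT", "ATAGT"))  # 2
```

Hamming distance only applies to sequences of equal length, which is why the sketch raises an error otherwise; edit distance has no such restriction.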
Utilities
Function | Utility |
---|---|
seqm.random_sequence | Generate random sequence. |
seqm.wrap | Newline-wrap sequence |
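The two utilities can be approximated with the standard library alone. The sketch below is an assumption-laden illustration rather than seqm's implementation: the parameter names (`length`, `alphabet`, `seed`, `bases`) and the default line width are hypothetical.

```python
import random


def random_sequence(length, alphabet="ACGT", seed=None):
    """Draw `length` symbols uniformly at random from `alphabet`."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    return "".join(rng.choice(alphabet) for _ in range(length))


def wrap(seq, bases=60):
    """Break a sequence into lines of at most `bases` characters."""
    return "\n".join(seq[i:i + bases] for i in range(0, len(seq), bases))


print(wrap("ATGGATATTA", bases=5))
# ATGGA
# TATTA
```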
Questions/Feedback
File an issue in the GitHub issue tracker.