kmer-generator
Get a kmer list given a single DNA or RNA sequence.
Base sets
Kmers can be generated using the standard four nucleotide codes A, C, T (or U), and G, as well as all ambiguous codes of the IUPAC convention (https://www.bioinformatics.org/sms/iupac.html).
Install
$ pip install KmerGenerator
Usage
Instantiate the class and store it in an object.
from KmerGenerator import KmerGenerator, BaseSet
kgenerator = KmerGenerator()
Print the IUPAC codes used by KmerGenerator.
kgenerator.base_set_descriptions()
The output is:
A => Adenine
C => Cytosine
G => Guanine
T => Thymine
U => Uracil
R => A or G
Y => C or T
S => G or C
W => A or T
K => G or T
M => A or C
B => C or G or T
D => A or G or T
H => A or C or T
V => A or C or G
N => any
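The table above can also be written as a plain mapping, which makes it easy to see which concrete sequences an ambiguous kmer stands for. The following is a minimal sketch that is independent of KmerGenerator: the names `IUPAC_BASES` and `expand_ambiguous` are hypothetical and not part of the package.

```python
from itertools import product

# IUPAC nucleotide codes and the concrete bases each one can stand for,
# following the table above (the ambiguous codes are spelled with T;
# U is kept as its own symbol).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_ambiguous(kmer):
    """Return every concrete sequence that an ambiguous kmer can represent."""
    choices = [IUPAC_BASES[code] for code in kmer]
    return ["".join(combo) for combo in product(*choices)]

print(expand_ambiguous("ARC"))  # ['AAC', 'AGC']
print(expand_ambiguous("GGS"))  # ['GGG', 'GGC']
```

For RNA input, the T entries in the ambiguous codes would need to be read as U.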
Note: To generate kmers with ambiguous IUPAC codes (R, Y, S, ...), create an instance of the class and set the base_set parameter.
Define an example sequence, the kmer length, and a BaseSet enum member.
sequence = 'AUCAUCAUGGGAUAUAUUGGCCCCCUAARCUUAUAUCUCUGGSAAUGACUCUAUAUU'
k = 3
base_set = BaseSet.Dubious2
Then generate the kmer counts.
kmers = kgenerator.count_kmer(sequence, k, base_set)
print(kmers)
The output:
[{'AAU': 1}, {'AAR': 1}, {'ACU': 1}, {'AUA': 3}, {'AUC': 3}, {'AUG': 2}, {'AUU': 2}, {'ARC': 1}, {'CAU': 2}, {'CCC': 1}, {'CCU': 1}, {'CUA': 2}, {'CUC': 2}, {'CUG': 1}, {'CUU': 1}, {'GAC': 1}, {'GAU': 1}, {'GCC': 1}, {'GGA': 1}, {'GGC': 1}, {'GGG': 1}, {'GGS': 1}, {'GSA': 1}, {'UAA': 1}, {'UAU': 3}, {'UCA': 2}, {'UCU': 2}, {'UGA': 1}, {'UGG': 3}, {'UUA': 1}, {'UUG': 1}, {'RCU': 1}, {'SAA': 1}]
Notice that only kmers with a frequency higher than zero are returned.
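The counts listed above are consistent with non-overlapping matching, as done by Python's `str.count`: for example, CCC is reported once although the run CCCCC contains three overlapping windows. The sketch below reproduces that behaviour without the package. It is only an illustration under stated assumptions: the function `count_kmers_sketch` and the alphabet string are hypothetical, and whether KmerGenerator works this way internally is not confirmed here.

```python
from collections import Counter
from itertools import product

def count_kmers_sketch(sequence, k, alphabet):
    """Count non-overlapping occurrences of every kmer over `alphabet`.

    All len(alphabet)**k candidates are enumerated in alphabet order and
    only those that occur at least once are kept.
    """
    result = []
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        n = sequence.count(kmer)  # str.count skips overlapping matches
        if n > 0:
            result.append({kmer: n})
    return result

sequence = "AUCAUCAUGGGAUAUAUUGGCCCCCUAARCUUAUAUCUCUGGSAAUGACUCUAUAUU"
k = 3
# Hypothetical alphabet: RNA bases followed by the two-base ambiguity codes.
alphabet = "ACGURYSWKM"

kmers = count_kmers_sketch(sequence, k, alphabet)
print(kmers[:4])  # [{'AAU': 1}, {'AAR': 1}, {'ACU': 1}, {'AUA': 3}]

# For comparison, a sliding window counts overlapping occurrences as well.
overlapping = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
print(overlapping["CCC"])  # 3
```

If overlapping occurrences matter for your analysis, the sliding-window count is the one to use; the two approaches differ only for kmers that can overlap themselves, such as CCC, AUA, UAU, and UCU in this sequence.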
Hashes for KmerGenerator-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 45749c3a20c71f333c3f6319f9a80d3e51f4902386c6114c0436d863d4d5138b
MD5 | a87eef2b47b80e76065dd325d0cccae3
BLAKE2b-256 | 8e56cc1e947e376d6f6f68835c42a94fe449adb4f6fe73abed0f7aead1055b42