Perform k-mer abundance analysis in DNA sequences
eme_selex
eme_selex (Every Motif Ever for SELEX Analysis) is a Python package for k-mer abundance analysis in DNA sequences. It was developed for fast, efficient analysis of short k-mers (tested with k-mers up to length 10).
While eme_selex can be used for general-purpose k-mer analysis, the motivation for developing it is to analyze data from Systematic Evolution of Ligands by EXponential enrichment coupled with High-Throughput sequencing (HT-SELEX) in a Pythonic way. By default, for every k-mer, eme_selex quantifies the fraction of reads containing that k-mer in a non-redundant manner. After quantification, a basic position frequency matrix (PFM) is generated for each of the top 50 k-mers. To generate more or fewer PFMs, set the top keyword argument to the desired number.
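The quantification step can be illustrated with a minimal pure-Python sketch. It is not eme_selex's implementation: the function name `kmer_read_fractions` is hypothetical, and it assumes that "non-redundant" means a k-mer is counted at most once per read, however many times it occurs in that read.

```python
from collections import Counter


def kmer_read_fractions(reads, k=5):
    """Hypothetical helper: count, for each k-mer, the reads that contain it.

    Returns (counts, fractions), where counts[kmer] is the number of reads
    containing the k-mer and fractions[kmer] is that number divided by the
    total number of reads.
    """
    counts = Counter()
    n_reads = 0
    for read in reads:
        n_reads += 1
        read = read.upper()
        # Assumption: a set makes the count non-redundant, so a k-mer that
        # repeats within one read contributes only once for that read.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    fractions = {kmer: c / n_reads for kmer, c in counts.items()}
    return counts, fractions


reads = ["ACGTAC", "ACGACG", "TTTTTT"]
counts, fractions = kmer_read_fractions(reads, k=3)
```

In this toy input, "ACG" occurs twice in the second read but is counted once for it, so it is found in two of the three reads.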
Installation
pip install eme_selex
Usage
Basic Usage
from eme_selex.eme_selex import kmer_fraction_from_file as kf
# By default, the k-mer size is k=5 and
# the number of PFMs generated is top=50
counts, fraction, pfm_models = kf("data/random.fa.gz")
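As general background on the PFMs returned above, a position frequency matrix records how often each base occurs at each position across a set of equal-length, aligned sequences. The sketch below shows only that concept. The function `position_frequency_matrix` and the dictionary layout are assumptions for illustration, and they say nothing about how eme_selex selects sequences or structures `pfm_models`.

```python
def position_frequency_matrix(sequences, alphabet="ACGT"):
    """Hypothetical helper: build a PFM from equal-length aligned sequences.

    Returns a dict mapping each base to a list of per-position counts.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    pfm = {base: [0] * length for base in alphabet}
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            # Bases outside the alphabet (e.g. N) are skipped.
            if base in pfm:
                pfm[base][pos] += 1
    return pfm


aligned = ["ACGTA", "ACGTT", "TCGTA"]
pfm = position_frequency_matrix(aligned)
```

Each column of the resulting matrix sums to the number of input sequences, provided every base is in the alphabet.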
Tutorial for HT-SELEX analysis
Jupyter notebooks detailing the usage of eme_selex, along with an extensive HT-SELEX analysis, are hosted at https://eme_selex.readthedocs.io
Hashes for eme_selex-0.2.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ac4133ab485a88007167c2ad9bf175b993e39d2e61a5d404270803c8113c4d4b
MD5 | 4994c950051e2815cbf39e163da08054
BLAKE2b-256 | 18dbf4584aaa6fdd8f68eea2f41b7dfe7c66523642146baa0b8f4706e52e06c0