Perform k-mer abundance analysis in DNA sequences
eme_selex
eme_selex (Every Motif Ever for SELEX Analysis) is a Python package for k-mer abundance analysis in DNA sequences. It is designed for fast, efficient analysis of short k-mers (tested with k-mers up to length 10).
While eme_selex can be used for general-purpose k-mer analysis, it was developed primarily to analyze Systematic Evolution of Ligands by EXponential enrichment coupled with high-throughput sequencing (HT-SELEX) data in a Pythonic way. By default, for every k-mer, eme_selex quantifies the fraction of reads containing that k-mer in a non-redundant manner. After quantification, a basic position frequency matrix (PFM) is generated for each of the top 50 k-mers. To generate more PFMs, set the top keyword argument to the desired number.
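To make the default quantity concrete, here is a minimal, standard-library sketch of a per-read k-mer fraction. It is not eme_selex's implementation: the function name kmer_read_fractions is hypothetical, and it assumes "non-redundant" means a k-mer is counted at most once per read, however many times it occurs in that read. Reverse complements and ambiguous bases are not handled.

```python
from collections import Counter


def kmer_read_fractions(reads, k=5):
    """Return (counts, fractions) for every k-mer seen in `reads`.

    counts[kmer]    = number of reads containing the k-mer at least once
    fractions[kmer] = counts[kmer] / total number of reads
    """
    counts = Counter()
    n_reads = 0
    for read in reads:
        n_reads += 1
        read = read.upper()
        # A set keeps each k-mer once per read, so repeats inside a
        # single read do not inflate its count (assumed meaning of
        # "non-redundant").
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    if n_reads == 0:
        return counts, {}
    fractions = {kmer: c / n_reads for kmer, c in counts.items()}
    return counts, fractions


# Toy input: "ACGT" occurs twice in the first read but is counted once for it.
reads = ["ACGTACGT", "TTACGTT", "GGGGGGG"]
counts, fractions = kmer_read_fractions(reads, k=4)
```

With this toy input, "ACGT" is found in two of the three reads, so its fraction is 2/3 even though it appears three times in total.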
Installation
pip install eme_selex
Usage
Basic Usage
from eme_selex.eme_selex import kmer_fraction_from_file as kf
# Defaults: k=5 (k-mer size) and
# top=50 (number of PFMs to generate)
counts, fraction, pfm_models = kf("data/random.fa.gz")
Tutorial for HT-SELEX analysis
Jupyter notebooks detailing the usage of eme_selex and an extensive HT-SELEX analysis are hosted at https://eme_selex_usage.readthedocs.io
Hashes for eme_selex-0.2-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8ff782722a22c5fe1de09788fc5c04a34b357614d5914e824db49524680257f8 |
MD5 | 3487b47081c1a115f852089b70ec4d92 |
BLAKE2b-256 | e0667f662307a0dc2fec37a5157a6f41857c6cbed6aa5fe2d762dc388b1b9edd |