SeqPro (Sequence processing toolkit)
import seqpro as sp
SeqPro is a Python package for processing DNA/RNA sequences, with limited support for protein sequences. SeqPro is fully functional on its own, but it is also used heavily by other packages, including SeqData, MotifData, SeqExplainer, and EUGENe.
All functions in SeqPro accept as input a string, a list of strings, a NumPy array of strings, a NumPy array of single-character bytes (dtype S1), or a NumPy array of one-hot encoded sequences. There is also emerging integration with Xarray through the seqpro.xr submodule, which lets SeqPro work smoothly with SeqData.
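To make these input forms concrete, the sketch below builds each of them by hand with plain NumPy for two short DNA sequences. It only illustrates the array layouts; the ACGT channel order used for the one-hot array is an assumption, and SeqPro's own alphabet ordering and dtypes may differ.

```python
import numpy as np

as_str = "ACGT"                     # a single string
as_list = ["ACGT", "GGCA"]          # a list of strings
as_unicode = np.array(as_list)      # NumPy array of strings (dtype '<U4')

# (N, L) array of single-character bytes: encode as fixed-width bytes (dtype 'S4'),
# reinterpret each 4-byte item as four 1-byte items, then restore the (N, L) shape.
as_s1 = np.array(as_list, dtype="S").view("S1").reshape(len(as_list), -1)

# One-hot encoding against an assumed A, C, G, T channel order: shape (N, L, 4).
alphabet = np.array([b"A", b"C", b"G", b"T"])
as_ohe = (as_s1[..., None] == alphabet).astype(np.uint8)

print(as_s1.shape, as_ohe.shape)
```

Broadcasting the (N, L, 1) byte array against the length-4 alphabet compares every position with every letter at once, which is why no Python loop is needed for this particular step.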
Computational bottlenecks, and code that cannot be vectorized with NumPy alone, are accelerated with Numba: for example, padding sequences, one-hot encoding, and converting one-hot encodings back to nucleotides.
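As an illustration of this pattern (not SeqPro's actual implementation), here is a minimal sketch of an explicit-loop one-hot decoder that Numba can JIT-compile. The function name `decode_ohe_loop` is hypothetical, and the sketch assumes a (N, L, A) uint8 one-hot array; if Numba is not installed it falls back to running the same loop as plain Python.

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the loop when Numba is available
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def decode_ohe_loop(ohe, alphabet_codes, unknown_code):
    """Hypothetical helper: decode a (N, L, A) one-hot uint8 array into (N, L) byte codes."""
    n, length, a = ohe.shape
    out = np.empty((n, length), dtype=np.uint8)
    for i in range(n):
        for j in range(length):
            out[i, j] = unknown_code  # used when no channel is hot at this position
            for k in range(a):
                if ohe[i, j, k] == 1:
                    out[i, j] = alphabet_codes[k]
                    break
    return out


alphabet_codes = np.array(list(b"ACGT"), dtype=np.uint8)  # byte codes for A, C, G, T
ohe = np.zeros((1, 3, 4), dtype=np.uint8)
ohe[0, 0, 2] = 1  # G
ohe[0, 1, 0] = 1  # A
# position 2 is left all-zero, so it decodes to the unknown character
codes = decode_ohe_loop(ohe, alphabet_codes, np.uint8(ord("N")))
decoded = codes.view("S1")  # reinterpret the uint8 codes as single-character bytes
```

Working on raw uint8 codes keeps the loop body to simple integer operations, which is the kind of code Numba compiles well; the conversion back to `S1` is a zero-copy view done outside the compiled function.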
Installation
pip install seqpro
API
N = 2
L = 3
# Generating random sequences
seqs = sp.random_seqs(shape=(N, L), alphabet=sp.DNA, seed=1234)
# Padding
sp.pad_seqs(seqs, pad="right", pad_value="N", length=5, length_axis=-1)
# One-hot encoding and decoding
ohe = sp.ohe(seqs, alphabet=sp.DNA)
sp.decode_ohe(ohe, ohe_axis=-1, alphabet=sp.DNA, unknown_char="N")
# Tokenization
token_map = {"A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
tokens = sp.tokenize(seqs, token_map=token_map, unknown_token=11)
sp.decode_tokens(tokens, token_map=token_map)
# Reverse complement
sp.reverse_complement(seqs, alphabet=sp.DNA)
# k-let preserving shuffling
sp.k_shuffle(seqs, k=2, length_axis=1, seed=1234)
# Calculating GC or nucleotide content
sp.gc_content(seqs, alphabet=sp.DNA)
sp.nucleotide_content(seqs, alphabet=sp.DNA)
# Randomly jittering sequences
sp.jitter(seqs, max_jitter=128, length_axis=1, seed=1234)
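# Collapse coverage to a given bin width
# (coverage is assumed here to be a per-base array of shape (N, length))
import numpy as np
coverage = np.random.default_rng(1234).random((N, 1024))
sp.bin_coverage(coverage, bin_width=128, length_axis=1, normalize=False)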
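Several of the calls above (tokenization, reverse complement, GC content) reduce to simple lookups on the byte codes of an S1 array. The sketch below shows that idea with plain NumPy; all function names ending in `_sketch` are hypothetical and are not SeqPro's API. It assumes equal-length DNA sequences, leaves bytes outside ACGTN unchanged when complementing, and counts GC over all positions (including N), which may differ from how `sp.gc_content` handles ambiguous bases.

```python
import numpy as np

def to_s1_sketch(seqs):
    """Convert a list of equal-length strings to an (N, L) S1 array."""
    return np.array(seqs, dtype="S").view("S1").reshape(len(seqs), -1)

# 256-entry lookup table: identity for every byte, then overwrite the DNA letters.
COMPLEMENT = np.arange(256, dtype=np.uint8)
for base, comp in zip(b"ACGTN", b"TGCAN"):
    COMPLEMENT[base] = comp

def reverse_complement_sketch(s1):
    """Complement each byte via the table, then reverse along the length axis."""
    codes = s1.view(np.uint8)
    return COMPLEMENT[codes][..., ::-1].copy().view("S1")

def tokenize_sketch(s1, token_map, unknown_token):
    """Map each character to an integer token; unmapped bytes get unknown_token."""
    table = np.full(256, unknown_token, dtype=np.int64)
    for char, token in token_map.items():
        table[ord(char)] = token
    return table[s1.view(np.uint8)]

def gc_content_sketch(s1):
    """Fraction of positions that are G or C, per sequence (last axis = length)."""
    return ((s1 == b"G") | (s1 == b"C")).mean(axis=-1)

seqs = to_s1_sketch(["ACGTN", "GGCAT"])
token_map = {"A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
rc = reverse_complement_sketch(seqs)         # rows: NACGT, ATGCC
tokens = tokenize_sketch(seqs, token_map, 11)
gc = gc_content_sketch(seqs)                 # 2/5 and 3/5
```

A full 256-entry table means every possible byte has a defined output, so the lookup is a single vectorized indexing operation with no per-character branching.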
More to come!
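Binning coverage, the last call in the API example, can be pictured as a reshape followed by a sum. The sketch below is a hypothetical stand-in for `sp.bin_coverage`, not its implementation: it assumes the length axis is last and divisible by the bin width, and it only sums within each bin, since we have not confirmed what SeqPro's `normalize` flag does.

```python
import numpy as np

def bin_coverage_sketch(coverage, bin_width):
    """Hypothetical helper: sum per-base coverage (..., L) into bins of width bin_width."""
    *batch, length = coverage.shape
    if length % bin_width != 0:
        raise ValueError("length must be divisible by bin_width in this sketch")
    # Split the length axis into (n_bins, bin_width), then sum within each bin.
    return coverage.reshape(*batch, length // bin_width, bin_width).sum(axis=-1)

binned = bin_coverage_sketch(np.arange(8).reshape(1, 8), bin_width=4)
# 0+1+2+3 = 6 and 4+5+6+7 = 22
```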
All contributions, including bug reports, documentation improvements, and enhancement suggestions, are welcome. Everyone within the community is expected to abide by our code of conduct.
File details
Details for the file seqpro-0.1.15.tar.gz.
File metadata
- Download URL: seqpro-0.1.15.tar.gz
- Upload date:
- Size: 31.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0e9fd87b371cf476fda03921ba48af57b7f2450f2bcf6db73a9fc68c7a8b4db7
MD5 | 4507cce27cb4dd4376821bf0cd07975d
BLAKE2b-256 | ee4d9ba038cb74320b8ab58822165794b2d3468b955b07fcd286d4d389ed79ae
File details
Details for the file seqpro-0.1.15-cp39-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: seqpro-0.1.15-cp39-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 407.9 kB
- Tags: CPython 3.9+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8edee0eddee7a2273f0f28b2c6a52c53cae57e81c20df435bec56178d7b904cd
MD5 | d861f9488c87d6a17fc1072917440c09
BLAKE2b-256 | 1c4199e8cb7a095e429c53cc1483c33ec3b4fa1a51795b31f4546540dfa70f71