SeqPro (Sequence processing toolkit)

import seqpro as sp

SeqPro is a Python package for processing DNA/RNA sequences, with limited support for protein sequences. SeqPro is fully functional on its own but is heavily utilized by other packages including SeqData, MotifData, SeqExplainer, and EUGENe.

All functions in SeqPro accept as input a string, a list of strings, a NumPy array of strings, a NumPy array of single-character bytes (dtype S1), or a NumPy array of one-hot encoded sequences. There is also emerging integration with Xarray through the seqpro.xr submodule, so that SeqPro works smoothly with SeqData.
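To make the S1 representation concrete, the sketch below converts a string, a list of equal-length strings, or an array of strings into a NumPy array of single-character bytes, with the sequence length as the last axis. It is plain NumPy for illustration only, not SeqPro's own conversion code; the helper name `to_s1` is hypothetical, and it assumes ASCII input and sequences of equal length.

```python
import numpy as np


def to_s1(seqs):
    """Convert a str, list of equal-length strs, or array of strs into an
    array of single-character bytes (dtype "S1"), length on the last axis.

    Hypothetical helper for illustration; not part of SeqPro's API.
    """
    arr = np.asarray(seqs)
    if arr.dtype == np.dtype("S1"):
        return arr  # already single-character bytes, return unchanged

    flat = arr.ravel().tolist()  # Python strs (a lone str becomes a 1-item list)
    lengths = {len(s) for s in flat}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    (length,) = lengths

    # Concatenate the ASCII bytes, view them as S1, then restore the batch shape.
    joined = "".join(flat).encode("ascii")
    return np.frombuffer(joined, dtype="S1").reshape(*arr.shape, length).copy()
```

For example, `to_s1(["ACGT", "GGCC"])` yields an S1 array of shape `(2, 4)`, while a single string such as `to_s1("ACG")` yields shape `(3,)`.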

Computational bottlenecks, and code that cannot be vectorized with NumPy alone, are accelerated with Numba: for example, padding sequences, one-hot encoding, and converting one-hot encodings back to nucleotides.
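As a point of reference for what one-hot encoding and decoding do, here is a minimal vectorized NumPy sketch. It is not SeqPro's Numba implementation. It assumes a DNA alphabet in the order A, C, G, T, encodes characters outside that alphabet as all-zero vectors, and decodes all-zero vectors to "N". The names `ALPHABET`, `ohe`, and `decode_ohe` here are illustrative stand-ins.

```python
import numpy as np

# Assumed DNA alphabet and ordering for the one-hot axis.
ALPHABET = np.array(list("ACGT"), dtype="S1")


def ohe(seqs_s1):
    """One-hot encode an S1 array, appending a trailing axis of size 4.

    Broadcasting compares every character against every alphabet letter;
    characters outside the alphabet become all-zero vectors.
    """
    return (seqs_s1[..., None] == ALPHABET).astype(np.uint8)


def decode_ohe(onehot, unknown_char=b"N"):
    """Invert `ohe`: argmax over the last axis, all-zero vectors -> unknown_char."""
    decoded = ALPHABET[onehot.argmax(axis=-1)]  # fancy indexing returns a copy
    decoded[onehot.sum(axis=-1) == 0] = unknown_char
    return decoded
```

The broadcasting comparison avoids Python loops but materializes an intermediate boolean array of size N × L × 4, which is one reason a compiled kernel can be preferable for long sequences.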

Installation

pip install seqpro

API

N = 2
L = 3

# Generating random sequences
seqs = sp.random_seqs(shape=(N, L), alphabet=sp.DNA, seed=1234)

# Padding
sp.pad_seqs(seqs, pad="right", pad_value="N", length=5, length_axis=-1)

# One-hot encoding and decoding
ohe = sp.ohe(seqs, alphabet=sp.DNA)
sp.decode_ohe(ohe, ohe_axis=-1, alphabet=sp.DNA, unknown_char="N")

# Tokenization
token_map = {"A": 7, "C": 8, "G": 9, "T": 10, "N": 11}
tokens = sp.tokenize(seqs, token_map=token_map, unknown_token=11)
sp.decode_tokens(tokens, token_map=token_map)

# Reverse complement
sp.reverse_complement(seqs, alphabet=sp.DNA)

# k-let preserving shuffling
sp.k_shuffle(seqs, k=2, length_axis=1, seed=1234)

# Calculating GC or nucleotide content
sp.gc_content(seqs, alphabet=sp.DNA)
sp.nucleotide_content(seqs, alphabet=sp.DNA)

# Randomly jittering sequences
sp.jitter(seqs, max_jitter=128, length_axis=1, seed=1234)

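# Collapse coverage to a given bin width
# (`coverage` is a numeric array of per-position coverage values; it is not defined in this snippet)
sp.bin_coverage(coverage, bin_width=128, length_axis=1, normalize=False)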
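To illustrate what the reverse_complement and gc_content calls compute, the sketch below implements both on an S1 array in plain NumPy. It is not SeqPro's implementation. It assumes uppercase DNA, leaves characters outside A/C/G/T (such as "N") unchanged when complementing, and counts every position (including "N") in the GC-content denominator; SeqPro's handling of unknown characters may differ.

```python
import numpy as np

# Assumed complement table; characters not listed (e.g. b"N") map to themselves.
_COMPLEMENT = {b"A": b"T", b"C": b"G", b"G": b"C", b"T": b"A"}


def reverse_complement(seqs_s1, length_axis=-1):
    """Reverse along `length_axis`, then complement each base."""
    flipped = np.flip(seqs_s1, axis=length_axis)
    out = flipped.copy()
    # Masks are taken from `flipped` (unchanged) so no base is swapped twice.
    for base, comp in _COMPLEMENT.items():
        out[flipped == base] = comp
    return out


def gc_content(seqs_s1, length_axis=-1):
    """Fraction of positions that are G or C along `length_axis`."""
    is_gc = (seqs_s1 == b"G") | (seqs_s1 == b"C")
    return is_gc.mean(axis=length_axis)
```

For instance, the reverse complement of "AACG" is "CGTT", and "GGCN" has a GC content of 0.75 under the assumption above.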

More to come!

All contributions, including bug reports, documentation improvements, and enhancement suggestions, are welcome. Everyone within the community is expected to abide by our code of conduct.


Download files

Source Distribution

seqpro-0.1.13.tar.gz (30.2 kB)

Uploaded Source

Built Distribution

seqpro-0.1.13-cp39-abi3-manylinux_2_28_x86_64.whl (407.6 kB)

Uploaded CPython 3.9+ manylinux: glibc 2.28+ x86-64

File details

Details for the file seqpro-0.1.13.tar.gz.

File metadata

  • Download URL: seqpro-0.1.13.tar.gz
  • Upload date:
  • Size: 30.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.6.0

File hashes

Hashes for seqpro-0.1.13.tar.gz
Algorithm Hash digest
SHA256 f8a87d2c720cf21352351e70a883d5b3bc88b2b1f50a6017e24f9af03fa8ed1e
MD5 38db00a951027e1e3c6528b0a0ed9499
BLAKE2b-256 eddaf868e3fdadfab435efeecd9d39116862f8b3b7f1f81ddc440389eb5cb74f

File details

Details for the file seqpro-0.1.13-cp39-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for seqpro-0.1.13-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c73334bd517e603c33942dfd66a94d8cf91767609b63efd382f9fe27bf999122
MD5 2b04b0f3d02862e134ef3c871bd4a60f
BLAKE2b-256 af7338c75d53e67039a47aa54bf34afedeee4f475574bfb7e6de40d90232f1ca
