SeqLike - flexible biological sequence objects in Python
Introduction
SeqLike is a single-object API that makes working with biological sequences in Python more ergonomic. It handles anything that looks like a sequence.
Built around the Biopython SeqRecord class, SeqLikes abstract over the semantics of molecular biology (DNA -> RNA -> AA) and data structures (strings, Seqs, SeqRecords, numerical encodings) to allow manipulation of a biological sequence at the level which is most computationally convenient.
Code samples and examples
Build data-type agnostic functions
def f(seq: SeqLikeType, *args):
    seq = SeqLike(seq, seq_type="nt").to_seqrecord()
    # ...
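The pattern above, coercing whatever the caller passes into one canonical type at the top of the function, can be sketched without SeqLike itself. What follows is a minimal pure-Python illustration under stated assumptions: `as_sequence_string` and `gc_content` are hypothetical helpers, not part of the SeqLike API, and the only duck-typing assumed is that record-like objects expose a `.seq` attribute, as Biopython's SeqRecord does.

```python
from types import SimpleNamespace


def as_sequence_string(seq) -> str:
    """Coerce a str, a list of letters, or a record-like object to an uppercase string."""
    if hasattr(seq, "seq"):  # record-like object (assumption: it carries a .seq attribute)
        seq = seq.seq
    if isinstance(seq, (list, tuple)):
        seq = "".join(seq)
    return str(seq).upper()


def gc_content(seq) -> float:
    """Fraction of G/C letters; accepts any input as_sequence_string understands."""
    s = as_sequence_string(seq)
    return sum(base in "GC" for base in s) / len(s) if s else 0.0


record = SimpleNamespace(seq="ATGC")  # hypothetical stand-in for a record object
results = [gc_content("atgc"), gc_content(list("ATGC")), gc_content(record)]
```

The function body only ever sees one type, so callers can pass strings, lists, or records interchangeably. SeqLike plays the role of `as_sequence_string` here while also tracking the sequence type and alphabet.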
Streamline conversion to/from ML friendly representations
prediction = model(aaSeqLike('MSKGEELFTG').to_onehot())
new_seq = ntSeqLike(generative_model.sample(), alphabet="-ACGTUN")
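To make the round trip between letters and model-friendly arrays concrete, here is a rough pure-Python sketch of one-hot encoding and decoding against an explicit alphabet. It is not SeqLike's implementation: `to_onehot` and `from_onehot` are hypothetical stand-ins written with plain lists, and the alphabet string is simply the one used in the snippet above.

```python
ALPHABET = "-ACGTUN"  # same alphabet string as in the snippet above


def to_onehot(seq, alphabet=ALPHABET):
    """Return one row per letter, with a 1 in the column of that letter."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    return [[1 if index[letter] == i else 0 for i in range(len(alphabet))] for letter in seq]


def from_onehot(rows, alphabet=ALPHABET):
    """Decode rows back to letters by taking the argmax of each row."""
    return "".join(alphabet[max(range(len(row)), key=row.__getitem__)] for row in rows)


encoded = to_onehot("ATG-N")
decoded = from_onehot(encoded)  # "ATG-N"

# Decoding also works on soft outputs, e.g. per-position probabilities from a model.
soft = from_onehot([[0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0]])  # "A"
```

Passing the alphabet explicitly matters because the column order defines what each array position means; the same array decodes to a different sequence under a different alphabet.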
Interconvert between AA and NT forms of a sequence
Back-translation is conveniently built-in!
s_nt = ntSeqLike("ATGTCTAAAGGTGAA")
s_nt[0:3] # ATG
s_nt.aa()[0:3] # MSK, nt->aa is well defined
s_nt.aa()[0:3].nt() # ATGTCTAAA, works because SeqLike now has both reps
s_nt[:-1].aa() # TypeError, len(s_nt) not a multiple of 3
s_aa = aaSeqLike("MSKGE")
s_aa.nt() # AttributeError, aa->nt is undefined w/o codon map
s_aa = aaSeqLike(s_aa, codon_map=random_codon_map)
s_aa.nt() # now works, backtranslated to e.g. ATGTCTAAAGGTGAA
s_aa[:1].nt() # ATG, codon_map is maintained
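The asymmetry shown above (NT to AA is well defined, AA to NT needs a codon map) follows from the genetic code being many-to-one. The sketch below illustrates this in plain Python using the standard genetic code. `translate`, `back_translate`, and `AA_TO_CODON` are hypothetical names, and the codon map here deterministically picks the first codon per amino acid in table order, which is an assumption for illustration rather than how SeqLike's codon maps are built.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Illustrative codon map: keep the first codon seen for each amino acid.
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)


def translate(nt: str) -> str:
    """NT -> AA; unambiguous, but only defined for whole codons."""
    if len(nt) % 3:
        raise TypeError("length of nucleotide sequence must be a multiple of 3")
    return "".join(CODON_TO_AA[nt[i:i + 3]] for i in range(0, len(nt), 3))


def back_translate(aa: str, codon_map=AA_TO_CODON) -> str:
    """AA -> NT; requires choosing one codon per amino acid via codon_map."""
    return "".join(codon_map[letter] for letter in aa)


protein = translate("ATGTCTAAAGGTGAA")  # "MSKGE"
dna = back_translate(protein)           # "ATGTCTAAAGGTGAA" with this particular map
```

A different codon map would back-translate the same protein to a different nucleotide sequence, which is why the choice has to be supplied explicitly.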
Easily plot multiple sequence alignments
seqs = [s for s in SeqIO.parse("file.fasta", "fasta")]
df = pd.DataFrame(
    {
        "names": [s.name for s in seqs],
        "seqs": [aaSeqLike(s) for s in seqs],
    }
)
df["aligned"] = df["seqs"].seq.align()
df["aligned"].seq.plot()
Flexibly build and parse numerical sequence representations
# Assume you have a dataframe with a column of 10 SeqLikes of length 90
df["seqs"].seq.to_onehot().shape # (10, 90, 23), padded if needed
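The three axes of that shape are (number of sequences, padded length, alphabet size). A minimal pure-Python sketch of how such a batch could be assembled is shown below. It assumes right-padding with the gap character and a 21-letter alphabet (gap plus the 20 standard amino acids); SeqLike's own alphabet and padding rules may differ, as the 23 in the shape above suggests, and `pad_batch`, `batch_to_onehot`, and `shape` are hypothetical helpers.

```python
def pad_batch(seqs, pad="-"):
    """Right-pad every sequence with `pad` to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, pad) for s in seqs]


def batch_to_onehot(seqs, alphabet):
    """One-hot encode a batch; the first alphabet letter is assumed to be the pad/gap."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    return [
        [[int(index[letter] == i) for i in range(len(alphabet))] for letter in s]
        for s in pad_batch(seqs, pad=alphabet[0])
    ]


def shape(batch):
    """(n_sequences, padded_length, alphabet_size) of a nested-list batch."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


batch = batch_to_onehot(["MSKGE", "MSK", "MS"], alphabet="-ACDEFGHIKLMNPQRSTVWY")
batch_shape = shape(batch)  # (3, 5, 21)
```

Padding to a common length is what lets a ragged list of sequences become a single rectangular array that a model can consume.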
To see more in action, please check out the docs!
Getting Started
Install the library with pip or conda.
With pip
pip install seqlike
With conda
conda install -c conda-forge seqlike
Authors
Support
- Questions about usage should be posed on Stack Overflow with the #seqlike tag.
- Bug reports and feature requests are managed using the GitHub issue tracker.
Contributors ✨
Thanks go to these wonderful people (emoji key):
- Nasos Dousis 💻
- andrew giessel 💻
- Max Wall 💻 📖
- Eric Ma 💻 📖
- Mihir Metkar 🤔 💻
- Marcus Caron 📖
- pagpires 📖
- Sugato Ray 🚇 🚧
- Damien Farrell 💻
- Farbod Mahmoudinobar 💻
- Jacob Hayes 🚇
This project follows the all-contributors specification. Contributions of any kind are welcome!
Download files
Source Distribution
Built Distribution
File details
Details for the file seqlike-1.5.4.tar.gz.
File metadata
- Download URL: seqlike-1.5.4.tar.gz
- Upload date:
- Size: 374.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8bc70f80a02488863950b99be916e07a01118a29f7d89e1ff15a7dc39147f178
MD5 | f453f04a8e61cb8d8bee1090d401653a
BLAKE2b-256 | 0cde303ecacd90a8c2dd08c11bbaea9357b67976a241355f3f470890edacac12
File details
Details for the file seqlike-1.5.4-py3-none-any.whl.
File metadata
- Download URL: seqlike-1.5.4-py3-none-any.whl
- Upload date:
- Size: 378.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | d21ab703bbc57aec66be660361e1bd2cd2ca7b69af5f887e84b5500cac3d4c89
MD5 | ea73de84ef1ec48c10a32b4550be263b
BLAKE2b-256 | d3bfa46d38a16859aeee5483411462890b126ba37569ee74a6f7c8935ea67794