RSBio-Seq is a fast and lightweight sequence reading library (built on top of the Rust bio crate).
RSBio-Seq
RSBio-Seq intends to provide reading and writing facilities for common sequence formats (FASTA/FASTQ) in both raw (fasta, fa, fna, fastq, fq) and compressed (.gz) formats.
Installation
1. From PyPI (Recommended)
Use the following command to install from PyPI.
pip install rsbio-seq
2. Build and install from source
To build from source, make sure you have the following programs installed.
- Rust - https://www.rust-lang.org/tools/install
- Maturin - https://www.maturin.rs/installation
- Python environment with Python >=3.9 - https://www.python.org/downloads/
To build and install the development version of the wheel, use one of these commands.
maturin develop # this installs the development version in the env
maturin develop --release # this installs a release version in the env
To build a release mode wheel for installation, use this command.
maturin build --release
You will find the whl file inside the target/wheels directory. The whl file will have a name depicting your Python environment and CPU architecture. The built wheel can be installed using this command.
pip install target/wheels/*.whl
Usage
Once installed, you can import the library and use it as follows.
Reading
from rsbio_seq import SeqReader, Sequence, ascii_to_phred
# each seq entry is of type Sequence
seq: Sequence
for seq in SeqReader("path/to/seq.fasta.gz"):
    print(seq.id)
    print(seq.seq)
    # for fastq quality line
    print(seq.qual) # prints IIII
    print(ascii_to_phred(seq.qual)) # prints [40, 40, 40, 40]
    # optional description attribute
    print(seq.desc)
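As a minimal sketch of what can be done with the records the reader yields, the helper below tallies the record count, total bases, and overall GC fraction. It relies only on the seq attribute shown above. The function name summarise and the stand-in Record tuple are hypothetical and not part of RSBio-Seq; they are here so the example runs without any input file.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in exposing the same attributes as a Sequence."""
    id: str
    desc: str
    seq: str
    qual: str = ""


def summarise(records: Iterable) -> dict:
    """Count records and bases, and compute the overall GC fraction."""
    n_records = 0
    n_bases = 0
    n_gc = 0
    for rec in records:
        n_records += 1
        n_bases += len(rec.seq)
        # count G and C regardless of case
        n_gc += sum(1 for base in rec.seq if base in "GCgc")
    gc_fraction = n_gc / n_bases if n_bases else 0.0
    return {"records": n_records, "bases": n_bases, "gc": gc_fraction}


stats = summarise([Record("r1", "", "ACGT"), Record("r2", "", "GGCC")])
print(stats)  # {'records': 2, 'bases': 8, 'gc': 0.75}
```

Because the helper only iterates and reads the seq attribute, passing it a SeqReader instance in place of the list should work the same way, although that has not been verified here.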
Writing
from rsbio_seq import SeqWriter, Sequence, phred_to_ascii
# writing fasta
seq = Sequence("id", "desc", "ACGT") # id, description, sequence
writer = SeqWriter("out.fasta")
writer.write(seq)
writer.close()
# writing fastq
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fastq")
writer.write(seq)
writer.close()
# writing gzipped
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()
# writing gzipped with phred score translation
qual = phred_to_ascii([40, 40, 40, 40])
seq = Sequence("id", "desc", "ACGT", qual) # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()
Note: close() is only required if you want to read the file again in the same function/code scope. Closing opened files is good practice either way.
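If you would rather not call close() by hand, the standard library's contextlib.closing calls it for you when the block exits, even if an exception is raised. The sketch below uses a hypothetical DemoWriter stand-in so that it runs without the library; it assumes only that a writer exposes write() and close() as in the examples above.

```python
from contextlib import closing


class DemoWriter:
    """Hypothetical stand-in with the same write()/close() shape as SeqWriter."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.records = []
        self.closed = False

    def write(self, record) -> None:
        if self.closed:
            raise ValueError("write to a closed writer")
        self.records.append(record)

    def close(self) -> None:
        self.closed = True


# closing() calls writer.close() when the with-block exits, even on error
with closing(DemoWriter("out.fasta")) as writer:
    writer.write(("id", "desc", "ACGT"))

print(writer.closed)  # True
```

Since closing() needs nothing more than a close() method, wrapping SeqWriter("out.fasta") the same way should behave identically.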
We provide two utility functions for your convenience.
- phred_to_ascii - convert a list of phred scores to a quality string
- ascii_to_phred - convert the quality string to a list of numbers
RSBio-Seq reads and writes quality strings in ASCII format only. Please use these helper functions to translate if you need numeric phred scores.
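To make the translation concrete, here is a pure-Python sketch of the conversion, assuming the common Phred+33 encoding (which matches the IIII to [40, 40, 40, 40] example above). The _sketch suffix marks these functions as illustrations; the library's own helpers may differ in validation and error handling.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def ascii_to_phred_sketch(qual: str) -> list[int]:
    """Translate a quality string into a list of phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def phred_to_ascii_sketch(scores: list[int]) -> str:
    """Translate a list of phred scores into a quality string."""
    return "".join(chr(score + PHRED_OFFSET) for score in scores)


print(ascii_to_phred_sketch("IIII"))            # [40, 40, 40, 40]
print(phred_to_ascii_sketch([40, 40, 40, 40]))  # IIII
```

Each character maps to exactly one score, so the quality string and the score list always have the same length as the sequence they describe.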
Authors
- Anuradha Wickramarachchi https://anuradhawick.com
- Vijini Mallawaarachchi https://vijinimallawaarachchi.com
Support and contributions
Please get in touch via author websites or GitHub issues. Thanks!