RSBio-Seq is a fast, lightweight sequence reading library built on top of the Rust bio crate.

RSBio-Seq

License: GPL v3

██████  ███████ ██████  ██  ██████        ███████ ███████  ██████  
██   ██ ██      ██   ██ ██ ██    ██       ██      ██      ██    ██ 
██████  ███████ ██████  ██ ██    ██ █████ ███████ █████   ██    ██ 
██   ██      ██ ██   ██ ██ ██    ██            ██ ██      ██ ▄▄ ██ 
██   ██ ███████ ██████  ██  ██████        ███████ ███████  ██████  
                                                              ▀▀   

RSBio-Seq aims to provide reading and writing of common sequence formats (FASTA/FASTQ), both uncompressed (fasta, fa, fna, fastq, fq) and gzip-compressed (.gz).

Installation

1. From PyPI (Recommended)

Use the following command to install from PyPI.

pip install rsbio-seq

2. Build and install from source

To build from source, make sure you have the following programs installed.

To build and install the development version of the wheel, use one of the following commands.

maturin develop # this installs the development version in the env
maturin develop --release # this installs a release version in the env

To build a release mode wheel for installation, use this command.

maturin build --release

You will find the .whl file inside the target/wheels directory. Its name reflects your Python environment and CPU architecture. Install the built wheel with this command.

pip install target/wheels/*.whl

Usage

Once installed, you can import the library and use it as follows.

Reading

from rsbio_seq import SeqReader, Sequence, ascii_to_phred

# each seq entry is of type Sequence
seq: Sequence

for seq in SeqReader("path/to/seq.fasta.gz"):
    print(seq.id)
    print(seq.seq)
    # for fastq quality line
    print(seq.qual) # prints IIII
    print(ascii_to_phred(seq.qual)) # prints [40, 40, 40, 40]
    # optional description attribute
    print(seq.desc)
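
Because the reader is consumed one record at a time, simple statistics can be gathered in a single pass. The sketch below is illustrative only: summarize is a hypothetical helper, not part of RSBio-Seq, and it assumes nothing beyond the seq attribute shown above.

```python
def summarize(records):
    """Return (number of records, total bases) for an iterable of records.

    Hypothetical helper, not part of RSBio-Seq. It only assumes that each
    record exposes a `seq` attribute with a length, as in the example above.
    """
    count = 0
    total_bases = 0
    for record in records:
        count += 1
        total_bases += len(record.seq)
    return count, total_bases


# Intended use (assumes rsbio-seq is installed and the path exists):
#   from rsbio_seq import SeqReader
#   n_records, n_bases = summarize(SeqReader("path/to/seq.fasta.gz"))
```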

Writing

from rsbio_seq import SeqWriter, Sequence, phred_to_ascii

# writing fasta
seq = Sequence("id", "desc", "ACGT") # id, description, sequence
writer = SeqWriter("out.fasta")
writer.write(seq)
writer.close()

# writing fastq
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fastq")
writer.write(seq)
writer.close()

# writing gzipped
seq = Sequence("id", "desc", "ACGT", "IIII") # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()

# writing gzipped with phred score translation
qual = phred_to_ascii([40, 40, 40, 40])
seq = Sequence("id", "desc", "ACGT", qual) # id, description, sequence, quality
writer = SeqWriter("out.fq.gz")
writer.write(seq)
writer.close()

Note: close() is only required if you want to read the file again in the same function/code scope. Closing opened files is a good practice either way.
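
One way to make sure close() always runs, even when a write fails, is to wrap the writer with contextlib.closing from the standard library. This is a minimal sketch: write_all is a hypothetical helper, not part of RSBio-Seq, and it assumes only the write() and close() methods used in the examples above.

```python
from contextlib import closing


def write_all(writer, records):
    """Write every record, then close the writer even if a write raises.

    Hypothetical helper, not part of RSBio-Seq. It assumes only the
    write() and close() methods shown in the writing examples.
    """
    written = 0
    with closing(writer):  # calls writer.close() when the block exits
        for record in records:
            writer.write(record)
            written += 1
    return written


# Intended use (assumes rsbio-seq is installed):
#   from rsbio_seq import SeqWriter, Sequence
#   write_all(SeqWriter("out.fq.gz"), [Sequence("id", "desc", "ACGT", "IIII")])
```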

We provide two utility functions for your convenience.

  • phred_to_ascii - converts a list of numeric Phred scores to a quality string
  • ascii_to_phred - converts a quality string to a list of numeric Phred scores

RSBio-Seq reads and writes quality strings in ASCII format only. Use these helper functions to translate between the string and numeric scores when you need the numbers.
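
The example values used in this README ("IIII" and [40, 40, 40, 40]) are consistent with the common Phred+33 encoding, where each score is stored as the character with code score + 33. The pure-Python sketch below illustrates that mapping under that assumption. The function names with the _sketch suffix are hypothetical, and the library's own helpers may behave differently (for example in validation).

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding, matching "I" <-> 40


def ascii_to_phred_sketch(qual):
    """Convert a quality string to a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def phred_to_ascii_sketch(scores):
    """Convert a list of integer Phred scores to a quality string."""
    return "".join(chr(score + PHRED_OFFSET) for score in scores)


print(ascii_to_phred_sketch("IIII"))            # [40, 40, 40, 40]
print(phred_to_ascii_sketch([40, 40, 40, 40]))  # IIII
```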

Authors

Support and contributions

Please get in touch via author websites or GitHub issues. Thanks!

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

  • rsbio_seq-0.1.3-cp39-abi3-win_amd64.whl (187.8 kB): CPython 3.9+, Windows x86-64
  • rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_x86_64.whl (470.3 kB): CPython 3.9+, musllinux (musl 1.2+) x86-64
  • rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_aarch64.whl (481.4 kB): CPython 3.9+, musllinux (musl 1.2+) ARM64
  • rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (304.8 kB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (307.5 kB): CPython 3.9+, manylinux (glibc 2.17+) ARM64
  • rsbio_seq-0.1.3-cp39-abi3-macosx_11_0_arm64.whl (269.5 kB): CPython 3.9+, macOS 11.0+ ARM64
  • rsbio_seq-0.1.3-cp39-abi3-macosx_10_12_x86_64.whl (280.3 kB): CPython 3.9+, macOS 10.12+ x86-64

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1547c265fc8f1c46bb897ac8c6d8219f656821d79cdcc2c062e71c818c636f28
MD5 148826c7cc5c3e37ec0c4100d460daa0
BLAKE2b-256 3d312f3932845e9cd4ded1f0fbd8f1fddd50269dc063baa1bf76a84df45ae31d

See more details on using hashes here.
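
A downloaded wheel can be checked against the SHA256 digest listed for it. The sketch below uses only Python's standard hashlib module. sha256_of is a hypothetical helper name, and the file path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Intended use: compare the result with the SHA256 value listed for the
# file you downloaded, for example:
#   sha256_of("rsbio_seq-0.1.3-cp39-abi3-win_amd64.whl") == "<SHA256 from the listing>"
```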

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 279bf9f6219d214880ddf1ef05a5345107dba343f45d070390180a248232d51a
MD5 68d811e04c351c085fd10f718a9cb88a
BLAKE2b-256 5bab5a19ae581d2c6ef2cbd6c0ac0126e7002ead90905f9799f45c0d33f4c769

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 e0d21f11cb7aa9a84fe02d46904a6c0857ccd8e9b27598de7b18ab722202130e
MD5 999e5e49fb24b588b2f826217bd98cd9
BLAKE2b-256 809154961576f1df7a62f113c56f193c39a9aa074016dedede708128ff8ccd97

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6ebe2ee2a40e50bae4c5c061e8776995a311a43ae48d56c96adba6cd98a9baf8
MD5 40893aa8c5c21a0ca5ba9c6fc0284b30
BLAKE2b-256 98b86881fc4e3c00f8447c0ce6d72805ec81ef99d2dfdf4a65d2998453922e44

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d139f5b3571c8fceaba67b8212126c3d5ba2a58d9f1192e672cbdb2908844348
MD5 bb67ca5ca57ba7927c69eb2d5949e127
BLAKE2b-256 7e5ccf9327f2a39c65b5769d49fb988fb13fba7f87c4e0abecdd9ee2270b5b10

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ac583b5b3423b5a2f52172876e3af132318dd31dae83873ef9f5a6097d15e9ee
MD5 7189a6a97b855311cfd292bfa93aada0
BLAKE2b-256 ef638317ce64b2f179d5674b334361acea044cbfead082a957cf1775e7696e9c

File details

Details for the file rsbio_seq-0.1.3-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for rsbio_seq-0.1.3-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 e683fdbb0c10979115bbfc9cbec8fc3e6d167f1b0fbb02d1fe698cc10a5fd260
MD5 99e0030dde55dd09ec77bb986c05428f
BLAKE2b-256 6913768ace9b8b0d82c7773ea3006e7db01890d4a6e68c2b5a557b3171e7b5c0
