Python bindings for microBioRust: microbiology-friendly bioinformatics functions
microbiorust 🦀
Python bindings for microBioRust — a high-performance, modular bioinformatics toolkit written in Rust.
microbiorust provides fast and memory-efficient bioinformatics functionality to Python users by leveraging the power of Rust, exposed through PyO3. This package aims to offer an alternative to libraries like Biopython, with a focus on speed, correctness, and extensibility.
Installation
pip install microbiorust
To run the Python tests with pytest:
python3 -m pytest -s tests/test_mbr.py
Wheels are available for Linux, macOS and Windows (Python 3.10+). No Rust toolchain is required.
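After installing, it can be useful to confirm which version pip actually placed in the current environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name chosen for illustration, not part of microbiorust.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("microbiorust")
if version is None:
    print("microbiorust is not installed in this environment")
else:
    print(f"microbiorust {version} is installed")
```

The same helper works for any distribution name, so it can also be used to check that `maturin` is present before building from source.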
Build from source
If you prefer to build from source using maturin:
pip install maturin
git clone https://github.com/microBioRust/microBioRust
cd microbiorust-py
maturin develop --features extension-module
To verify the Python module functions are correctly exposed from Rust:
cargo test
Features
- Fast parsers for GenBank and EMBL formats
- Fast parsers for BLAST XML and tabular formats
- Fast parser for multiple sequence alignments (MSA): subset, purge_gaps, get_consensus
- Write directly to GFF3, FAA, FNA and FFN formats
- Typed collections: the return type determines which data is returned
- Accurate feature extraction: gene, product, strand, start, stop, codon_start
- Native JSON serialization to instantly export extracted data structures to standard JSON strings
- Sequence metrics: hydrophobicity, amino acid counts and percentages
- Python API for easy integration into existing pipelines
- Built with Rust for memory safety and performance
Modules
microbiorust gbk — GenBank format
import json

import microbiorust as mb

# Write directly to file: most efficient for large files
# (all functions are also available for the EMBL format: parse_embl, embl_to_faa, embl_to_ffn, etc.)
collection = mb.parse_gbk("genome.gbk")
collection.write_faa("output.faa")
collection.write_ffn("output.ffn")
collection.write_fna("output.fna")

# Flat access across the whole genome file; returns a FaaCollection
faa = mb.gbk_to_faa("genome.gbk")
# Print valid protein FASTA
for info in faa.values():
    print(f">{info.locus_tag}\n{info.faa}")

# Per-contig record access
for record in collection.values():
    # Print the contig id and sequence
    print(record.id(), record.sequence())

    # Protein sequences: print the protein FASTA for each predicted protein in the record
    for info in record.faa().values():
        print(f">{info.locus_tag}\n{info.faa}")

    # Nucleotide sequences: print the nucleotide FASTA for each predicted gene
    for info in record.ffn().values():
        print(f">{info.locus_tag}\n{info.ffn}")

    # Features are keyed by locus tag
    features = record.features()
    if "b3304" in features:
        feat = features["b3304"]
        print(f"Gene: {feat.gene}, Product: {feat.product}")
        print(f"Location: {feat.start}..{feat.stop}, Strand: {feat.strand}")

# Convert the collection to a JSON string
json_str = collection.to_json()
print(json_str)
# Parse the JSON string into a Python dictionary
data = json.loads(json_str)

# Count proteins without loading sequences
count = mb.gbk_to_faa_count("genome.gbk")

# Convert annotations from gbk or embl to GFF3
mb.gbk_to_gff("genome.gbk", dna=True)
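Once FASTA files have been written, a quick independent check of their contents can catch truncated or empty output. The following is a minimal pure-Python sketch; `read_fasta` is a hypothetical helper, not part of microbiorust, and the demo file stands in for a real `output.faa` with made-up records.

```python
import os
import tempfile
from typing import Dict


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into {identifier: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records[header] = "".join(chunks)
                parts = line[1:].split()
                # Use the first whitespace-delimited token as the identifier
                header = parts[0] if parts else ""
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Demo with a stand-in file; in practice point this at the written output.faa
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.faa")
    with open(demo_path, "w") as out:
        out.write(">b0001\nMKRISTTITTTITITTGNGAG\n")
        out.write(">b0002\nMRVLKFGGTSVANAERFLRV\nADILESNARQ\n")
    demo_records = read_fasta(demo_path)

print(len(demo_records), "records read")
```

Comparing `len(read_fasta("output.faa"))` with the value returned by `gbk_to_faa_count()` is one possible consistency check, assuming both count the same set of proteins.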
---
### EMBL format

This example calls the functions on the `embl` submodule; they can also be called directly, e.g. `mb.embl_to_faa`.

```python
from microbiorust import embl

# Extract protein sequences to FASTA
embl.embl_to_faa("input.embl", "output.faa")
# Extract nucleotide sequences to FASTA
embl.embl_to_fna("input.embl", "output.fna")
# Convert annotations to GFF3
embl.embl_to_gff("input.embl", "output.gff")
```

microbiorust seqmetrics — Sequence metrics
from microbiorust import seqmetrics
sequence = "MKTLLLTLVVVTIVCLDLGAVGNGSSLSEDKDNVHK"
# Hydrophobicity score
window_size = 5
score = seqmetrics.hydrophobicity(sequence, window_size)
# Amino acid counts
counts = seqmetrics.amino_counts(sequence)
# Amino acid percentages
percentages = seqmetrics.amino_percentage(sequence)
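To make clear what a windowed hydrophobicity score and residue percentages measure, here is a pure-Python sketch. It is not the library's implementation: the scale used by `seqmetrics.hydrophobicity` is not stated here, so the sketch assumes Kyte-Doolittle hydropathy values and a plain mean over each window, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Kyte-Doolittle hydropathy values (assumed scale for this illustration)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence: str, window_size: int) -> List[float]:
    """Mean hydropathy of every full window; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window_size < 1 or window_size > len(scores):
        return []
    return [
        sum(scores[i:i + window_size]) / window_size
        for i in range(len(scores) - window_size + 1)
    ]


def residue_percentages(sequence: str) -> Dict[str, float]:
    """Percentage of each residue present in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


sequence = "MKTLLLTLVVVTIVCLDLGAVGNGSSLSEDKDNVHK"
profile = windowed_hydropathy(sequence, 5)
shares = residue_percentages(sequence)
print(f"{len(profile)} windows, most hydrophobic window mean = {max(profile):.2f}")
```

A sequence of length n yields n - window_size + 1 window means under this definition; the library may pad, centre, or summarise windows differently.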
microbiorust align — Multiple sequence alignment
from microbiorust import align
# Subset a fasta format MSA by row and column e.g.
align.subset_msa_alignment("input.fasta", "ids.txt", "output.fasta")
where the first tuple (0,10) is a row-wise subset and
the second tuple (0,100) is a column-wise subset
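The row-wise and column-wise subsetting described above can be pictured with plain Python slicing. This sketch is not the `align` API: `subset_alignment` is a hypothetical function, and the zero-based, half-open interpretation of the two tuples is an assumption.

```python
from typing import List, Tuple

Alignment = List[Tuple[str, str]]  # (sequence id, aligned sequence)


def subset_alignment(
    rows: Alignment,
    row_range: Tuple[int, int],
    col_range: Tuple[int, int],
) -> Alignment:
    """Keep rows row_range[0]:row_range[1] and columns col_range[0]:col_range[1]."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [(name, seq[c0:c1]) for name, seq in rows[r0:r1]]


# Made-up toy alignment with three sequences and nine columns
msa = [
    ("seq1", "ATG-CCGTA"),
    ("seq2", "ATGACC-TA"),
    ("seq3", "A-GACCGTA"),
]

# First tuple selects rows 0..1, second tuple selects columns 0..4
subset = subset_alignment(msa, (0, 2), (0, 5))
for name, seq in subset:
    print(f">{name}\n{seq}")
```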
microbiorust.blast — BLAST results
import microbiorust
results = microbiorust.parse_tabular("blast_results.tab")
for hit in results:
    print(hit["qseqid"], hit["pident"], hit["bitscore"])
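A common next step is to filter hits by identity and bit score and keep the best hit per query. The sketch below works on a list of dictionaries with the same keys used above; the sample hits are made up, the helper names are hypothetical, and the values are cast with `float()` because the types returned by `parse_tabular` are not stated here.

```python
from typing import Dict, List

Hit = Dict[str, object]


def filter_hits(hits: List[Hit], min_pident: float = 90.0, min_bitscore: float = 50.0) -> List[Hit]:
    """Keep hits that meet both the identity and the bit score threshold."""
    return [
        hit for hit in hits
        if float(hit["pident"]) >= min_pident and float(hit["bitscore"]) >= min_bitscore
    ]


def best_hit_per_query(hits: List[Hit]) -> Dict[str, Hit]:
    """Return the highest-scoring hit for each query id."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        query = str(hit["qseqid"])
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


# Made-up stand-in for the output of parse_tabular
sample_hits = [
    {"qseqid": "q1", "pident": 98.5, "bitscore": 320.0},
    {"qseqid": "q1", "pident": 91.0, "bitscore": 150.0},
    {"qseqid": "q2", "pident": 72.3, "bitscore": 45.0},
]

strong = filter_hits(sample_hits)
best = best_hit_per_query(sample_hits)
for query, hit in best.items():
    print(query, hit["pident"], hit["bitscore"])
```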
Choosing a usage pattern
| Goal | Use |
|---|---|
| Write everything to file | collection.write_faa() / write_ffn() / write_fna() |
| Get all proteins across a whole genome file | gbk_to_faa() |
| Work per genome contig record | parse_gbk() then record.faa() or record.ffn() |
| Features and sequences together | parse_gbk() then record.sequences() + record.features() |
| Count proteins without loading | gbk_to_faa_count() |
| Convert collection to JSON string | collection.to_json() |
| Parse JSON string into Python dictionary | json.loads() |
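For the last two rows, it helps to look at the shape of the parsed JSON before writing code against it. The sketch below outlines any parsed JSON value without assuming a schema; `describe` is a hypothetical helper, and the sample string is a made-up stand-in, not the real output of `collection.to_json()`.

```python
import json
from typing import Any, List


def describe(value: Any, indent: int = 0, max_depth: int = 3) -> List[str]:
    """Return an indented outline of keys and value types, down to max_depth levels."""
    lines: List[str] = []
    pad = "  " * indent
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{pad}{key}: {type(child).__name__}")
            if indent + 1 < max_depth:
                lines.extend(describe(child, indent + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is inspected, as a representative
        lines.append(f"{pad}[0]: {type(value[0]).__name__} (list of {len(value)})")
        if indent + 1 < max_depth:
            lines.extend(describe(value[0], indent + 1, max_depth))
    return lines


# Made-up payload for illustration only
sample_json = '{"contig_1": {"features": [{"gene": "thrA", "start": 337}]}}'
outline = describe(json.loads(sample_json))
print("\n".join(outline))
```

Raising `max_depth` shows more nested levels; on a real genome the outline stays short because only the first element of each list is visited.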
Why Rust?
Rust gives microbiorust C-level performance with memory safety — no segfaults, no GIL limitations, and no need for NumPy or Pandas for core parsing operations. Large GenBank or EMBL files are parsed significantly faster than equivalent pure-Python implementations.
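The speed difference depends on the input, so it is worth measuring on your own files. Here is a minimal timing sketch using only the standard library; `time_call` is a hypothetical helper, and the demo times a built-in function so that the block runs without any genome file.

```python
import time
from typing import Any, Callable


def time_call(func: Callable[..., Any], *args: Any, repeats: int = 3, **kwargs: Any) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Demo with a stand-in workload
elapsed = time_call(sum, range(100_000))
print(f"fastest of 3 runs: {elapsed:.6f} s")

# With microbiorust installed and a real file available, the same helper
# could be used as, for example:
#   time_call(mb.parse_gbk, "genome.gbk")
```

Taking the minimum over several runs reduces noise from disk caching and other processes, though the first run on a cold cache may still be informative for large files.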
Documentation
Full documentation: https://microbiorust.github.io/docs/
Source: https://github.com/microBioRust/microBioRust
License
MIT