
A lightweight and easy-to-use Python bioinformatics toolkit.


omiBio - A Lightweight Bioinformatics Toolkit



Introduction

omiBio is a lightweight, user-friendly Python toolkit for bioinformatics, well suited to education, research, and rapid prototyping.

Key features:

  • Robust data structures: Sequence, Polypeptide, etc., with optional validation.
  • Simple I/O: Read/write bioinformatics files (e.g., FASTA) with one-liners.
  • Analysis tools: GC content, ORF detection, consensus sequences, sliding windows, and more.
  • CLI included: Run common tasks from the terminal.
  • Basic visualization: Built-in plotting (via matplotlib & seaborn) for quick insights.
  • Functional & OOP APIs: Use classes or convenient wrapper functions.
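The pairing of classes with wrapper functions can be illustrated with a small sketch. The names `SimpleSequence` and `gc_content` are hypothetical and are not omiBio's API; the example only shows the general pattern of a method on a class plus a module-level function that accepts either a plain string or an instance of that class.

```python
from __future__ import annotations


class SimpleSequence:
    """Hypothetical sequence class (not omiBio's Sequence)."""

    __slots__ = ("_seq",)

    def __init__(self, seq: str) -> None:
        # Normalize to upper case so GC counting is case-insensitive
        self._seq = seq.upper()

    def __str__(self) -> str:
        return self._seq

    def __len__(self) -> int:
        return len(self._seq)

    def gc_content(self) -> float:
        """Fraction of G and C bases (0.0 for an empty sequence)."""
        if not self._seq:
            return 0.0
        return (self._seq.count("G") + self._seq.count("C")) / len(self._seq)


def gc_content(seq: str | SimpleSequence) -> float:
    """Functional wrapper: accepts a plain string or a SimpleSequence."""
    if isinstance(seq, str):
        seq = SimpleSequence(seq)
    return seq.gc_content()


# Both styles give the same kind of answer
print(SimpleSequence("ATGC").gc_content())  # 0.5
print(gc_content("ggcc"))                    # 1.0
```

The wrapper keeps quick one-off calls short, while the class remains the single place where the logic lives.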

Modules Overview

The omiBio toolkit is organized into the following modules:

Module            Purpose                                       Key Classes / Functions
omibio.sequence   Sequence-type data structures                 Sequence, Polypeptide
omibio.bio        Biological objects and data containers        SeqInterval, AnalysisResult
omibio.io         File I/O for common bioinformatics formats    read_fasta(), read_fastq()
omibio.analysis   Sequence analysis functions                   gc(), sliding_gc(), find_orfs()
omibio.utils      General-purpose utility functions             truncate_repr()
omibio.viz        Simple and easy-to-use data visualization     plot_orf(), plot_sliding_gc()
omibio.cli        Command-line interfaces for common workflows  omibio random-fasta, omibio clean

Release Notes - omiBio [v0.1.4] 12/14/25

Performance & Core I/O

  • Optimized FASTA parsing
    Introduced the generator-based read_fasta_iter() to improve performance, along with refined error handling and a configurable warning system.
    The read_fasta() API is unchanged and still returns a SeqCollections, so users can choose between eager parsing (read_fasta()) and lazy parsing (read_fasta_iter()).
    Both read_fasta() and read_fasta_iter() now accept TextIO and PathLike objects as data sources.
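The eager/lazy split described above can be sketched with a plain-Python generator. This is not omiBio's read_fasta_iter(): the name `iter_fasta` and its `(id, sequence)` tuples are assumptions for illustration. It shows how one function can accept either an open text stream or a path, and how an eager reader can be layered on top of the lazy one.

```python
from __future__ import annotations

import io
import os
from collections.abc import Iterator
from contextlib import nullcontext
from typing import TextIO


def iter_fasta(source: TextIO | str | os.PathLike[str]) -> Iterator[tuple[str, str]]:
    """Lazily yield (id, sequence) pairs from an open text stream or a path."""
    # Open paths here; leave caller-owned streams open when done
    if isinstance(source, (str, os.PathLike)):
        ctx = open(source, encoding="utf-8")
    else:
        ctx = nullcontext(source)

    with ctx as handle:
        seq_id: str | None = None
        chunks: list[str] = []
        for line_no, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                # The ID is the first whitespace-delimited token of the header
                parts = line[1:].split(maxsplit=1)
                seq_id = parts[0] if parts else ""
                chunks = []
            elif seq_id is None:
                raise ValueError(f"line {line_no}: sequence data before the first header")
            else:
                chunks.append(line)
        if seq_id is not None:
            yield seq_id, "".join(chunks)


# Lazy: records are parsed one at a time as the loop advances
fasta = io.StringIO(">seq1 demo\nATGC\nGG\n>seq2\nTTAA\n")
for seq_id, seq in iter_fasta(fasta):
    print(seq_id, seq)  # seq1 ATGCGG, then seq2 TTAA

# Eager: materialize everything at once, e.g. records = dict(iter_fasta("example.fasta"))
```

Only one record is held in memory at a time in the lazy form, which is what makes the generator approach attractive for large files.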

  • FASTQ support
    Added read_fastq() and write_fastq() with the same design philosophy as the FASTA APIs.
    A generator interface, read_fastq_iter(), is also provided.
    All FASTQ I/O functions support TextIO and PathLike inputs.
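A FASTQ reader following the same generator design might look like the sketch below. It is an illustration, not omiBio's read_fastq_iter(): it assumes the common four-line record layout (no multi-line sequences) and Phred+33 quality encoding, and the `FastqRecord` and `iter_fastq` names are hypothetical.

```python
from __future__ import annotations

import io
from collections.abc import Iterator
from typing import NamedTuple, TextIO


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    quality: str  # Phred+33 encoded (assumption)


def iter_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Lazily yield records from a FASTQ stream with 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        plus = handle.readline()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        parts = header[1:].split(maxsplit=1)
        yield FastqRecord(parts[0] if parts else "", seq, qual)


data = io.StringIO("@read1 demo\nACGT\n+\nII#I\n")
for rec in iter_fastq(data):
    # Decode Phred+33 qualities to integer scores
    print(rec.seq_id, rec.sequence, [ord(c) - 33 for c in rec.quality])
# read1 ACGT [40, 40, 2, 40]
```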

  • Flexible file writing
    All sequence writing functions can now return a list of formatted strings when no output file is specified.
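One way to implement "return strings when no output file is given" is sketched below. The function name `write_fasta_records`, the `(id, sequence)` input format, and the 60-character default line width are assumptions, not omiBio's signature.

```python
from __future__ import annotations

import os
from collections.abc import Iterable


def write_fasta_records(
    records: Iterable[tuple[str, str]],
    path: str | os.PathLike[str] | None = None,
    line_width: int = 60,
) -> list[str] | None:
    """Format (id, sequence) pairs as FASTA.

    Returns the formatted records as a list of strings when path is None;
    otherwise writes them to path and returns None.
    """
    formatted: list[str] = []
    for seq_id, seq in records:
        # Wrap the sequence at line_width characters per line
        body = "\n".join(seq[i : i + line_width] for i in range(0, len(seq), line_width))
        formatted.append(f">{seq_id}\n{body}\n" if body else f">{seq_id}\n")

    if path is None:
        return formatted
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(formatted)
    return None


print(write_fasta_records([("s1", "ACGTACGT")], line_width=4))
# ['>s1\nACGT\nACGT\n']
```

Returning the strings keeps the formatter easy to test and lets callers redirect output wherever they like.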

CLI Improvements

  • Refactored and streamlined the CLI structure.
  • Improved existing commands and added new ones, including:
    • omibio fasta view
    • omibio fastq to-fasta
    • omibio kmer count
  • All CLI commands support stdin/stdout and can be composed in Unix-style pipelines.
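Unix-style composition works because each command reads standard input and writes standard output. A minimal filter in the same spirit is sketched below; it is not omiBio's `omibio fastq to-fasta` implementation, it assumes four-line FASTQ records, and `fastq_to_fasta` and `main` are hypothetical names.

```python
from __future__ import annotations

import io
import sys
from typing import TextIO


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Stream FASTQ records from src to dst as FASTA; return the record count."""
    count = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip stray blank lines
        seq = src.readline().strip()
        src.readline()  # '+' separator line
        src.readline()  # quality line, which FASTA has no place for
        dst.write(f">{header[1:].rstrip()}\n{seq}\n")
        count += 1
    return count


def main() -> None:
    """Pipeline entry point: FASTQ on stdin, FASTA on stdout."""
    fastq_to_fasta(sys.stdin, sys.stdout)


out = io.StringIO()
fastq_to_fasta(io.StringIO("@r1 demo\nACGT\n+\nIIII\n"), out)
print(out.getvalue(), end="")  # prints ">r1 demo" then "ACGT"
```

Saved as a script with an `if __name__ == "__main__": main()` guard (the filename `fq2fa.py` is arbitrary), it can be chained like any other filter, e.g. `cat reads.fastq | python fq2fa.py | grep -c '>'`.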

API & Data Model Changes

  • Removed the Gene and Genome classes, which overlapped in functionality with SeqEntry and SeqCollections.
  • Made the Sequence and Polypeptide classes immutable.
  • Added the at_content() method to the Sequence class.
  • Applied __slots__ to SeqInterval and SeqEntry to reduce memory overhead.

Analysis & Visualization

  • Enhanced plot_kmer() to support k-mer heatmaps across multiple sequences.
  • Refactored AnalysisResult into an abstract base class.
  • Added concrete result types:
    • IntervalResult
    • KmerResult
  • Results returned by analysis functions (e.g. kmer()) can now be visualized directly via a unified .plot() interface.
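The "compute, then call .plot()" pattern can be sketched with an abstract base class whose subclasses each implement their own plot. Everything here (`Result`, `KmerCounts`, `count_kmers`) is hypothetical and only mirrors the design described above; omiBio's own AnalysisResult, KmerResult, and kmer() may differ. matplotlib is imported inside `plot()` so that counting works without it.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass


class Result(ABC):
    """Hypothetical base class: every concrete result knows how to plot itself."""

    @abstractmethod
    def plot(self, show: bool = False):
        """Draw the result and return the matplotlib Axes."""


@dataclass
class KmerCounts(Result):
    """Hypothetical k-mer result type."""

    k: int
    counts: Counter[str]

    def plot(self, show: bool = False):
        # Imported lazily so that counting works without matplotlib installed
        import matplotlib.pyplot as plt

        if not self.counts:
            raise ValueError("nothing to plot: no k-mers were counted")
        kmers, values = zip(*sorted(self.counts.items()))
        _, ax = plt.subplots()
        ax.bar(kmers, values)
        ax.set_xlabel(f"{self.k}-mer")
        ax.set_ylabel("count")
        if show:
            plt.show()
        return ax


def count_kmers(seq: str, k: int) -> KmerCounts:
    """Count overlapping k-mers in seq, case-insensitively."""
    if k <= 0:
        raise ValueError("k must be positive")
    s = seq.upper()
    return KmerCounts(k, Counter(s[i : i + k] for i in range(len(s) - k + 1)))


res = count_kmers("ATATG", 2)
print(res.counts)  # Counter({'AT': 2, 'TA': 1, 'TG': 1})
# res.plot(show=True) would draw a bar chart (requires matplotlib)
```

Because every result type implements the same abstract method, calling code can plot any result without knowing its concrete class.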

Quality Assurance

  • Numerous minor fixes and internal refinements.
  • Comprehensive test coverage (≥ 95%).

Usage example

Creating a sliding-window GC chart with omiBio:

from omibio.io import read_fasta
from omibio.analysis import sliding_gc

# Load sequences from FASTA (returns a SeqCollections of SeqEntry objects, indexed by ID)
seqs = read_fasta("examples/example.fasta")
dna = seqs["example"]  # look up a record by its ID

# Compute GC content in sliding windows (window=200 bp, step=20 bp);
# the result is an IntervalResult of SeqInterval objects
result = sliding_gc(dna, window=200, step=20)

# Visualize the result directly
result.plot(show=True)  # or: plot_sliding_gc(result, show=True)

Or even a one-liner:

sliding_gc(read_fasta("examples/example.fasta")["example"]).plot(show=True)

The code above produces a plot like this:

[Figure: example sliding-window GC content plot]
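To make the sliding-window computation concrete, here is a stand-alone sketch that uses prefix sums, so each window costs O(1) after a single pass over the sequence. It is not omiBio's sliding_gc(): the function name is hypothetical, the window/step defaults merely mirror the example call above, and because omiBio's handling of a trailing partial window is not described here, this sketch simply skips it.

```python
from __future__ import annotations


def sliding_gc_windows(
    seq: str, window: int = 200, step: int = 20
) -> list[tuple[int, int, float]]:
    """Return (start, end, gc_fraction) for each full window.

    Coordinates are 0-based and half-open; a trailing partial window is skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    s = seq.upper()
    # Prefix sums: gc_prefix[i] = number of G/C bases in s[:i]
    gc_prefix = [0]
    for base in s:
        gc_prefix.append(gc_prefix[-1] + (base in "GC"))
    return [
        (start, start + window, (gc_prefix[start + window] - gc_prefix[start]) / window)
        for start in range(0, len(s) - window + 1, step)
    ]


print(sliding_gc_windows("GGGGAAAA", window=4, step=2))
# [(0, 4, 1.0), (2, 6, 0.5), (4, 8, 0.0)]
```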

Using omiBio's command-line interface:

$ omibio orf find example.fasta --min-length 100

The command above produces output like this:

seq_id       start   end     strand  frame   length
example_2    70      289     -       -2      219
example_16   53      257     +       +3      204
example_13   118     301     +       +2      183
example_4    92      272     -       -1      180
example_2    157     322     +       +2      165
example_5    17      173     -       -1      156
example_16   176     332     -       -1      156
...
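A bare-bones ORF scan that yields the same kind of table can be sketched as follows. This is not omiBio's find_orfs(): it assumes an ORF runs from ATG to the first in-frame stop codon (stop included), reports 0-based, half-open coordinates on the forward strand, and numbers frames +1..+3 / -1..-3 by reading-frame offset. omiBio's exact frame convention and length rules may differ, and `find_orfs_simple` and `Orf` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Orf(NamedTuple):
    start: int   # 0-based, inclusive, on the forward strand
    end: int     # 0-based, exclusive, on the forward strand
    strand: str  # "+" or "-"
    frame: int   # +1..+3 or -1..-3 (reading-frame offset + 1)

    @property
    def length(self) -> int:
        return self.end - self.start


def _scan(seq: str, min_length: int) -> list[tuple[int, int, int]]:
    """Find ATG...stop ORFs on one strand; return (start, end, offset)."""
    hits = []
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if end - start >= min_length:
                    hits.append((start, end, offset))
                start = None
    return hits


def find_orfs_simple(seq: str, min_length: int = 100) -> list[Orf]:
    """Scan both strands and return ORFs sorted by length, longest first."""
    s = seq.upper()
    n = len(s)
    orfs = [Orf(a, b, "+", off + 1) for a, b, off in _scan(s, min_length)]
    rc = s.translate(COMPLEMENT)[::-1]
    for a, b, off in _scan(rc, min_length):
        # Map reverse-complement coordinates back onto the forward strand
        orfs.append(Orf(n - b, n - a, "-", -(off + 1)))
    return sorted(orfs, key=lambda o: o.length, reverse=True)


for orf in find_orfs_simple("CCATGAAATAGCC", min_length=9):
    print(orf.start, orf.end, orf.strand, orf.frame, orf.length)  # 2 11 + 3 9
```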

Installation

From PyPI:

$ pip install omibio

Requirements

  • Python: >= 3.12
  • Core dependencies:
    • click (CLI)
    • numpy & pandas (analysis and plotting)
    • matplotlib & seaborn (visualization)

For the complete build and dependency configuration, see pyproject.toml.

Code Style

omiBio follows PEP 8 conventions for Python code, and all code is checked with the flake8 linter.

License

This project is licensed under the MIT License.

Things to note

  • Most of the code in this project uses 0-based, half-open intervals rather than the 1-based (closed-interval) coordinates commonly used in biology.
  • All type hints in this project use PEP 585 generic syntax (e.g. list[str]), available since Python 3.9.
  • This project is still under development and not yet ready for production. Please use it with caution. If you have any suggestions, please contact us:
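The 0-based, half-open convention mentioned above differs from 1-based, closed coordinates only in the start position. The two helpers below are a hypothetical illustration, not part of omiBio. Python slicing already uses 0-based, half-open intervals, which is one reason the convention fits the language.

```python
def to_one_based(start: int, end: int) -> tuple[int, int]:
    """0-based half-open [start, end) -> 1-based closed [start + 1, end]."""
    return start + 1, end


def to_zero_based(first: int, last: int) -> tuple[int, int]:
    """1-based closed [first, last] -> 0-based half-open [first - 1, last)."""
    return first - 1, last


seq = "ACGTACGT"
start, end = 2, 5                # 0-based half-open: bases 'GTA'
print(seq[start:end])            # GTA
print(to_one_based(start, end))  # (3, 5)
print(end - start)               # 3 (length is end - start, no +1 needed)
```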



Download files

Download the file for your platform.

Source Distribution

omibio-0.1.4.post3.tar.gz (42.7 kB)

Uploaded Source

Built Distribution


omibio-0.1.4.post3-py3-none-any.whl (65.4 kB)

Uploaded Python 3

File details

Details for the file omibio-0.1.4.post3.tar.gz.

File metadata

  • Download URL: omibio-0.1.4.post3.tar.gz
  • Upload date:
  • Size: 42.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omibio-0.1.4.post3.tar.gz
Algorithm Hash digest
SHA256 823f1275bbad38104863b64f46b42e45cbbcf601ac40881d7335ddb038e3b633
MD5 4e38a28544109387fe008111b36e8a8b
BLAKE2b-256 d9bdd59e444fd1b310f78a397d9b7737f866984ace84201bf727b37d2dcaaa6c


Provenance

The following attestation bundles were made for omibio-0.1.4.post3.tar.gz:

Publisher: python-publish.yml on LK923/omiBioKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file omibio-0.1.4.post3-py3-none-any.whl.

File metadata

  • Download URL: omibio-0.1.4.post3-py3-none-any.whl
  • Upload date:
  • Size: 65.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omibio-0.1.4.post3-py3-none-any.whl
Algorithm Hash digest
SHA256 561a28106390be1739fbf8a59881970a1a19dccda373eedce1a6c1def3c6c592
MD5 1ad5e6d634c42f5c91f48878f188ffcd
BLAKE2b-256 d8d31bf1653d18b02d89b14e02d7b053ea9ce62a2e6a357333e74e36a6edc5f1


Provenance

The following attestation bundles were made for omibio-0.1.4.post3-py3-none-any.whl:

Publisher: python-publish.yml on LK923/omiBioKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
