blue-crab: A Slow5/Blow5 <-> Pod5 converter

blue-crab

blue-crab is a tool for converting between ONT's POD5 format and the community-maintained SLOW5/BLOW5 format. Maybe one day ONT will see the light and realise that column-based file formats are a bad idea for row-based reading. Till then, Crab go snap snap! Happy converting!

SLOW5 specification: https://hasindu2008.github.io/slow5specs
slow5tools: https://github.com/hasindu2008/slow5tools
pyslow5: https://hasindu2008.github.io/slow5lib/pyslow5_api/pyslow5.html

WARNING

While we test as much as we can and do our very best to ensure 100% data parity, we have no control over what ONT will do to pod5.

Given their history of ad-hoc changes, there are bound to be cases in the future where such changes break the conversion.

You may use commands like slow5tools quickcheck and index to verify the integrity of the created S/BLOW5 files.
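
As a rough sketch of such a check, the helper below runs slow5tools quickcheck on every S/BLOW5 file in a directory. The function name verify_slow5_dir and the SLOW5TOOLS override variable are made up for this example, and it assumes quickcheck exits with a non-zero status when a file is not intact.

```shell
# Hypothetical helper (not part of blue-crab or slow5tools):
# run "slow5tools quickcheck" on every .blow5/.slow5 file in a directory.
# Assumes quickcheck returns a non-zero exit status for a broken file.
# Set SLOW5TOOLS to use a specific slow5tools binary.
verify_slow5_dir() (
    dir="$1"
    tool="${SLOW5TOOLS:-slow5tools}"
    failed=0
    for f in "$dir"/*.blow5 "$dir"/*.slow5; do
        [ -e "$f" ] || continue          # skip glob patterns that matched nothing
        if "$tool" quickcheck "$f" >/dev/null 2>&1; then
            echo "OK    $f"
        else
            echo "FAIL  $f"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed quickcheck"
    [ "$failed" -eq 0 ]                  # succeed only if nothing failed
)

# Example: verify_slow5_dir blow5_dir
```

The function returns a non-zero status if any file fails, so it can be chained after a conversion step in a script.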

Quickstart

python3 -m venv ./blue-crab-venv
source ./blue-crab-venv/bin/activate
python3 -m pip install --upgrade pip

pip install blue-crab

blue-crab --help

Setup

blue-crab requires Python 3.8 or higher (a limitation of ONT's pod5 library). Using a virtual environment is recommended.

  1. Install zlib development libraries (and optionally zstd development libraries).

    The commands to install the zlib development libraries on some popular distributions:

    On Debian/Ubuntu : sudo apt-get install zlib1g-dev
    On Fedora/CentOS : sudo dnf install zlib-devel # or: sudo yum install zlib-devel
    On OS X : brew install zlib
    

    SLOW5 files compressed with zstd offer smaller file sizes and better performance than the default zlib. However, unlike zstd, the zlib runtime library is available by default on almost all distributions, so files compressed with zlib are more 'portable'. Enabling the optional zstd support requires the zstd development libraries (version 1.3 or higher) to be installed on your system:

    On Debian/Ubuntu : sudo apt-get install libzstd1-dev # libzstd-dev on newer distributions if libzstd1-dev is unavailable
    On Fedora/CentOS : sudo yum install libzstd-devel
    On OS X : brew install zstd
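
One possible way to check whether suitable zstd development files are present is to ask pkg-config. This is only a sketch: the function name have_zstd_dev is invented here, and it assumes that pkg-config is installed and that your zstd development package ships a libzstd.pc file, which may not hold on every system.

```shell
# Hypothetical helper: succeed if pkg-config can see libzstd >= 1.3.
# Assumes the zstd development package provides a libzstd.pc file.
# Set PKG_CONFIG to use a specific pkg-config binary.
have_zstd_dev() (
    pc="${PKG_CONFIG:-pkg-config}"
    command -v "$pc" >/dev/null 2>&1 || exit 1   # no pkg-config: cannot tell
    "$pc" --atleast-version=1.3 libzstd
)

# Example: request the zstd build of pyslow5 only when the check passes
# if have_zstd_dev; then export PYSLOW5_ZSTD=1; fi
```

A failed check does not prove that zstd is missing; it only means pkg-config could not find it.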
    

Then pick one of the two options below:

  1. Create a virtual environment using Python 3.8+ and install blue-crab from pip

    python3 -m venv ./blue-crab-venv
    source ./blue-crab-venv/bin/activate
    python3 -m pip install --upgrade pip
    
    # only if you want zstd support and have installed zstd development libraries for zstd build
    export PYSLOW5_ZSTD=1
    
    pip install blue-crab
    
    blue-crab --help
    
  2. Create a virtual environment using Python 3.8+ and install blue-crab from source

    # clone the repo
    git clone https://github.com/Psy-Fer/blue-crab && cd blue-crab
    
    # create venv
    python3 -m venv ./blue-crab-venv
    source ./blue-crab-venv/bin/activate
    python3 -m pip install --upgrade pip
    
    # only if you want zstd support and have installed zstd development libraries for zstd build
    export PYSLOW5_ZSTD=1
    
    # install blue-crab
    python3 -m pip install .
    blue-crab --help
    

    You can check your Python version by invoking python3 --version. If your native python3 meets this requirement of >=3.8, you can use that, or use a specific version installed with deadsnakes below. If you install with deadsnakes, you will need to call that specific python, such as python3.8 or python3.9, in all the following commands until you create a virtual environment with venv. Then once activated, you can just use python3. To install a specific version of python, the deadsnakes ppa is a good place to start:

    # This is an example for installing python3.8
    # you can then call that specific python version
    # > python3.8 -m pip --version
    sudo add-apt-repository ppa:deadsnakes/ppa
    sudo apt-get update
    sudo apt install python3.8 python3.8-dev python3.8-venv
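
The version requirement can also be checked from a script before creating the virtual environment. The sketch below is illustrative only; python_ok is a made-up name, and it simply asks the given interpreter to compare its own sys.version_info against (3, 8).

```shell
# Hypothetical helper: succeed only if the given interpreter is Python >= 3.8.
# Defaults to "python3" when no interpreter name is passed.
python_ok() (
    py="${1:-python3}"
    command -v "$py" >/dev/null 2>&1 || { echo "$py not found" >&2; exit 1; }
    "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
)

# Example: try the native python3 first, then a deadsnakes interpreter
# python_ok python3 || python_ok python3.8 || echo "install Python >= 3.8 first"
```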
    

Usage

Please visit the manual page for all the commands and options. Some examples are given below:

# pod5 file -> slow5/blow5 file
blue-crab p2s example.pod5 -o example.blow5

# pod5 directory -> slow5/blow5 directory
blue-crab p2s pod5_dir -d blow5_dir

# slow5/blow5 -> pod5
blue-crab s2p example.blow5 -o example.pod5

Note that the default compression is zlib, to maximise compatibility. SLOW5 files compressed with zstd offer smaller file sizes and better performance than the default zlib. If you installed blue-crab with zstd support, you can create zstd-compressed BLOW5 as:

# pod5 -> zstd compressed slow5/blow5
blue-crab p2s -c zstd pod5_dir -d blow5_dir
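
If you script conversions, the choice between the default zlib and zstd can be wrapped in a small function. The following is a minimal sketch built only from the two p2s invocations shown above; the name p2s_dir and the BLUE_CRAB override variable are hypothetical and not part of blue-crab.

```shell
# Hypothetical wrapper around the directory conversions shown above.
# usage: p2s_dir <pod5_dir> <blow5_dir> [zstd]
# Set BLUE_CRAB to use a specific blue-crab executable.
p2s_dir() (
    in_dir="$1"; out_dir="$2"; comp="${3:-}"
    tool="${BLUE_CRAB:-blue-crab}"
    [ -d "$in_dir" ] || { echo "no such directory: $in_dir" >&2; exit 1; }
    case "$comp" in
        # zstd only works if blue-crab was installed with PYSLOW5_ZSTD=1
        zstd) "$tool" p2s -c zstd "$in_dir" -d "$out_dir" ;;
        # no -c flag: blue-crab uses its default zlib compression
        "")   "$tool" p2s "$in_dir" -d "$out_dir" ;;
        *)    echo "unsupported compression in this sketch: $comp" >&2; exit 1 ;;
    esac
)

# Example: p2s_dir pod5_dir blow5_dir zstd
```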

Notes

POD5 has had a number of backward compatibility-breaking changes so far. This version of blue-crab is only tested on the most recent pod5 files. blue-crab simply relies on ONT's POD5 API for reading and writing POD5 files, thus leaving to ONT the burden of maintaining a library that can handle all the variants of POD5 and of cleaning up the mess they create. We will not invest time in handling all these various idiosyncrasies in POD5, unlike what we did for hundreds of different FAST5 formats when developing slow5tools. If your POD5 files are v0.1.5 or lower, you may check this old readme out.

Example comparison

The following table compares an original 5 kHz pod5 file from the public zymo dataset (link below), containing 10k reads, with blow5 conversions of the same file. Pod5 is using its default VBZ compression, which is a mix of zstd and svb-zd for the signal.

The blow5 files are conversions made using blue-crab and timed with /usr/bin/time -v <cmd>. They were carried out on an XPS 15 laptop with a modern SSD, using Python 3.11.3. They all have signal compression set to use svb-zd.

The table shows pod5-vbz is slightly smaller than both blow5-zstd and blow5-zlib. We prefer to default to blow5-zlib as it is more portable as zlib comes with most systems (as discussed above). If you want the best compression and faster conversion times however, blow5-zstd is the clear winner for blow5.

method      size (MB)  time (s)
pod5-vbz    679        -
blow5-zstd  681        3.91
blow5-zlib  689        7.86
-           -          -
blow5-xxx   666        -

I have included an example blow5-xxx to show that we can make the files even smaller than pod5, and this work is under active development. However, those compression techniques are currently not available in blue-crab.
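
To put the table in relative terms, the short awk sketch below computes each blow5 size as a percentage difference from pod5-vbz, and the zlib to zstd conversion time ratio. The numbers are copied from the table; the function name compare_sizes is made up for this example.

```shell
# Sizes (MB) and times (s) are copied from the comparison table above.
compare_sizes() {
    awk 'BEGIN {
        base = 679    # pod5-vbz size in MB
        printf "blow5-zstd: %.2f%% size difference vs pod5-vbz\n", (681 - base) / base * 100
        printf "blow5-zlib: %.2f%% size difference vs pod5-vbz\n", (689 - base) / base * 100
        printf "blow5-xxx: %.2f%% size difference vs pod5-vbz\n", (666 - base) / base * 100
        printf "zlib/zstd conversion time ratio: %.2f\n", 7.86 / 3.91
    }'
}
compare_sizes
```

In other words, blow5-zstd and blow5-zlib are roughly 0.3% and 1.5% larger than pod5-vbz for this file, and the zlib conversion takes about twice as long as the zstd one.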

Acknowledgement

George Bouras for providing some example bacterial pod5 files. Rasmus Kirkegaard for this public zymo pod5 dataset. George from ONT for help in understanding pod5 stuff.

Citation

Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4

@article{gamaarachchi2022fast,
  title={Fast nanopore sequencing data analysis with SLOW5},
  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
  journal={Nature biotechnology},
  volume={40},
  pages={1026--1029},
  year={2022},
  publisher={Nature Publishing Group}
}
