Oxford Nanopore Technologies fast5 API software

Project description

API for interacting with Oxford Nanopore Technologies fast5 files

ont_fast5_api is a simple interface to HDF5 files of the Oxford Nanopore fast5 file format.

It provides:

  • Concrete implementation of the fast5 file schema using the generic h5py library
  • Plain-English-named methods to interact with and reflect the fast5 file schema
  • Tools to convert between multi_read and single_read formats

Getting Started

The ont_fast5_api is available on PyPI and can be installed via pip:

pip install ont-fast5-api

Alternatively, it is available on GitHub, where it can be built from source:

git clone https://github.com/nanoporetech/ont_fast5_api
cd ont_fast5_api
python setup.py install

Dependencies

ont_fast5_api is a pure Python project and should run on most Python versions and operating systems.

It requires:

Interface - Console Scripts

The ont_fast5_api provides terminal/command-line console_scripts for converting between files in the Oxford Nanopore single_read and multi_read fast5 formats. These are provided to ensure compatibility between tools which expect either the single_read or multi_read fast5 file formats.

The scripts are added during installation and can be called from the terminal/command-line or from within python.

single_to_multi_fast5

This script converts folders containing single_read fast5 files into multi_read fast5 files:

single_to_multi_fast5
    -i, --input_path <(path) folder containing single_read_fast5 files>
    -s, --save_path <(path) to folder where multi_read fast5 files will be output>
    [optional] -f, --filename_base <(string) name for new multi_read file; default="batch" (see note-1)>
    [optional] -n, --batch_size <(int) number of single_reads to include in each multi_read file; default=4000>
    [optional] --recursive <(bool) if included, recursively search sub-directories for single_read files; default=False>

note-1: newly created multi_read files require a name. This is the filename_base with the batch count and .fast5 appended to it; e.g. -f batch yields batch_0.fast5, batch_1.fast5, ...
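
The naming rule in note-1 can be sketched in Python as follows. This is an illustration only, not the package's own code: the function name multi_read_filenames is hypothetical, and the sketch assumes that a final, partially filled batch still gets its own file.

```python
def multi_read_filenames(num_reads, filename_base="batch", batch_size=4000):
    """Return the multi_read filenames expected for num_reads single reads.

    Hypothetical helper: assumes a trailing partial batch gets its own file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Ceiling division using integers only, so it behaves the same on py2 and py3.
    num_files = (num_reads + batch_size - 1) // batch_size
    # Name = filename_base + "_" + batch count + ".fast5", as described in note-1.
    return ["{}_{}.fast5".format(filename_base, i) for i in range(num_files)]


print(multi_read_filenames(250, filename_base="batch_output", batch_size=100))
# ['batch_output_0.fast5', 'batch_output_1.fast5', 'batch_output_2.fast5']
```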

example usage:

single_to_multi_fast5 --input_path /data/reads --save_path /data/multi_reads \
    --filename_base batch_output --batch_size 100 --recursive

Here, /data/reads and/or its subfolders contain single_read fast5 files. The output is a set of multi_read fast5 files, each containing 100 reads, written to the folder /data/multi_reads with the names batch_output_0.fast5, batch_output_1.fast5, etc.
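
Since the scripts can also be called from within Python, one way to do so is to launch the console script as a subprocess. The sketch below is a minimal example under stated assumptions: the helper names build_single_to_multi_cmd and convert_single_to_multi are hypothetical, only the flags listed above are used, and single_to_multi_fast5 is assumed to be on the PATH after installation.

```python
import subprocess


def build_single_to_multi_cmd(input_path, save_path, filename_base="batch",
                              batch_size=4000, recursive=False):
    """Build the argument list for the single_to_multi_fast5 console script."""
    cmd = [
        "single_to_multi_fast5",
        "--input_path", input_path,
        "--save_path", save_path,
        "--filename_base", filename_base,
        "--batch_size", str(batch_size),
    ]
    # --recursive is a bare switch: present means True, absent means False.
    if recursive:
        cmd.append("--recursive")
    return cmd


def convert_single_to_multi(*args, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_single_to_multi_cmd(*args, **kwargs))
```

Calling convert_single_to_multi("/data/reads", "/data/multi_reads", filename_base="batch_output", batch_size=100, recursive=True) would then run the same command as the example usage above. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.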

multi_to_single_fast5

This script converts folders containing multi_read fast5 files into single_read fast5 files:

multi_to_single_fast5
    -i, --input_path <(path) folder containing multi_read_fast5 files>
    -s, --save_path <(path) to folder where single_read fast5 files will be output>
    [optional] -n, --batch_size <(int) number of single_reads to include in each output folder; default=4000 (see note-2)>
    [optional] --recursive <(bool) if included, recursively search sub-directories for multi_read files; default=False>

note-2: single_read fast5 files are batched into subdirectories for output for performance reasons

example usage:

multi_to_single_fast5 --input_path /data/multi_reads --save_path /data/single_reads \
    --batch_size 100 --recursive

Here, /data/multi_reads and/or its subfolders contain multi_read fast5 files. The output is a set of single_read fast5 files written to subfolders of the save_path /data/single_reads, with each subfolder containing 100 fast5 files.
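
The batching described in note-2 can be illustrated with a short Python sketch. It is not the package's implementation: the function name assign_output_folders is hypothetical, and both the numeric subfolder indices and the <read_id>.fast5 filenames are assumptions made for the example.

```python
def assign_output_folders(read_ids, batch_size=4000):
    """Group reads into numbered subfolders holding at most batch_size files.

    Hypothetical layout: subfolder index -> list of single_read filenames.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    folders = {}
    for position, read_id in enumerate(read_ids):
        # Reads 0..batch_size-1 go to folder 0, the next batch to folder 1, etc.
        folder_index = position // batch_size
        folders.setdefault(folder_index, []).append(read_id + ".fast5")
    return folders


layout = assign_output_folders(["read_a", "read_b", "read_c"], batch_size=2)
print(layout)
# {0: ['read_a.fast5', 'read_b.fast5'], 1: ['read_c.fast5']}
```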

Glossary of Terms:

  • HDF5 file format - A portable file format for storing and managing data. It is designed for flexible and efficient I/O and for high-volume, complex data.
  • Fast5 - An implementation of the HDF5 file format, with specific data schemas for Oxford Nanopore sequencing data.
  • Single read fast5 - A fast5 file containing all the data pertaining to a single Oxford Nanopore read. This may include raw signal data, run metadata, fastq basecalls and any additional analyses.
  • Multi read fast5 - A fast5 file containing data pertaining to multiple Oxford Nanopore reads.
  • Pip - The Python package manager, used for installing Python applications.

Download files

  • ont_fast5_api-1.0.1-py2.py3-none-any.whl (46.3 kB), Wheel, py2.py3, uploaded Sep 26, 2018
  • ont-fast5-api-1.0.1.tar.gz (30.3 kB), Source, uploaded Sep 26, 2018
