slow5lib python bindings

Project description

pyslow5 python library

The slow5 python library (pyslow5) allows a user to read and write slow5/blow5 files.

Installation

Initial setup and example info for environment

slow5lib needs Python 3.4.2 or higher.

If you only want to use the Python library, you can simply install it with pip.

Using a virtual environment is recommended (see below if you need to install Python).

Optional zstd compression

You can optionally enable zstd compression support when building slow5lib/pyslow5. This requires zstd 1.3 or higher development libraries installed on your system:

On Debian/Ubuntu : sudo apt-get install libzstd1-dev
On Fedora/CentOS : sudo yum install libzstd-devel
On OS X : brew install zstd

BLOW5 files compressed with zstd offer smaller file sizes and better performance than the default zlib. However, the zlib runtime library is available by default on almost all distributions, unlike zstd, so files compressed with zlib will be more 'portable'.

Install from pypi

python3 -m venv path/to/slow5libvenv
source path/to/slow5libvenv/bin/activate
python3 -m pip install --upgrade pip

# do this separately, after the libs above
# zlib only build
python3 -m pip install pyslow5

# for zstd build, run the following
export PYSLOW5_ZSTD=1
python3 -m pip install pyslow5

Dev install

# If your native python3 meets this requirement, you can use that, or use a
# specific version installed with deadsnakes below. If you install with deadsnakes,
# you will need to call that specific python, such as python3.8 or python3.9,
# in all the following commands until you create a virtual environment with venv.
# Then once activated, you can just use python3.

# To install a specific version of python, the deadsnakes ppa is a good place to start
# This is an example for installing python3.8
# you can then call that specific python version
# > python3.8 -m pip --version
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt install python3.8 python3.8-dev python3.8-venv


# get zlib1g-dev
sudo apt-get update && sudo apt-get install -y zlib1g-dev

# Check with
python3 --version

# You will also need the python headers if you don't already have them installed.

sudo apt-get install python3-dev

Building and installing the python library.

python3 -m venv /path/to/slow5libvenv
source /path/to/slow5libvenv/bin/activate
python3 -m pip install --upgrade pip

git clone git@github.com:hasindu2008/slow5lib.git
cd slow5lib

# New build method to work with the setuptools deprecation
python3 -m pip install .

# This should not require sudo if using a python virtual environment/venv
# confirm installation, and find pyslow5==<version>
python3 -m pip freeze

# Ensure slow5 library is working by running the basic tests
python3 ./python/example.py


# To Remove the library
python3 -m pip uninstall pyslow5



# Legacy build methods - not recommended
# CHOOSE A OR B:
# (B is the cleanest method)
# |=======================================================================|
# |A. Install with pip if wheel is present, otherwise it uses setuptools  |
    python3 -m pip install . --use-feature=in-tree-build
# |=======================================================================|
# |B. Or build and install manually with setup.py                         |
# |build the package                                                      |
    python3 setup.py build
# |If all went well, install the package                                  |
    python3 setup.py install
# |=======================================================================|

Usage

Reading/writing a file

Open(FILE, mode, rec_press="zlib", sig_press="svb-zd", DEBUG=0):

The pyslow5 library has one main Class, pyslow5.Open which opens a slow5/blow5 (slow5 for easy reference) file for reading/writing.

FILE: the file or filepath of the slow5 file to open.
mode: the mode in which to open the file:

  • r= read only
  • w= write/overwrite
  • a= append

This is designed to mimic Python's native open() to help users remember the syntax.

To set the record and signal compression methods, use the optional rec_press and sig_press arguments; these are only used with mode='w' (see the write-mode example below). Appending will use whatever compression is already set in the file.

Compression Options:

rec_press:

  • "none"
  • "zlib" [default]
  • "zstd" [requires export PYSLOW5_ZSTD=1 when building]

sig_press:

  • "none"
  • "svb-zd" [default]
  • "ex-zd" [best compression, available from v1.3.0]

Example:

import pyslow5

# open file
s5 = pyslow5.Open('examples/example.slow5','r')

When opening a slow5 file for the first time, an index will be created and saved in the same directory as the file being read, and then loaded. For files that already have an index, that index will be loaded.
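
To open a file for writing with explicit compression settings, the optional arguments can be passed as in the following sketch (the output filename is illustrative, and 'zstd' only works when pyslow5 was built with PYSLOW5_ZSTD=1):

import pyslow5

# open a new BLOW5 file for writing
# rec_press="zstd" requires a zstd-enabled build; sig_press="ex-zd" needs v1.3.0+
s5_out = pyslow5.Open('output.blow5', 'w', rec_press='zstd', sig_press='ex-zd')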

get_read_ids():

returns a list and total number of reads from the index. If there is no index, it creates one first.

Example:

read_ids, num_reads = s5.get_read_ids()

print(read_ids)
print("number of reads: {}".format(num_reads))

seq_reads(pA=False, aux=None):

Access all reads sequentially in an opened slow5.

  • pA = Bool for converting signal to picoamps.
  • aux = str '<attr_name>'/'all' or a list of auxiliary field names to add to the returned dictionary; a requested field is set to None if <attr_name> is not found
  • returns a generator; each iteration yields a dict of the main fields for a read, with any aux fields added

Example:

# create generator
reads = s5.seq_reads()

# print all readIDs
for read in reads:
    print(read['read_id'])

# or use directly in a for loop
for read in s5.seq_reads(pA=True, aux='all'):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])
    print("================================")

seq_reads_multi(threads=4, batchsize=4096, pA=False, aux=None):

Access all reads sequentially in an opened slow5, using multiple threads.

  • threads = number of threads to use in the C backend.
  • batchsize = number of reads to fetch at a time. Higher numbers use more RAM, but are more efficient with more threads.
  • pA = Bool for converting signal to picoamps.
  • aux = str '<attr_name>'/'all' or a list of auxiliary field names to add to the returned dictionary; a requested field is set to None if <attr_name> is not found
  • returns a generator; each iteration yields a dict of the main fields for a read, with any aux fields added

Example:

# create generator
reads = s5.seq_reads_multi(threads=2, batchsize=3)

# print all readIDs
for read in reads:
    print(read['read_id'])

# or use directly in a for loop
for read in s5.seq_reads_multi(threads=2, batchsize=3, pA=True, aux='all'):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])
    print("================================")

get_read(readID, pA=False, aux=None):

Access a specific read using a unique readID. This is a random access method, using the index.

  • If readID is not found, None is returned.
  • pA = Bool for converting signal to picoamps.
  • aux = str '<attr_name>'/'all' or a list of auxiliary field names to add to the returned dictionary; a requested field is set to None if <attr_name> is not found
  • returns dict = dictionary of main fields for read_id, with any aux fields added

Example:

readID = "r1"
read = s5.get_read(readID, pA=True, aux=["read_number", "start_mux"])
if read is not None:
    print("read_id:", read['read_id'])
    print("len_raw_signal:", read['len_raw_signal'])

get_read_list(read_list, pA=False, aux=None):

Access a list of specific reads using a list read_list of unique readIDs. This is a random access method using the index. If an index does not exist, it will create one first.

  • If readID is not found, None is returned.
  • pA = Bool for converting signal to picoamps.
  • aux = str '<attr_name>'/'all' or a list of auxiliary field names to add to the returned dictionary; a requested field is set to None if <attr_name> is not found
  • returns dict = dictionary of main fields for read_id, with any aux fields added

Example:

read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list(read_list)
for r, read in zip(read_list,selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")

get_read_list_multi(read_list, threads=4, batchsize=100, pA=False, aux=None):

Access a list of specific reads using a list read_list of unique readIDs using multiple threads. This is a random access method using the index. If an index does not exist, it will create one first.

  • If readID is not found, None is returned.
  • threads = number of threads to use in C backend
  • batchsize = number of reads to fetch at a time. Higher numbers use more RAM, but are more efficient with more threads.
  • pA = Bool for converting signal to picoamps.
  • aux = str '<attr_name>'/'all' or a list of auxiliary field names to add to the returned dictionary; a requested field is set to None if <attr_name> is not found
  • returns dict = dictionary of main fields for read_id, with any aux fields added

Example:

read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list_multi(read_list, threads=2, batchsize=3)
for r, read in zip(read_list, selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")

get_num_read_groups():

NEW: from version 1.1.0+

Returns the number of read_groups present in the file as an int.
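
Example (a trivial sketch, assuming s5 is an open file as above):

num_read_groups = s5.get_num_read_groups()
print("number of read groups: {}".format(num_read_groups))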

get_header_names():

Returns a list containing the union of header names from all read_groups (see the combined example after get_all_headers() below).

get_header_value(attr, read_group=0):

Returns a str of the value of a header attribute (attr) for a particular read_group. Returns None if the value can't be found.

get_all_headers(read_group=0):

Returns a dictionary with all header attributes and values for a particular read_group. If there are values present for one read_group and not for another, the attribute will still be returned for the read_group without it, but with a value of None.
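
A combined sketch of the three header methods, assuming s5 is an open file (the attribute name 'run_id' is illustrative and depends on the file):

# union of header names across all read_groups
names = s5.get_header_names()
print(names)

# value of a single attribute for read_group 0; None if not found
run_id = s5.get_header_value('run_id', read_group=0)
print("run_id: {}".format(run_id))

# all attributes and values for read_group 0
headers = s5.get_all_headers(read_group=0)
for attr, value in headers.items():
    print(attr, value)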

get_aux_names():

Returns an ordered list of auxiliary attribute names. (same order as get_aux_types())

This is used for understanding which auxiliary attributes are available within the slow5 file, and for providing selections to the aux keyword argument in the above functions.

get_aux_types():

Returns an ordered list of auxiliary attribute types (same order as get_aux_names())

This can mostly be ignored, but it will be used in error tracing in the future, as auxiliary field requests have multiple types, each with their own calls, and not all are used. It could be the case that a call for an auxiliary field fails, and knowing which type the field requests is very helpful in understanding which C function is being called and could be causing the error.
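
For example, to list each auxiliary field alongside its type (a sketch, assuming s5 is an open file; the two lists share the same order):

aux_names = s5.get_aux_names()
aux_types = s5.get_aux_types()

for name, atype in zip(aux_names, aux_types):
    print(name, atype)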

get_aux_enum_labels(label):

Returns an ordered list representing the values in the enum struct in the type header.

The value in the read can then be used to access the labels as an index to the list.

Example:

s5 = pyslow5.Open(file, 'r')
end_reason_labels = s5.get_aux_enum_labels('end_reason')
print(end_reason_labels)

> ['unknown', 'partial', 'mux_change', 'unblock_mux_change', 'signal_positive', 'signal_negative']
# or from newer datasets
> ["unknown", "mux_change", "unblock_mux_change", "data_service_unblock_mux_change", "signal_positive", "signal_negative", "api_request", "device_data_error", "analysis_config_change", "paused"]

readID = "r1"
read = s5.get_read(readID, aux='all')
er_index = read['end_reason']
er = end_reason_labels[er_index]

print("{}: {}".format(er_index, er))

> 4: signal_positive

Writing a file

To write a file, mode in Open() must be set to 'w', or 'a' when appending.

get_empty_header(aux=False):

Returns a dictionary containing all known header attributes with their values set to None.

The user can modify each value, and add or remove attributes to be used as header items. All values end up stored as strings, and anything left as None will be skipped. To write the header, see write_header().

If aux=True, an ordered list of strings for the end_reason enum will also be returned. This can be modified depending on the end reasons used.

Example:

s5 = pyslow5.Open(file, 'w')
header = s5.get_empty_header()

end_reason enum example

s5 = pyslow5.Open(file, 'w')
header, end_reason_labels = s5.get_empty_header(aux=True)

write_header(header, read_group=0, end_reason_labels=None):

Write header to file

  • header = populated dictionary from get_empty_header()
  • read_group = read group integer for when multiple runs are written to the same slow5 file
  • end_reason_labels = ordered list used for end_reason enum
  • returns 0 on success, <0 on error with error code

You must write read_group=0 (default) first before writing any other read_groups, and it is advised to write read_groups in sequential order.

Example:

# Get some empty headers
header = s5.get_empty_header()
header2 = s5.get_empty_header()

# Populate headers with some test data
counter = 0
for i in header:
    header[i] = "test_{}".format(counter)
    counter += 1

for i in header2:
    header2[i] = "test_{}".format(counter)
    counter += 1

# Write first read group
ret = s5.write_header(header)
print("ret: write_header(): {}".format(ret))
# Write second read group, etc
ret = s5.write_header(header2, read_group=1)
print("ret: write_header(): {}".format(ret))

end_reason example:

# Get some empty headers
header, end_reason_labels = s5.get_empty_header(aux=True)

# Populate headers with some test data
counter = 0
for i in header:
    header[i] = "test_{}".format(counter)
    counter += 1

# Write first read group
ret = s5.write_header(header, end_reason_labels=end_reason_labels)
print("ret: write_header(): {}".format(ret))

get_empty_record(aux=False):

Get empty read record for populating with data. Use with write_record()

  • aux = Bool for returning empty aux dictionary as well as read dictionary
  • returns a single read dictionary or a read and aux dictionary depending on aux flag

Example:

# open some file to read. We will copy the data then write it,
# including aux fields.
# (assumes s5 was already opened for writing, e.g. s5 = pyslow5.Open(write_file, 'w'))
s5_read = pyslow5.Open(read_file, 'r')
reads = s5_read.seq_reads(aux='all')

# For each read in s5_read...
for read in reads:
    # get an empty record and aux dictionary
    record, aux = s5.get_empty_record(aux=True)
    # for each field in read...
    for i in read:
        # if the field is in the record dictionary...
        if i in record:
            # copy the value over...
            record[i] = read[i]
        # do the same for the aux dictionary
        if i in aux:
            aux[i] = read[i]
    # write the record
    ret = s5.write_record(record, aux)
    print("ret: write_record(): {}".format(ret))

write_record(record, aux=None):

Write a record and optional aux fields.

  • record = a populated dictionary from get_empty_record()
  • aux = the aux dictionary returned by get_empty_record(aux=True), populated with values
  • returns 0 on success and -1 on error/failure

Example:

record, aux = s5.get_empty_record(aux=True)
# populate record, aux dictionaries
#....
# Write record
ret = s5.write_record(record, aux)
print("ret: write_record(): {}".format(ret))

write_record_batch(records, threads=4, batchsize=4096, aux=None):

Write a batch of records and optional aux fields, using multiple threads.

  • records = a dictionary of dictionaries, where each entry is a populated record from get_empty_record(), keyed by the read['read_id'].
  • threads = number of threads to use in the C backend.
  • batchsize = number of reads to write at a time. If writing 1000 records with batchsize=250 and threads=4, four threads will be spawned four times, each batch writing 250 records to the file before returning.
  • aux = a dictionary of aux dictionaries from get_empty_record(aux=True), keyed the same as records
  • returns 0 on success and -1 on error/failure

Example:

# records/auxs are dictionaries keyed by read_id
records, auxs = {}, {}

record, aux = s5.get_empty_record(aux=True)
# populate record, aux
#....
records[record['read_id']] = record
auxs[record['read_id']] = aux
# Write records
ret = s5.write_record_batch(records, threads=2, batchsize=3, aux=auxs)
print("ret: write_record_batch(): {}".format(ret))

close():

Closes a file opened for writing or appending, and writes an End Of File (EOF) flag.

If not explicitly closed, a close will also be triggered when the s5 object goes out of scope in Python, to try to avoid a missing EOF.

Please call this when you are finished writing a file.

Example:

s5 = pyslow5.Open(file, 'w')

# do some writing....

# Writes EOF and closes the file
s5.close()

Citation

Please cite the following in your publications when using slow5lib/pyslow5:

Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4

@article{gamaarachchi2022fast,
  title={Fast nanopore sequencing data analysis with SLOW5},
  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
  journal={Nature biotechnology},
  pages={1--4},
  year={2022},
  publisher={Nature Publishing Group}
}

Download files

pyslow5 1.4.0 is available on PyPI as a source distribution (pyslow5-1.4.0.tar.gz, 656.7 kB) and as manylinux (glibc 2.17+) x86-64 wheels for CPython 3.6 through 3.14 (1.8-2.1 MB each).

