
Extraction of modified base data from Guppy Fast5 output

Fast5Mod

Fast5mod is a set of two programs for converting Guppy's modified base Fast5 output into:

  • An aligned or unaligned BAM formatted file, and
  • Aggregate modified base calls.

This functionality was originally part of Medaka but has been moved to this separate project.

© 2020 Oxford Nanopore Technologies Ltd.

Installation

Fast5Mod can be installed in one of several ways.

Installation with conda

Perhaps the simplest way to start using fast5mod on both Linux and macOS is through conda; fast5mod is available via the bioconda channel:

conda create -n fast5mod -c conda-forge -c bioconda fast5mod

Installation with pip

For those who prefer Python's native package manager, fast5mod is also available on PyPI and can be installed using pip:

pip install fast5mod

We recommend using fast5mod within a virtual environment, viz.:

virtualenv fast5mod --python=python3 --prompt "(fast5mod) "
. fast5mod/bin/activate
pip install fast5mod

Usage

The basic workflow for aggregating Guppy basecalling results for Dcm, Dam, and CpG methylation is shown below.

Aggregating the information from Guppy outputs is a two-stage process. First, the basecalling results are extracted from .fast5 files and placed in a .bam file:

FAST5PATH=guppy/workspace
REFERENCE=grch38.fasta
OUTBAM=meth.bam
fast5mod guppy2sam ${FAST5PATH} ${REFERENCE} \
    --workers 74 --recursive \
    | samtools sort -@ 8 | samtools view -b -@ 8 > ${OUTBAM}
samtools index ${OUTBAM}

This program extracts both the basecall sequence and the methylation scores, aligns the basecall to the reference, and stores the results in a standard format. In this preliminary workflow the methylation scores are stored in two SAM tags, 'MC' and 'MA', for 5mC and 6mA respectively. Each tag holds an array of 8-bit integers, one value per basecall position. This differs from the format in the current hts-specs proposal, but is simpler to parse.
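To illustrate how simple the per-base tag layout is to read, the following is a minimal sketch that pulls the 'MC' and 'MA' arrays out of a text SAM record using only the Python standard library. The record is made up for illustration, the helper name parse_array_tag is hypothetical, and the array subtype letter shown (C, unsigned 8-bit) is an assumption about what fast5mod writes. The sketch does not interpret the score scale or the array orientation for reverse-strand alignments, neither of which is described here.

```python
def parse_array_tag(sam_line, tag):
    """Return the integer array stored under `tag` in a text SAM record, or None.

    Hypothetical helper for illustration; not part of fast5mod.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # In the SAM format, optional TAG:TYPE:VALUE fields start at column 12.
    for field in fields[11:]:
        name, typ, value = field.split(":", 2)
        if name == tag and typ == "B":
            # Array values look like "C,0,250,0,0": a subtype letter, then the items.
            _subtype, *items = value.split(",")
            return [int(item) for item in items]
    return None


# Made-up record: a 4-base read with one score per basecall position.
record = "\t".join([
    "read1", "0", "chr20", "500001", "60", "4M", "*", "0", "0",
    "ACGT", "*", "MC:B:C,0,250,0,0", "MA:B:C,12,0,0,0",
])

mc = parse_array_tag(record, "MC")  # 5mC scores
ma = parse_array_tag(record, "MA")  # 6mA scores
print(mc, ma)
```

Because there is one value per basecall position, each array has the same length as the read sequence, so a score can be looked up by base index directly.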

The second step is to aggregate the reference-aligned information to produce a simple tabular summary of read methylation counts:

BAM=meth.bam
REFERENCE=grch38.fasta
REGION=chr20:500000-1000000
OUTPUT=meth.tsv
fast5mod call --meth cpg ${BAM} ${REFERENCE} ${REGION} ${OUTPUT}

Here the option --meth cpg indicates that loci containing the sequence motif CG should be examined for 5mC presence. The other choices are dcm, for which the motifs CCAGG and CCTGG are examined for 5mC, and dam, for which the motif GATC is examined for 6mA.
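The mapping from --meth choices to motifs can be sketched as a small lookup plus a motif scan. This is an illustration of which loci each choice refers to, not fast5mod's own implementation; the names MOTIFS and find_motif_sites are hypothetical, and positions are 0-based offsets into the example string, which may not match the coordinate convention of the tool's output.

```python
import re

# Motifs and modified base for each --meth choice, as listed in the text.
MOTIFS = {
    "cpg": (["CG"], "5mC"),
    "dcm": (["CCAGG", "CCTGG"], "5mC"),
    "dam": (["GATC"], "6mA"),
}


def find_motif_sites(sequence, meth):
    """Return sorted (start, motif) pairs for every motif occurrence (0-based)."""
    motifs, _modified_base = MOTIFS[meth]
    sites = []
    for motif in motifs:
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = "(?=" + re.escape(motif) + ")"
        for match in re.finditer(pattern, sequence.upper()):
            sites.append((match.start(), motif))
    return sorted(sites)


seq = "TTCGATCCAGGCG"  # made-up sequence
for choice in ("cpg", "dcm", "dam"):
    print(choice, find_motif_sites(seq, choice))
```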

The output file is a simple tab-delimited text file with columns: 'ref.name', 'position', 'motif', 'fwd.meth.count', 'rev.meth.count', 'fwd.canon.count', and 'rev.canon.count'. Here fwd./rev. indicate counts on the two DNA strands, and meth./canon. indicate counts for methylated and canonical bases. Note that the position field records the position of the first base of the motif.
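As a sketch of downstream use, the table can be reduced to a per-site methylation fraction by pooling both strands. The rows below are invented, the function name methylation_fractions is hypothetical, and pooling strands is a choice made for this example rather than something fast5mod prescribes. Whether the real file starts with a header row is not stated here, so the sketch tolerates either case.

```python
import csv
import io

COLUMNS = [
    "ref.name", "position", "motif",
    "fwd.meth.count", "rev.meth.count",
    "fwd.canon.count", "rev.canon.count",
]


def methylation_fractions(handle):
    """Yield (ref.name, position, motif, coverage, fraction) for each site."""
    for rec in csv.reader(handle, delimiter="\t"):
        if not rec or rec[0] == COLUMNS[0]:
            continue  # skip blank lines and a header row, if one is present
        row = dict(zip(COLUMNS, rec))
        meth = int(row["fwd.meth.count"]) + int(row["rev.meth.count"])
        canon = int(row["fwd.canon.count"]) + int(row["rev.canon.count"])
        coverage = meth + canon
        # Sites with no counted reads have no defined fraction.
        fraction = meth / coverage if coverage else None
        yield row["ref.name"], int(row["position"]), row["motif"], coverage, fraction


# Invented rows standing in for the contents of meth.tsv.
example = io.StringIO(
    "chr20\t500010\tCG\t8\t6\t2\t4\n"
    "chr20\t500050\tCG\t0\t0\t0\t0\n"
)
results = list(methylation_fractions(example))
for site in results:
    print(site)
```

To read a real output file, pass an open file handle in place of the StringIO object, for example open("meth.tsv", newline="").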

Help

Licence and Copyright

© 2020 Oxford Nanopore Technologies Ltd.

fast5mod is distributed under the terms of the Mozilla Public License 2.0.

Research Release

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

