Extraction of modified base data from Guppy Fast5 output
Fast5Mod
Fast5mod is a set of two programs for converting Guppy's modified base Fast5 output into:
- An aligned or unaligned BAM formatted file, and
- Aggregate modified base calls.
The functionality was originally part of Medaka, but has been moved to this distinct project.
© 2020 Oxford Nanopore Technologies Ltd.
Installation
Fast5Mod can be installed in one of several ways.
Installation with conda
Perhaps the simplest way to start using fast5mod on both Linux and macOS is through conda; fast5mod is available via the bioconda channel:
conda create -n fast5mod -c conda-forge -c bioconda fast5mod
Installation with pip
For those who prefer Python's native package manager, fast5mod is also available on PyPI and can be installed using pip:
pip install fast5mod
We recommend using fast5mod within a virtual environment, viz.:
virtualenv fast5mod --python=python3 --prompt "(fast5mod) "
. fast5mod/bin/activate
pip install fast5mod
Usage
The basic workflow for aggregating Guppy basecalling results for Dcm, Dam, and CpG methylation is shown below.
Aggregating the information from Guppy outputs is a two-stage process. First, the basecalling results are extracted from the .fast5 files and placed in a .bam file:
FAST5PATH=guppy/workspace
REFERENCE=grch38.fasta
OUTBAM=meth.bam
fast5mod guppy2sam ${FAST5PATH} ${REFERENCE} \
--workers 74 --recursive \
| samtools sort -@ 8 | samtools view -b -@ 8 > ${OUTBAM}
samtools index ${OUTBAM}
This program extracts both the basecall sequence and the methylation scores, aligns the basecall to the reference, and stores the results in a standard format. In this preliminary workflow the methylation scores are stored in two SAM tags, 'MC' and 'MA', for 5mC and 6mA respectively. Each tag holds an array of 8-bit integers, one value per basecall position. This differs from the form proposed in the current hts-specs proposal, but is simpler to parse.
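As a rough illustration of the "one value per basecall position" layout, the sketch below pairs a read sequence with a per-base score array and reports the bases whose score reaches a threshold. It is not fast5mod's own code: the function name, the threshold, and the reading of scores as values in the range 0-255 are assumptions made for the example, and reading the tags out of a BAM file is left out entirely.

```python
from typing import List, Sequence, Tuple


def modified_positions(
    seq: str,
    scores: Sequence[int],
    base: str = "C",
    threshold: int = 128,
) -> List[Tuple[int, int]]:
    """Return (read_position, score) for each `base` scoring >= `threshold`.

    Assumes `scores` holds one 8-bit value (0-255) per base of `seq`;
    the threshold of 128 is an arbitrary choice for illustration.
    """
    if len(seq) != len(scores):
        raise ValueError("expected one score per basecall position")
    return [
        (i, s)
        for i, (b, s) in enumerate(zip(seq, scores))
        if b == base and s >= threshold
    ]


# Hypothetical read and 'MC'-style score array, for illustration only.
read = "ACGTCG"
mc_scores = [0, 250, 0, 0, 12, 0]
print(modified_positions(read, mc_scores))
```

Keeping one score per base means a score can be looked up by read position alone, which is the parsing convenience the paragraph above refers to.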
The second step is to aggregate the reference-aligned information to produce a simple tabular summary of read methylation counts:
BAM=meth.bam
REFERENCE=grch38.fasta
REGION=chr20:500000-1000000
OUTPUT=meth.tsv
fast5mod call --meth cpg ${BAM} ${REFERENCE} ${REGION} ${OUTPUT}
Here the option --meth cpg indicates that loci containing the sequence motif CG should be examined for 5mC presence. Other choices are dcm, for which the motifs CCAGG and CCTGG are examined for 5mC, and dam (GATC) for 6mA.
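To make the relationship between the --meth choices and the motifs concrete, here is a minimal sketch that locates the listed motifs in a reference sequence. The dictionary and function names are hypothetical, positions are 0-based string offsets, and this is not how fast5mod itself searches the reference.

```python
import re
from typing import Dict, List, Tuple

# Motifs as listed in the text above; the mapping itself is illustrative.
MOTIFS: Dict[str, List[str]] = {
    "cpg": ["CG"],
    "dcm": ["CCAGG", "CCTGG"],
    "dam": ["GATC"],
}


def find_motifs(ref: str, meth: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, motif) pairs for each occurrence in `ref`."""
    ref = ref.upper()
    hits = []
    for motif in MOTIFS[meth]:
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", ref):
            hits.append((match.start(), motif))
    return sorted(hits)


# Hypothetical reference fragment, for illustration only.
fragment = "ACGCCAGGATCCG"
for choice in ("cpg", "dcm", "dam"):
    print(choice, find_motifs(fragment, choice))
```

Each hit records the start of the motif, not the position of the modified base within it.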
The output file is a simple tab-delimited text file with columns: 'ref.name', 'position', 'motif', 'fwd.meth.count', 'rev.meth.count', 'fwd.canon.count', and 'rev.canon.count'. Here fwd./rev. indicate counts on the two DNA strands and meth./canon. indicate counts for methylated and canonical bases. Note that the position field records the position of the first base of the recorded motif.
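A table in this layout can be summarised with a few lines of Python. The sketch below combines the two strands and computes a per-site methylated fraction. The function name and the example rows are made up, and because the text does not say whether the file starts with a header row, the code supplies the column names itself and skips a header line if one is found.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Column order as described in the text above.
COLUMNS = [
    "ref.name", "position", "motif",
    "fwd.meth.count", "rev.meth.count",
    "fwd.canon.count", "rev.canon.count",
]


def methylation_fractions(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield per-site coverage and methylated fraction, summed over strands."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if row["position"] == "position":
            continue  # skip a header row, should the file contain one
        meth = int(row["fwd.meth.count"]) + int(row["rev.meth.count"])
        canon = int(row["fwd.canon.count"]) + int(row["rev.canon.count"])
        total = meth + canon
        yield {
            "ref.name": row["ref.name"],
            "position": int(row["position"]),
            "motif": row["motif"],
            "coverage": total,
            # Sites with no counted reads have no defined fraction.
            "fraction": meth / total if total else None,
        }


# Made-up rows for illustration; a real file would be opened with open(path).
example = io.StringIO(
    "chr20\t500010\tCG\t3\t1\t1\t3\n"
    "chr20\t500020\tCG\t0\t0\t0\t0\n"
)
for site in methylation_fractions(example):
    print(site)
```

Summing the forward and reverse counts is a choice made here for brevity; strand-specific fractions can be computed the same way from the individual columns.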
Help
Licence and Copyright
© 2020 Oxford Nanopore Technologies Ltd.
fast5mod is distributed under the terms of the Mozilla Public License 2.0.
Research Release
Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.