Extract methylation calls from ONT or PB long-read data

LoReMe pipeline

LoReMe (Long Read Methylation) is a Python package that facilitates analysis of DNA methylation signals from Pacific Biosciences or Oxford Nanopore long-read sequencing data.

It consists of an API and CLI for three distinct applications:

  1. Pacific Biosciences data processing. PB reads in SAM/BAM format are aligned to a reference genome with pbmm2, a PacBio-specific wrapper around minimap2. Methylation calls are then piled up from the aligned reads with pb-CpG-tools.

  2. Oxford Nanopore basecalling. ONT reads are optionally converted from FAST5 to POD5 format, then basecalled and aligned to a reference with dorado (whose aligner also uses minimap2 under the hood), and finally piled up with modkit.

  3. Postprocessing and QC of methylation calls. Several functions are available to generate diagnostic statistics and plots.

See also the full documentation.

Other tools of interest: methylartist, modbamtools, and methplotlib.

Installation

In a Conda environment

The recommended way to install loreme is in a dedicated conda environment.

First, create an environment that includes all dependencies:

conda create -n loreme -c conda-forge -c bioconda samtools pbmm2 \
  urllib3 pybedtools gff2bed seaborn pyfaidx psutil gputil tabulate \
  cython h5py iso8601 more-itertools tqdm
conda activate loreme

Then install with pip:

pip install loreme

You may also wish to install nvtop to monitor GPU usage:

conda install -c conda-forge nvtop

With pip

pip install loreme

Check installation

Check that the correct version was installed by running loreme --version.
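Beyond the version number, it can help to confirm that the command-line dependencies from the conda environment are on your PATH. The function below is a hypothetical helper (check_tools is not part of loreme) that relies only on the shell's command -v builtin:

```shell
# check_tools <cmd>...
# Hypothetical helper (not part of loreme): report whether each command is on PATH.
# Returns 0 if every command was found, 1 if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check loreme and two of the conda dependencies, then print the version.
# check_tools loreme samtools pbmm2 && loreme --version
```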

Uninstall

To uninstall loreme:

loreme clean
pip uninstall loreme

Oxford Nanopore reads

Download dorado

Calling methylation from ONT long reads requires the basecaller dorado. Download it by running:

loreme download-dorado <platform>

This downloads dorado and several basecalling models. The platform should be whichever of linux-x64, linux-arm64, osx-arm64, or win64 matches your system; running loreme download-dorado --help shows a hint as to the correct choice.
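The platform string can also be derived from uname. This is a sketch that assumes the uname -s / uname -m pairs below correspond to the four platforms listed; detect_dorado_platform is a hypothetical helper, and loreme download-dorado --help remains the authoritative hint:

```shell
# detect_dorado_platform [os] [arch]
# Hypothetical helper (not part of loreme): map uname output to a dorado platform.
# Defaults to `uname -s` and `uname -m`; prints an error and returns 1 otherwise.
detect_dorado_platform() {
  local os=${1:-$(uname -s)} arch=${2:-$(uname -m)}
  case "$os/$arch" in
    Linux/x86_64)                echo linux-x64 ;;
    Linux/aarch64|Linux/arm64)   echo linux-arm64 ;;
    Darwin/arm64)                echo osx-arm64 ;;
    MINGW*|MSYS*|CYGWIN*)        echo win64 ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Example:
# loreme download-dorado "$(detect_dorado_platform)"
```

Platforms not in the list (for example an Intel Mac, Darwin/x86_64) return an error instead of guessing.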

Note

For members of the Michael Lab at Salk running on seabiscuit, use loreme download-dorado linux-x64.

Modified basecalling

You can carry out modified basecalling (i.e., calling DNA methylation) with default parameters by running:

loreme dorado-basecall <pod5s/> <output.sam>

The input argument pod5s/ should be a directory containing one or more POD5 files. For other parameter options, see loreme dorado-basecall --help.
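Because the input must be a directory of POD5 files, a quick pre-flight check can catch an empty or mistyped path before a long basecalling job starts. count_pod5 is a hypothetical helper, not part of loreme; it only looks at *.pod5 files directly inside the directory, not in subdirectories:

```shell
# count_pod5 <dir>
# Hypothetical helper (not part of loreme): print the number of *.pod5 files
# directly inside <dir>, and return non-zero if there are none.
count_pod5() {
  local dir=$1 n=0 f
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  for f in "$dir"/*.pod5; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
  [ "$n" -gt 0 ]
}

# Example:
# count_pod5 pod5s/ && loreme dorado-basecall pod5s/ output.sam
```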

Note

Basecalling ONT data is disk-read intensive, so for best performance the input POD5 data should be on a fast SSD (for example, /scratch/<username> for members of the Michael Lab at Salk).
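One way to follow this advice is to copy the POD5 files onto the fast disk before basecalling. The sketch below assumes the scratch location is a writable directory that you pass in (for example /scratch/$USER); stage_pod5 and the paths in the usage lines are hypothetical:

```shell
# stage_pod5 <src_dir> <scratch_root>
# Hypothetical helper (not part of loreme): copy <src_dir>/*.pod5 into a new
# directory under <scratch_root> and print that directory's path.
stage_pod5() {
  local src=$1 root=$2 dest
  dest=$(mktemp -d "$root/pod5_stage.XXXXXX") || return 1
  if ! cp "$src"/*.pod5 "$dest"/; then
    rm -rf "$dest"   # remove the empty staging directory on failure
    return 1
  fi
  echo "$dest"
}

# Example (placeholder paths):
# pod5_dir=$(stage_pod5 /data/run1/pod5 "/scratch/$USER") &&
#   loreme dorado-basecall "$pod5_dir" output.sam &&
#   rm -rf "$pod5_dir"
```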

To run dorado with only regular basecalling, use the --no-mod option:

loreme dorado-basecall --no-mod <pod5s/> <output.sam>

If you wish to convert the SAM file to a FASTQ file, use:

samtools view -bo output.bam output.sam
samtools fastq -T '*' output.bam > output.fq

Alignment

The SAM file produced by dorado can be aligned to a reference index (FASTA or MMI file) with loreme dorado-align:

loreme dorado-align <index> <reads> <output.bam>

Download modkit

Piling up methylation calls from BAM data requires modkit. Download it by running:

loreme download-modkit

Pileup

The pileup step generates a bedMethyl file from an aligned BAM file.

loreme modkit-pileup <reference.fasta> <input.bam> <output.bed>

Note

See loreme modkit-pileup --help for additional options. On an HPC system, you may want to use additional threads with the -t flag.
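The ONT steps in this section can be chained into one script. This sketch assumes that loreme dorado-basecall, dorado-align, and modkit-pileup accept their positional arguments in the order documented here, that -t may come before the positional arguments, that dorado and modkit have already been downloaded, and that a FASTA reference is used for both alignment and pileup. The functions and the DRY_RUN and THREADS variables are hypothetical helpers, not part of loreme:

```shell
# Hypothetical helpers (not part of loreme) chaining the documented ONT steps:
# POD5 directory -> SAM (basecalls) -> aligned BAM -> bedMethyl.

# run_step <cmd> [args...]: print the command when DRY_RUN=1, otherwise run it.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# ont_pipeline <pod5_dir> <reference.fasta> <prefix>
# Writes <prefix>.sam, <prefix>.bam and <prefix>.bed; THREADS (default 4) goes to -t.
ont_pipeline() {
  local pod5_dir=$1 ref=$2 prefix=$3
  local threads=${THREADS:-4}
  run_step loreme dorado-basecall "$pod5_dir" "$prefix.sam" || return 1
  run_step loreme dorado-align "$ref" "$prefix.sam" "$prefix.bam" || return 1
  run_step loreme modkit-pileup -t "$threads" "$ref" "$prefix.bam" "$prefix.bed"
}

# Example (placeholder paths):
# DRY_RUN=1 ont_pipeline "/scratch/$USER/pod5s" reference.fasta sample1   # preview
# THREADS=16 ont_pipeline "/scratch/$USER/pod5s" reference.fasta sample1  # run
```

Each step stops the chain if it fails, so a failed basecall does not leave a stale BAM to be piled up. The dry-run mode is a cheap way to check paths before committing GPU time.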

Postprocessing

See the Pacific Biosciences reads section for examples of postprocessing analysis that can be applied to bedMethyl files.
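As a quick check on a modkit-pileup result before more detailed postprocessing, awk can summarise and filter the bedMethyl file directly. This sketch assumes modkit's bedMethyl column layout (column 4 = modification code, 10 = Nvalid_cov, 11 = percent modified, 12 = Nmod) and uses awk's default whitespace splitting so that tab- or space-separated trailing columns both work. The function names and the default code m (5mC) are illustrative assumptions:

```shell
# Hypothetical helpers (not part of loreme) for modkit bedMethyl files.

# bedmethyl_summary <file> [mod_code]
# Coverage-weighted fraction modified, sum(Nmod) / sum(Nvalid_cov), over records
# whose modification code matches mod_code (default "m").
bedmethyl_summary() {
  awk -v code="${2:-m}" '
    { split($4, parts, ",") }   # handles "m" as well as "m,CG,0"
    parts[1] == code { nmod += $12; nvalid += $10; sites++ }
    END {
      if (nvalid == 0) { print "no records with valid coverage" > "/dev/stderr"; exit 1 }
      printf "sites=%d weighted_fraction=%.4f\n", sites, nmod / nvalid
    }' "$1"
}

# bedmethyl_filter <file> <min_cov>: keep records with Nvalid_cov >= min_cov.
bedmethyl_filter() {
  awk -v min="$2" '($10 + 0) >= (min + 0)' "$1"
}
```

For example, bedmethyl_filter output.bed 10 > output.cov10.bed keeps sites with at least 10 valid reads, and bedmethyl_summary output.cov10.bed m reports the weighted 5mC fraction over them. Weighting by coverage keeps low-coverage sites from counting as much as well-covered ones, which a plain average of column 11 would not.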

Download files

Source Distribution

loreme-0.1.5.tar.gz (24.0 kB)

Built Distribution

loreme-0.1.5-py3-none-any.whl (32.9 kB)

File details

Details for the file loreme-0.1.5.tar.gz.

File metadata

  • Download URL: loreme-0.1.5.tar.gz
  • Size: 24.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for loreme-0.1.5.tar.gz
Algorithm    Hash digest
SHA256       a2940aac1e513ee561d16d719f6ffc352bf705928591341d3d1f3c4e2b74eefa
MD5          88f1131697fd6a4ccdacd84c4096e1be
BLAKE2b-256  889028f34e3c0a01d183b0da21f4007755213877ab1ba113f5c31c28afd97aba

File details

Details for the file loreme-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: loreme-0.1.5-py3-none-any.whl
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for loreme-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b218a5881b650cd8e019403b5b5c7bb5dfa283de949aee05c1a16e5b17508fbf
MD5 6dd24e174047b2af96d9024c9e8b0b0a
BLAKE2b-256 812da21d5158fa79685508dbe418590ef193eaa759610d2b7db699a297c58019
