
Extract Methylation calls from ONT or PB long read data

LoReMe pipeline

LoReMe (Long Read Methylation) is a Python package that facilitates analysis of DNA methylation signals from Pacific Biosciences or Oxford Nanopore long read sequencing data.

It consists of an API and CLI for three distinct applications:

  1. Pacific Biosciences data processing. PB reads in SAM/BAM format are aligned to a reference genome with the special-purpose aligner pbmm2, a modified version of minimap2. Methylation calls are then piled up from the aligned reads with pb-CpG-tools.

  2. Oxford Nanopore basecalling. ONT reads are optionally converted from FAST5 to POD5 format, then basecalled and aligned to a reference with dorado (dorado alignment also uses minimap2 under the hood), and finally piled up with modkit.

  3. Postprocessing and QC of methylation calls. Several functions are available to generate diagnostic statistics and plots.

See also the full documentation.

Other tools of interest: methylartist, modbamtools (modbamtools docs), and methplotlib.

Installation

In a Conda environment

The recommended way to install loreme is in a dedicated conda environment. First create an environment that includes all dependencies:

conda create -n loreme -c conda-forge -c bioconda samtools pbmm2 \
  urllib3 pybedtools gff2bed seaborn pyfaidx psutil gputil tabulate \
  cython h5py iso8601 more-itertools tqdm
conda activate loreme

Then install with pip:

pip install loreme

You may also wish to install nvtop to monitor GPU usage:

conda install -c conda-forge nvtop

With pip

pip install loreme

Check installation

To check that the correct version was installed, run loreme --version.
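
The same check can be scripted from Python. The sketch below is only an illustration: the helper names installed_version and missing_tools are hypothetical and not part of the loreme API, and the tool list simply reuses two of the command-line dependencies named in the conda command above.

```python
from importlib.metadata import PackageNotFoundError, version
import shutil


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def missing_tools(tools):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # "loreme" is the distribution name used by `pip install loreme`.
    print("loreme version:", installed_version("loreme"))
    # Hypothetical choice of tools to check; adjust to your own setup.
    print("missing tools:", missing_tools(["samtools", "pbmm2"]))
```

A result of None for the version, or a non-empty list of missing tools, suggests that the active environment is not the one the package was installed into.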

Uninstall

To uninstall loreme:

loreme clean
pip uninstall loreme

Oxford Nanopore reads

Download dorado

Calling methylation from ONT long reads requires the basecaller dorado. Download it by running:

loreme download-dorado <platform>

This downloads dorado and several basecalling models. The platform must be whichever of linux-x64, linux-arm64, osx-arm64, or win64 matches your system. Running loreme download-dorado --help shows a hint as to the correct choice.
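
If you are unsure which platform string applies, the following sketch guesses it from Python's platform module. The function name guess_dorado_platform and the mapping table are assumptions made for this illustration, not part of loreme; the hint printed by loreme download-dorado --help remains the authoritative source.

```python
import platform

# The platform names are the four listed above. The mapping from Python's
# (system, machine) strings to those names is an assumption of this sketch.
_PLATFORMS = {
    ("linux", "x86_64"): "linux-x64",
    ("linux", "amd64"): "linux-x64",
    ("linux", "aarch64"): "linux-arm64",
    ("linux", "arm64"): "linux-arm64",
    ("darwin", "arm64"): "osx-arm64",
    ("windows", "amd64"): "win64",
    ("windows", "x86_64"): "win64",
}


def guess_dorado_platform(system=None, machine=None):
    """Return a likely platform string, or None if no listed platform matches."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return _PLATFORMS.get((system, machine))


if __name__ == "__main__":
    print(guess_dorado_platform())
```

A return value of None means the machine does not match any of the four listed platforms under this mapping.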

Note

For members of Michael Lab at Salk running on seabiscuit, use loreme download-dorado linux-x64.

Modified basecalling

You can carry out modified basecalling (i.e. basecalling that also calls DNA methylation) with default parameters by running:

loreme dorado-basecall <pod5s/> <output.sam>

The input argument pod5s/ should be a directory containing one or more POD5 files. For other parameter options, see loreme dorado-basecall --help.
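
When scripting this step, it can help to confirm that the input directory really contains POD5 files before starting a long run. The sketch below is a minimal illustration with two assumptions: the files carry a .pod5 extension and sit directly inside the directory. The helper names find_pod5_files and basecall are hypothetical; only the loreme dorado-basecall command line itself comes from this document.

```python
from pathlib import Path
import subprocess


def find_pod5_files(directory):
    """Return the sorted POD5 files found directly inside `directory`."""
    # Assumption: POD5 files use the ".pod5" extension.
    return sorted(Path(directory).glob("*.pod5"))


def basecall(pod5_dir, output_sam):
    """Run `loreme dorado-basecall` after checking the input directory."""
    if not find_pod5_files(pod5_dir):
        raise FileNotFoundError(f"no .pod5 files found in {pod5_dir}")
    # Same positional arguments as the command shown above.
    subprocess.run(
        ["loreme", "dorado-basecall", str(pod5_dir), str(output_sam)],
        check=True,
    )
```

Calling basecall("pod5s/", "output.sam") then fails early with a clear error if the directory is empty, rather than partway through the run.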

Note

Basecalling ONT data is disk-read intensive, so for best performance the input POD5 data should be on a fast SSD (for example, /scratch/<username> for members of Michael Lab at Salk).

To run dorado with only regular basecalling, use the --no-mod option:

loreme dorado-basecall --no-mod <pod5s/> <output.sam>

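If you wish to convert the SAM file to a FASTQ file, use the commands below. The -T '*' option copies all SAM tags, including the MM/ML base modification tags, into the FASTQ header lines: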

samtools view -bo output.bam output.sam
samtools fastq -T '*' output.bam > output.fq

Alignment

The SAM file produced by dorado can be aligned to a reference index (FASTA or MMI file) with loreme dorado-align:

loreme dorado-align <index> <reads> <output.bam>

Download modkit

Piling up methylation calls from BAM data requires modkit. Download it by running:

loreme download-modkit

Pileup

The pileup step generates a bedMethyl file from an aligned BAM file.

loreme modkit-pileup <reference.fasta> <input.bam> <output.bed>

Note

See loreme modkit-pileup --help for additional options. On an HPC system you may want to use additional threads with the -t flag.
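
The basecalling, alignment, and pileup steps above can be chained from a script. The following is a sketch under stated assumptions, not a loreme API: the function names ont_workflow_commands and run_workflow are hypothetical, the intermediate file names are derived from a made-up prefix, the same reference FASTA is used for alignment and pileup, and -t is assumed to take an integer thread count placed before the positional arguments (check loreme modkit-pileup --help).

```python
import shlex
import subprocess


def ont_workflow_commands(pod5_dir, reference, prefix, threads=None):
    """Build the three command lines: basecall, align, pileup."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    bed = f"{prefix}.bed"
    pileup = ["loreme", "modkit-pileup"]
    if threads is not None:
        # Assumption: -t accepts an integer number of threads.
        pileup += ["-t", str(threads)]
    pileup += [reference, bam, bed]
    return [
        ["loreme", "dorado-basecall", pod5_dir, sam],
        ["loreme", "dorado-align", reference, sam, bam],
        pileup,
    ]


def run_workflow(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands that would be executed.
    run_workflow(ont_workflow_commands("pod5s/", "reference.fasta", "sample", threads=4))
```

Keeping command construction separate from execution makes it possible to review the commands with a dry run before committing GPU time.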

Postprocessing

See the Pacific Biosciences reads section for examples of postprocessing analysis that can be applied to bedMethyl files.
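
As a quick sanity check that does not depend on loreme's own postprocessing functions, a bedMethyl file can also be summarized directly. The sketch below is an assumption-laden illustration: it takes the column layout to be that of modkit's bedMethyl output (column 10 = valid coverage, column 12 = number of modified calls), which should be verified against the modkit documentation for your version. It also assumes the file holds a single modification type. The function name weighted_methylation is hypothetical and the two records are made-up example data.

```python
import io


def weighted_methylation(lines):
    """Return sum(Nmod) / sum(Nvalid_cov) over bedMethyl records, or None."""
    total_valid = 0
    total_mod = 0
    for line in lines:
        # Skip header-like lines and anything too short to be a record.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()  # tolerant of tab- or space-delimited columns
        if len(fields) < 12:
            continue
        total_valid += int(fields[9])   # column 10: valid coverage (assumed)
        total_mod += int(fields[11])    # column 12: modified calls (assumed)
    if total_valid == 0:
        return None
    return total_mod / total_valid


# Synthetic records for illustration only, not real data.
example = io.StringIO(
    "chr1\t100\t101\tm\t10\t+\t100\t101\t255,0,0\t10\t80.00\t8\t2\t0\t0\t0\t0\t0\n"
    "chr1\t200\t201\tm\t30\t-\t200\t201\t255,0,0\t30\t10.00\t3\t27\t0\t0\t0\t0\t0\n"
)
print(weighted_methylation(example))
```

Weighting by coverage rather than averaging the per-site percentages keeps low-coverage sites from dominating the summary.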

