
Extract Methylation calls from ONT or PB long read data


LoReMe pipeline

LoReMe (Long Read Methylation) is a Python package that facilitates analysis of DNA methylation signals from Pacific Biosciences (PB) or Oxford Nanopore (ONT) long-read sequencing data.

It consists of an API and CLI for three distinct applications:

  1. Pacific Biosciences data processing. PB reads in SAM/BAM format are aligned to a reference genome with the special-purpose aligner pbmm2, a modified version of minimap2. Methylation calls are then piled up from the aligned reads with pb-CpG-tools.

  2. Oxford Nanopore basecalling. ONT reads are optionally converted from FAST5 to POD5 format, then basecalled and aligned to a reference with dorado (dorado alignment also uses minimap2 under the hood). Methylation calls are finally piled up with modkit.

  3. Postprocessing and QC of methylation calls. Several functions are available to generate diagnostic statistics and plots.

See also the full documentation.

Other tools of interest: methylartist, modbamtools (see the modbamtools docs), and methplotlib.

Installation

In a Conda environment

The recommended way to install loreme is in a dedicated conda environment.

First create an environment including all dependencies:

conda create -n loreme -c conda-forge -c bioconda samtools pbmm2 \
  urllib3 pybedtools gff2bed seaborn pyfaidx psutil gputil tabulate \
  cython h5py iso8601 more-itertools tqdm
conda activate loreme

Then install with pip:

pip install loreme

You may also wish to install nvtop to monitor GPU usage:

conda install -c conda-forge nvtop

With pip

pip install loreme

Check installation

Check that the correct version was installed by running loreme --version.
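The same check can be made from Python, which is handy inside scripts or notebooks. The following is a minimal sketch that relies only on the standard library; the helper name installed_version is hypothetical and not part of the loreme API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="loreme"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the active environment.
        return None


if __name__ == "__main__":
    print(installed_version() or "loreme is not installed in this environment")
```

If this prints a message instead of a version, the active environment is probably not the one where loreme was installed.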

Uninstall

To uninstall loreme:

loreme clean
pip uninstall loreme

Oxford Nanopore reads

Download dorado

Calling methylation from ONT long reads requires the basecaller dorado. Download it by running:

loreme download-dorado <platform>

This will download dorado and several basecalling models. The platform should be whichever of linux-x64, linux-arm64, osx-arm64, or win64 matches your system. Running loreme download-dorado --help will show a hint as to the correct choice.

Note

For members of Michael Lab at Salk running on seabiscuit, use loreme download-dorado linux-x64.
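If you are unsure which platform name applies, the operating system and CPU architecture reported by Python can be mapped onto the four names listed above. This is only a sketch: the function guess_dorado_platform is hypothetical, and the mapping is an assumption based on common values of platform.system() and platform.machine(), not something loreme itself provides.

```python
import platform

# Platform names accepted by `loreme download-dorado`, as listed above.
SUPPORTED = {"linux-x64", "linux-arm64", "osx-arm64", "win64"}


def guess_dorado_platform(system=None, machine=None):
    """Guess the platform name, or return None when no listed name fits."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if system == "linux" and machine in ("x86_64", "amd64"):
        return "linux-x64"
    if system == "linux" and machine in ("aarch64", "arm64"):
        return "linux-arm64"
    if system == "darwin" and machine == "arm64":
        return "osx-arm64"
    if system == "windows" and machine in ("amd64", "x86_64"):
        return "win64"
    # For example an Intel Mac, which is not among the listed platforms.
    return None


if __name__ == "__main__":
    print(guess_dorado_platform())
```

A result of None means the machine does not match any of the listed platforms, in which case the hint from loreme download-dorado --help is the better guide.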

Modified basecalling

You can carry out modified basecalling (i.e. basecalling that also calls DNA methylation) with default parameters by running:

loreme dorado-basecall <pod5s/> <output.sam>

The input argument pod5s/ should be a directory containing one or more POD5 files. For other parameter options, see loreme dorado-basecall --help.
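Before starting a long basecalling run, it can be worth confirming that the input directory really contains POD5 files. The sketch below is one way to do that; list_pod5_files is a hypothetical helper, and it assumes the files carry a .pod5 extension and sit directly inside the directory rather than in subdirectories.

```python
from pathlib import Path


def list_pod5_files(directory):
    """Return the sorted POD5 files found directly inside `directory`."""
    path = Path(directory)
    if not path.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    # Assumption: POD5 files are recognised by their .pod5 extension.
    files = sorted(
        p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pod5"
    )
    if not files:
        raise FileNotFoundError(f"no .pod5 files found in {directory}")
    return files
```

Calling list_pod5_files("pod5s/") either returns the files that would be basecalled or fails early with a clear error.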

Note

Basecalling ONT data is disk-read intensive, so for best performance the input POD5 data should be on a fast SSD (for example, /scratch/<username> for members of Michael Lab at Salk).

To run dorado with only regular basecalling, use the --no-mod option:

loreme dorado-basecall --no-mod <pod5s/> <output.sam>

If you wish to convert the SAM file to a FASTQ file, use:

samtools view -bo output.bam output.sam
samtools fastq -T '*' output.bam > output.fq
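The -T '*' option asks samtools to copy all SAM tags into the FASTQ headers, so modified-base calls are not lost in the conversion. The sketch below checks whether a single SAM record carries such calls. It assumes they are stored in the MM and ML optional fields defined by the SAM optional-fields specification; modification_tags is a hypothetical helper and the example record is made up.

```python
def modification_tags(sam_line):
    """Return the MM/ML optional fields present on one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory;
    # optional TAG:TYPE:VALUE fields follow them.
    optional = fields[11:]
    return {field[:2]: field for field in optional if field[:2] in ("MM", "ML")}


if __name__ == "__main__":
    # Made-up unaligned record with one 5mC call on its only cytosine.
    record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tMM:Z:C+m,0;\tML:B:C,200"
    print(modification_tags(record))
```

An empty result for every record would be expected for output produced with --no-mod.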

Alignment

The SAM file produced by dorado can be aligned to a reference index (FASTA or MMI file) with loreme dorado-align:

loreme dorado-align <index> <reads> <output.bam>

Download modkit

Piling up methylation calls from BAM data requires modkit. Download it by running:

loreme download-modkit

Pileup

The pileup step generates a bedMethyl file from an aligned BAM file.

loreme modkit-pileup <reference.fasta> <input.bam> <output.bed>
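To give a feel for working with the resulting file, here is a minimal sketch that reads bedMethyl records and computes a coverage-weighted methylation percentage. It assumes the column layout described in the modkit documentation, where column 10 holds the valid coverage and column 11 the percent modified; check this against the output of your modkit version. The names Site, parse_bedmethyl_line and weighted_percent_modified are hypothetical and not part of loreme.

```python
from collections import namedtuple

Site = namedtuple(
    "Site", "chrom start end mod_code strand valid_cov percent_modified"
)


def parse_bedmethyl_line(line):
    """Parse one bedMethyl record (assumed modkit column layout)."""
    # Splitting on any whitespace tolerates both tab- and space-separated columns.
    cols = line.split()
    return Site(
        chrom=cols[0],
        start=int(cols[1]),
        end=int(cols[2]),
        mod_code=cols[3],
        strand=cols[5],
        valid_cov=int(cols[9]),           # column 10: valid coverage
        percent_modified=float(cols[10]),  # column 11: percent modified
    )


def weighted_percent_modified(sites, min_cov=1):
    """Coverage-weighted mean percent modified, or None if nothing passes."""
    kept = [s for s in sites if s.valid_cov >= min_cov]
    total = sum(s.valid_cov for s in kept)
    if total == 0:
        return None
    return sum(s.percent_modified * s.valid_cov for s in kept) / total
```

Typical use would be to parse each line of output.bed with parse_bedmethyl_line and pass the resulting list to weighted_percent_modified, optionally with a min_cov threshold to ignore poorly covered sites.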

Note

See loreme modkit-pileup --help for additional options. On an HPC system you may want to use additional threads with the -t flag.
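The three ONT steps described in this section (basecall, align, pile up) can be chained from a script. The following is a sketch only: it assumes dorado and modkit have already been downloaded, that the same reference FASTA serves as both the alignment index and the pileup reference, and that -t may be given before the positional arguments. The function names and the output prefix convention are hypothetical.

```python
import subprocess


def ont_pipeline_commands(pod5_dir, reference, prefix, threads=None):
    """Build the loreme commands for basecalling, alignment and pileup."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    bed = f"{prefix}.bed"
    pileup = ["loreme", "modkit-pileup"]
    if threads is not None:
        # Assumption: the -t flag is accepted ahead of the positional arguments.
        pileup += ["-t", str(threads)]
    pileup += [reference, bam, bed]
    return [
        ["loreme", "dorado-basecall", pod5_dir, sam],
        ["loreme", "dorado-align", reference, sam, bam],
        pileup,
    ]


def run_ont_pipeline(pod5_dir, reference, prefix, threads=None):
    """Run each step in order, stopping at the first failure."""
    for cmd in ont_pipeline_commands(pod5_dir, reference, prefix, threads):
        subprocess.run(cmd, check=True)
```

For example, run_ont_pipeline("pod5s/", "reference.fasta", "sample", threads=8) would write sample.sam, sample.bam and sample.bed in turn. Building the commands separately from running them makes it easy to print and inspect them first.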

Postprocessing

See the Pacific Biosciences reads section for examples of postprocessing analysis that can be applied to bedMethyl files.
