Skip to main content

Pipeline for phased methylation calling on raw nanopore data

Project description

phased-methylation

This package defines a pipeline for phased methylation calling on raw Nanopore data. It has three steps:

  1. Mapping reads to a reference genome (minimap2)
  2. Phasing variants and reads (longshot)
  3. Calling methylated bases (megalodon)

TODO

Environment setup and installation

conda create -n phased-methylation -c bioconda -c conda-forge git cython pytest \
  pandas pysam pybedtools minimap2 longshot megalodon==2.3.4 gputil psutil \
  tabulate pyfaidx gff2bed
conda activate phased-methylation
pip install ont_pyguppy_client_lib nvsmi tempfifo phased-methylation

Test run

To execute a test run of the pipeline, use the test subcommand

phased-methylation test <test_dir/>

Usage

usage: phased-methylation [-h]
                          {launch,map,phase,call,test,mean,promoter,gene-body,plot,plot-genes,plot-repeats,export-metilene,export-bedgraph}
                          ...

pipeline for methylation calling

positional arguments:
  {launch,map,phase,call,test,mean,promoter,gene-body,plot,plot-genes,plot-repeats,export-metilene,export-bedgraph}
    launch              launch full pipeline
    map                 perform mapping step
    phase               perform phasing step
    call                perform methylation calling step
    test                execute test run
    mean                calculate average methylation across chromosomes
    promoter            quantify promoter methylation
    gene-body           quantify gene body methylation
    plot                plot methylation across chromosomes
    plot-genes          plot methylation profiles over genomic features
    plot-repeats        plot methylation profiles over genomic features
    export-metilene     export methylation data formatted for input into
                        metilene
    export-bedgraph     export methylation data in bedgraph format

optional arguments:
  -h, --help            show this help message and exit

Input files

Output files

Results will be written to the indicated output directory.

Example

Minimal example:

phased-methylation launch reference.fa fast5s_dir/ output_dir/ query.fastq

GPU resource management

phased-methylation (specifically, the call step using megalodon) requires at least one available GPU to run successfully. When the pipeline is launched, it will check for availability of the devices indicated by the --devices argument, defaulting to device 0. If one or more indicated devices are not available, the user will be prompted to free up resources by terminating processes running on them:

Cannot launch because the following processes are occupying resources on device(s) 0:

=====  =====  ========================================  ==================
  GPU    PID  Process Name                              GPU Memory Usage
=====  =====  ========================================  ==================
    0  16397  /opt/ont/guppy/bin/guppy_basecall_server  9.80 GB
=====  =====  ========================================  ==================

Terminate these processes and continue? [y/N]:

Enter y to terminate the indicated processes and continue with the pipeline, or enter any other input to terminate phased-methylation.

Misc

Documentation from original shell script:

IMPORTANT, this can only be run when there are no sequencing runs in progress. If there are they must be paused and then the guppy_basecaller needs to be killed to clear the GPU memory. In a terminal, type nvidia-smi and find the job ID of the guppy_basecaller(there may be two running). KILL THEM all Run nvidia-smi again and you should see that the GPU memory usage is very low now. Start this script. Once it gets to the megalodon part you can start sequencing on the PromethION again. If you start the sequener too soon this script will crash once it gets to the megalodon step

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

phased-methylation-0.17.7.tar.gz (19.0 MB view details)

Uploaded Source

Built Distribution

phased_methylation-0.17.7-py3-none-any.whl (19.1 MB view details)

Uploaded Python 3

File details

Details for the file phased-methylation-0.17.7.tar.gz.

File metadata

  • Download URL: phased-methylation-0.17.7.tar.gz
  • Upload date:
  • Size: 19.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for phased-methylation-0.17.7.tar.gz
Algorithm Hash digest
SHA256 81eb60693815201fa2237f0ee04cb6cc867e0260080d9c21bde7e112f343324c
MD5 3322aa3ea8221616b25e93b27251d516
BLAKE2b-256 81743373efbd3746bd729c217bc52237c02024d1bff2b675393c0ae0b581e758

See more details on using hashes here.

File details

Details for the file phased_methylation-0.17.7-py3-none-any.whl.

File metadata

File hashes

Hashes for phased_methylation-0.17.7-py3-none-any.whl
Algorithm Hash digest
SHA256 67942f26b0c01531d1b3c938aec79f75078cb6fc9acc8d023eed0414889a399e
MD5 be9cb142acad341680313fafc523f4f6
BLAKE2b-256 2221c7673d9769786a71f84bba16ff73fa114a53b4d4ec55644e571ba9f0a154

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page