Analysis of allele-specific methylation in bisulfite DNA sequencing.

pyllelic

pyllelic: a tool for detection of allele-specific methylation variation in bisulfite DNA sequencing files.

pyllelic documentation is available at https://paradoxdruid.github.io/pyllelic/; see pyllelic_notebook.ipynb for a fully worked demonstration.

Quickstart

An interactive sample pyllelic environment can be run in your web browser using mybinder.org.

Dependencies and Installation

Using Conda (preferred)

Create a new conda environment using Python 3.8:

Easiest:

# Get the environment.yml file from this repo (URL quoted so the shell does not expand "?")
curl -L "https://github.com/Paradoxdruid/pyllelic/blob/master/environment.yml?raw=true" > env.yml

# Create and activate conda environment
conda env create --file=env.yml
conda activate pyllelic

Or, more explicitly, step by step:

conda create --name pyllelic python=3.8
conda activate pyllelic
conda config --env --add channels conda-forge
conda config --env --add channels bioconda
conda config --env --add channels paradoxdruid
conda install pyllelic 

# Optional, but needed for the usual notebook-based workflow:
conda install notebook jupyter_contrib_nbextensions ipywidgets

Docker container

docker pull ghcr.io/paradoxdruid/pyllelic:latest

PyPI installation

This will require independent installation of samtools, bowtie2, and bismark packages.

# PyPI
python3 -m pip install pyllelic
# or GitHub
python3 -m pip install git+https://github.com/Paradoxdruid/pyllelic.git
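
Because a pip installation does not pull in the external tools, it can help to confirm they are reachable before running the pipeline. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_tools` is hypothetical and not part of pyllelic, and the executable names are assumed to match the package names listed above.

```python
import shutil

# Executable names are assumed to match the package names listed above.
REQUIRED_TOOLS = ("samtools", "bowtie2", "bismark")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external tools found.")
```

If a tool is installed outside PATH, this check reports it as missing even though it may still be usable by passing its location explicitly.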

Example exploratory use in a Jupyter notebook

Set up files:

  from pyllelic import process
  from pathlib import Path

  # Retrieve promoter genomic sequence of the region to analyze
  process.retrieve_promoter_seq("tert_genome.txt", chrom="chr5", start=1293000, end=1296000)

  # Download a reference genome and bisulfite sequencing data
  # Genome data from, e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg19
  # Fastq data from, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethylRrbs/
  genome = Path("/{your_directory}/{genome_file_directory}")
  fastq = Path("/{your_directory}/{your_fastq_file.fastq.gz}")

  # Use bismark tool to prepare bisulfite genome and align fastq to bam file
  process.prepare_genome(genome) # can optionally give path to bowtie2 if not in PATH
  process.bismark(genome, fastq)

  # Sort and index the resulting bam file
  bamfile = Path("/{your_directory}/{bam_filename}.bam")
  process.pysam_sort(bamfile)
  process.pysam_index(bamfile.parent / f"{bamfile.stem}_sorted.bam")
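
The index step operates on the sorted file rather than the original BAM. As a minimal sketch, assuming the sort step writes its output next to the input with a `_sorted.bam` suffix (as the snippet above implies), the path can be derived with `pathlib`. The helper name `sorted_bam_path` and the example path are hypothetical, not part of pyllelic.

```python
from pathlib import Path


def sorted_bam_path(bamfile: Path) -> Path:
    """Hypothetical helper: path of the sorted BAM next to the original file."""
    # Path.stem drops only the final suffix, so "sample.bam" -> "sample"
    return bamfile.parent / f"{bamfile.stem}_sorted.bam"


example = Path("/data/run1/sample.bam")  # placeholder path for illustration
print(sorted_bam_path(example))
```

Building the name with an f-string matters here: joining `bamfile.stem` and `"_sorted.bam"` with the `/` operator would create a nested directory path instead of a single file name.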

Run pyllelic:

    from pyllelic import pyllelic

    config = pyllelic.configure(  # Specify file and directory locations
        base_path="/home/jovyan/assets/",
        prom_file="tert_genome.txt",
        prom_start=1293200,
        prom_end=1296000,
        chrom="5",
        offset=1293000,  # start position of retrieved promoter sequence
        # viz_backend="plotly",
        # fname_pattern=r"^[a-zA-Z]+_([a-zA-Z0-9]+)_.+bam$",
        # test_dir="test",
        # results_dir="results",
    )

    files_set = pyllelic.make_list_of_bam_files(config)  # finds bam files

    # Run pyllelic; may take some time depending on the number of bam files
    data = pyllelic.pyllelic(config=config, files_set=files_set)

    positions = data.positions

    cell_types = data.cell_types

    means_df = data.means  # mean methylation of reads

    modes_df = data.modes  # mode methylation of reads
    
    diff_df = data.diffs  # difference mean - mode of reads

    individual_data = data.individual_data  # read methylation values

    data.save("output.xlsx")  # save methylation results

    data.save_pickle("my_run.pickle")  # save data object for later analysis
    
    data.write_means_modes_diffs(filename="Run1_")  # write output data files

    data.histogram("CELL_LINE", "POSITION")  # visualize data for a point

    data.heatmap(min_values=1)  # methylation level heatmap

    data.reads_graph()  # individual methylated / unmethylated reads graph

    data.quma_results["CELL_LINE"]  # see summary data for a cell line
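
The commented-out `fname_pattern` option in the configuration above is a regular expression with one capture group. As a rough illustration of what that pattern matches, the sketch below applies it to made-up file names with Python's `re` module. That the captured group is what identifies the cell line is an assumption here, and the helper `cell_line_from_filename` and the file names are hypothetical.

```python
import re

# Pattern copied from the commented-out fname_pattern option above
FNAME_PATTERN = re.compile(r"^[a-zA-Z]+_([a-zA-Z0-9]+)_.+bam$")


def cell_line_from_filename(filename):
    """Return the first captured group, or None if the name does not match."""
    match = FNAME_PATTERN.match(filename)
    return match.group(1) if match else None


# Hypothetical file names, for illustration only
for name in ("fh_SW1710_TERT.bam", "sample.bam"):
    print(name, "->", cell_line_from_filename(name))
```

With this pattern, a file name needs a letters-only prefix, an alphanumeric middle field, and further text before the `bam` ending, each separated by underscores; names that do not follow that layout are not matched.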

Authors

This software is developed as academic software by Dr. Andrew J. Bonham at the Metropolitan State University of Denver. It is licensed under the GPL v3.0.

This software incorporates implementation from QUMA, licensed under the GPL v3.0.
