Find hypomethylated regions in centromeres

centrodip

Installation

# Install with conda:
conda install jmmenend::centrodip

# Run with Docker:
docker run -it jmmenend/centrodip:latest

# Install with pip:
pip install centrodip

Preprocessing:

centrodip requires two inputs: (1) a bedMethyl file from modkit and (2) active-alpha annotations in BED format.

(1) can be created by aligning the reads from an unaligned BAM to the assembly and then running modkit pileup on the aligned BAM.

Example:

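UBAM="HG002.unaligned.bam"
FA="HG002.fa"
# intermediate and output file names (placeholders, rename as needed)
FQ="HG002.fastq"
SAM="HG002.sam"
BAM="HG002.bam"
bedMethyl="HG002.bedMethyl.bed"

# convert to FQ (-T '*' keeps the modification tags), then align
samtools fastq -T '*' "$UBAM" > "$FQ"
# minimap2 needs -y to carry the tags from the FASTQ comments into the SAM
minimap2 ... "$FQ" > "$SAM"

# sort into a BAM (samtools index requires coordinate-sorted input) and index
samtools sort -o "$BAM" "$SAM"
samtools index "$BAM"

# aggregate methylation with modkit
modkit pileup --cpg --ref "$FA" "$BAM" "$bedMethyl"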
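
Before running centrodip, it can help to check how many CpG sites in the pileup are likely to pass the default filters. The sketch below is only a sanity check, not centrodip's own filtering logic. It assumes the usual modkit bedMethyl layout (column 4 is the modification code, column 10 is the valid coverage) and uses a few invented rows in place of a real pileup file.

```shell
# Invented bedMethyl excerpt; a real file comes from modkit pileup.
sample_bedmethyl=$(mktemp)
printf 'chr1\t100\t101\tm\t12\t+\t100\t101\t255,0,0\t12\t8.33\t1\t11\n'   > "$sample_bedmethyl"
printf 'chr1\t150\t151\tm\t4\t+\t150\t151\t255,0,0\t4\t50.00\t2\t2\n'    >> "$sample_bedmethyl"
printf 'chr1\t150\t151\th\t4\t+\t150\t151\t255,0,0\t4\t0.00\t0\t4\n'     >> "$sample_bedmethyl"
printf 'chr2\t200\t201\tm\t20\t-\t200\t201\t255,0,0\t20\t90.00\t18\t2\n' >> "$sample_bedmethyl"

# Count sites per chromosome that match the modification code and meet the
# coverage threshold. mod and cov mirror the documented defaults of
# --mod-code and --cov-conf; the column positions are an assumption.
confident_sites=$(awk -v mod="m" -v cov=10 '
    $4 == mod && $10 >= cov { n[$1]++ }
    END { for (c in n) print c "\t" n[c] }
' "$sample_bedmethyl" | sort -k1,1)

printf '%s\n' "$confident_sites"
rm -f "$sample_bedmethyl"
```

To use it on real data, point awk at the file in $bedMethyl instead of the temporary sample. A chromosome with very few confident sites may be worth inspecting before interpreting its dip calls.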

(2) can be created by subsetting the output of the cenSat Annotation workflow.

Documentation for running this workflow is provided with the cenSat Annotation workflow itself.

Example:

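CENSAT="HG002.censat.bed"
ACTIVE_ALPHA="HG002.active_alpha.bed"     # placeholder name
regions="HG002.active_alpha.merged.bed"   # placeholder name

# filter for only active-alpha censat annotations
grep "active_hor" "$CENSAT" > "$ACTIVE_ALPHA"

# it is recommended to perform a bedtools merge on these subset annotations
# (bedtools merge reads its input via -i and expects it sorted by chromosome, then start)
sort -k1,1 -k2,2n "$ACTIVE_ALPHA" | bedtools merge -d 100000 -i - > "$regions"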
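
To see what the -d 100000 merge does to neighbouring annotations, the following sketch reproduces the same rule in plain awk on three invented intervals. It is an illustrative stand-in for bedtools merge, not a replacement for it: intervals on the same chromosome are joined when the gap between them is at most max_gap base pairs.

```shell
# Invented active_hor intervals (tab-separated: chrom, start, end).
active_alpha=$(mktemp)
printf 'chr1\t1000000\t1500000\n'  > "$active_alpha"
printf 'chr1\t1550000\t2000000\n' >> "$active_alpha"
printf 'chr1\t2500000\t2600000\n' >> "$active_alpha"

# Sort by chromosome and start, then merge intervals separated by <= max_gap bp.
merged=$(sort -k1,1 -k2,2n "$active_alpha" | awk -v OFS='\t' -v max_gap=100000 '
    NR == 1 { cur_chrom = $1; cur_start = $2; cur_end = $3; next }
    $1 == cur_chrom && $2 - cur_end <= max_gap {
        if ($3 > cur_end) cur_end = $3   # extend the current merged interval
        next
    }
    {
        print cur_chrom, cur_start, cur_end
        cur_chrom = $1; cur_start = $2; cur_end = $3
    }
    END { if (NR > 0) print cur_chrom, cur_start, cur_end }
')

printf '%s\n' "$merged"
rm -f "$active_alpha"
```

The first two intervals are 50 kb apart and collapse into one region, while the third sits 500 kb away and stays separate. Merging this way keeps a centromere's active array in a single search region even when the annotation is fragmented.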

Running centrodip:

centrodip $bedMethyl $regions $output

Inputs:

  1. bedMethyl - modkit pileup file (refer to the modkit GitHub repository).
  2. regions - BED file of the regions to search for CDRs.
  3. output - name of the output file.
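
A small wrapper can catch a missing or empty input before centrodip starts. The function below is a hypothetical convenience, not part of centrodip; the name run_centrodip and its error message are made up for illustration, and it only forwards the three positional arguments described above.

```shell
# Hypothetical helper: verify both inputs exist and are non-empty, then run centrodip.
run_centrodip() {
    local bedmethyl=$1
    local regions=$2
    local output=$3
    local f
    for f in "$bedmethyl" "$regions"; do
        if [ ! -s "$f" ]; then
            echo "error: input '$f' is missing or empty" >&2
            return 1
        fi
    done
    # Same positional order as the documented command line.
    centrodip "$bedmethyl" "$regions" "$output"
}
```

Called as run_centrodip "$bedMethyl" "$regions" "$output", it returns 1 without invoking centrodip when either input file is absent.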

Output:

The default output is a BED file with 9 columns.

  • Column 4 can be adjusted with the --label flag
  • Column 9 can be adjusted with the --color flag
  • The --debug flag adds per-chromosome summary printouts and additional outputs, such as smoothed methylation and unfiltered dip calls
  • The --plot flag creates a folder that contains summary PNG files for each chromosome
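
Once a run finishes, the output can be summarised per chromosome with standard tools. This sketch assumes the file follows the standard BED9 layout (start in column 2, end in column 3, score in column 5); only the label and color columns are confirmed above, and the rows here are invented rather than real centrodip results.

```shell
# Invented BED9 rows standing in for a centrodip output file.
cdr_bed=$(mktemp)
printf 'chr1\t1000\t6000\tCDR\t800\t.\t1000\t6000\t50,50,255\n'   > "$cdr_bed"
printf 'chr1\t9000\t11000\tCDR\t650\t.\t9000\t11000\t50,50,255\n' >> "$cdr_bed"
printf 'chr2\t500\t4500\tCDR\t900\t.\t500\t4500\t50,50,255\n'     >> "$cdr_bed"

# Per chromosome: number of called dips and the total base pairs they cover.
summary=$(awk -v OFS='\t' '
    { count[$1]++; bp[$1] += $3 - $2 }
    END { for (c in count) print c, count[c], bp[c] }
' "$cdr_bed" | sort -k1,1)

printf '%s\n' "$summary"
rm -f "$cdr_bed"
```

For the sample rows this reports two dips covering 7000 bp on chr1 and one dip covering 4000 bp on chr2.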

Help Documentation

usage: centrodip [-h] [--mod-code MOD_CODE] [--bedgraph] [--window-size WINDOW_SIZE] [--cov-conf COV_CONF] [--prominence PROMINENCE] [--height HEIGHT] [--broadness BROADNESS] [--enrichment] [--min-size MIN_SIZE]
                 [--min-score MIN_SCORE] [--cluster-distance CLUSTER_DISTANCE] [--label LABEL] [--color COLOR] [--plot] [--threads THREADS] [--debug]
                 bedMethyl regions output

Inspect BED / bedGraph files using BedTable

positional arguments:
  bedMethyl             Path to the bedMethyl file
  regions               Path to BED file of regions to search for dips
  output                Path to the output BED file

options:
  -h, --help            show this help message and exit

Input Options:
  --mod-code MOD_CODE   Modification code to filter bedMethyl file. Selects rows with this value in the fourth column. (default: "m")
  --bedgraph            Input file in a bedGraph format rather than bedMethyl. Requires bedGraph4 with the fourth column being fraction modified (default: False)

Smoothing Options:
  --window-size WINDOW_SIZE
                        Window size (bp) to use in LOWESS smoothing of fraction modified. (default: 10000)
  --cov-conf COV_CONF   Minimum coverage required to be a confident CpG site. (default: 10)

Detection Options:
  --prominence PROMINENCE
                        Sensitivity of dip detection for scipy.signal.find_peaks. Higher values require more pronounced dips. Must be a float between 0 and 1. (default: 0.5)
  --height HEIGHT       Minimum depth for dip detection, lower values require deeper dips. Must be a float between 0 and 1. (default: 0.1)
  --broadness BROADNESS
                        Broadness of dips called, higher values make broader entries. Must be a float between 0 and 1. (default: 0.75)
  --enrichment          Find regions that are enriched (rather than depleted) for methylation. (default: False)

Filtering Options:
  --min-size MIN_SIZE   Minimum dip size in base pairs. (default: 1000)
  --min-score MIN_SCORE
                        Minimum score that a dip must have to be kept. Must be an int between 0 and 1000. (default: 500)
  --cluster-distance CLUSTER_DISTANCE
                        Cluster distance in base pairs. Attempts to keep the single largest cluster of annotationed dips. Negative Values turn it off. (default: 500000)

Output Options:
  --label LABEL         Label to use for regions in BED output. (default: "CDR")
  --color COLOR         Color of predicted dips. (default: "50,50,255")

Other Options:
  --plot                Create summary plot of the results. Written to <output_prefix>.summary.png (default: False)
  --threads THREADS     Number of worker processes. (default: 4)
  --debug               Dumps smoothed methylation values, their derivatives, methylation peaks, and derivative peaks. Each to separate BED/BEDGraph files. (default: False)
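
Because --prominence controls how pronounced a dip must be, comparing a few values on the same inputs is a reasonable way to pick a setting. The sketch below only builds and prints the commands for such a sweep. The file names and the three prominence values are placeholders chosen for illustration, not recommendations.

```shell
# Placeholder paths; substitute your own bedMethyl and regions files.
bedMethyl="HG002.bedMethyl.bed"
regions="HG002.active_alpha.merged.bed"

# Build one centrodip command per prominence value (dry run: nothing is executed).
sweep_commands=""
for prominence in 0.25 0.5 0.75; do
    output="HG002.cdr.prominence_${prominence}.bed"
    cmd="centrodip --prominence ${prominence} ${bedMethyl} ${regions} ${output}"
    sweep_commands="${sweep_commands}${cmd}
"
done

printf '%s' "$sweep_commands"
# To execute the sweep: printf '%s' "$sweep_commands" | sh
# (requires centrodip on PATH and the input files to exist)
```

Printing the commands first makes it easy to review the output names before committing compute time; the same pattern works for other documented options such as --min-size or --min-score.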
