Zone Equalisation Normalisation: A Python package for bigWig scaling

Zone Equalisation Normalisation

ZEN-norm is a Python package for normalising bigWigs of genomic signal, such as ATAC-seq, ChIP-seq and TT-seq, by Zone Equalisation Normalisation (ZEN). The package also includes modules for reversing prior bigWig normalisation and for creating plots that compare the genome-wide performance of normalisation methods.

Citation: T. Wilson, TA. Milne, SG. Riva and JR. Hughes, Zone Equalisation Normalisation For Improved Alignment of Epigenetic Signal, Unpublished, 2025



Contents
  1. Installation
  2. Tutorials
  3. Reversing Prior bigWig Normalisation
  4. Normalising bigWigs With ZEN
  5. Evaluating Normalisation Method Performance
  6. Visualising Signal and Zones


1. Installation

ZEN-norm is designed to run on Python 3.10 and above. It is installable from either PyPI or Conda.

PyPI Installation

To install the ZEN-norm package from PyPI, run the command below:
python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ZEN-norm-test

Conda Installation

To install the ZEN-norm package from Conda, activate the conda environment you'd like to install the package into (conda activate ...) and run the command below:
conda install zen-norm

Alternatively, if there are issues installing ZEN-norm, a conda environment with the required packages can be created using the zen_environment.yml file.

conda env create --name zen_env --file=environment/zen_environment.yml
conda activate zen_env


2. Tutorials

Main Tutorial

A detailed Jupyter notebook tutorial is provided in the tutorial/zen_tutorials folder of this repository. It explains how to use ZEN for reversing prior normalisation, normalising bigWigs with ZEN, and evaluating normalisation methods via Wasserstein distance plots and MA plots. For a quick overview of the available features, see sections 3 to 6 below.

Publication Supplementary Figures

A Jupyter notebook in the tutorial/supplementary_figures folder documents how the figures in the Supplementary section of the ZEN publication were created.


3. Reversing Prior bigWig Normalisation

Module ReverseNorm provides an optional step to enable non-normalised bigWigs to be created from pre-normalised bigWigs. This is not required if BAMs are available, but is designed to avoid double normalisation if renormalising bigWigs with ZEN. It works by estimating the coverage value that has been produced by a single read fragment and dividing signal by this to obtain the coverage that would have been produced had linear normalisation (e.g. RPKM, CPM) not been applied.
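The idea described above can be illustrated with a minimal sketch. It assumes that the smallest non-zero coverage value in a linearly normalised track corresponds to a single fragment, which only holds if at least one position is covered by exactly one fragment. The function names and the rounding step are illustrative assumptions and do not reflect how ReverseNorm is actually implemented.

```python
def estimate_single_fragment_value(coverage):
    """Assumed heuristic: the smallest non-zero value is one fragment's worth of coverage."""
    nonzero = [value for value in coverage if value > 0]
    if not nonzero:
        raise ValueError("coverage contains no signal")
    return min(nonzero)


def reverse_linear_norm(coverage):
    """Divide by the per-fragment value to recover approximate raw coverage."""
    unit = estimate_single_fragment_value(coverage)
    # Rounding only tidies floating point error; it is an illustrative choice.
    return [round(value / unit) for value in coverage]


# Toy data: raw fragment coverage scaled by an arbitrary linear factor (as in CPM or RPKM).
raw_coverage = [0, 1, 3, 2, 0, 5]
scale_factor = 0.37
normalised = [count * scale_factor for count in raw_coverage]

recovered = reverse_linear_norm(normalised)
print(recovered)
```

Because linear normalisation multiplies every position by the same factor, dividing by the value of a single fragment cancels that factor without needing to know it.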


Basic Example
from ZEN_norm.reverse_norm import ReverseNorm

rev = ReverseNorm(analysis_name = "Example_Analysis", # Set custom output folder name
                  bigwig_paths = ["path/to/normalised_bigwigs/sample_A.bw", "path/to/normalised_bigwigs/sample_B.bw"], # Specify a list of bigWig paths
                  n_cores = 8) # Set number of cores to use
rev.reverseNorm(chromosomes = ["chr19"])


4. Normalising bigWigs With ZEN

Module ZoneNorm normalises genomic coverage with ZEN. Steps include: BAM to bigWig mapping, convolution to create smoothed signals, distribution fitting, signal zone prediction (coordinates of consistent regions of signal) and generating bigWigs normalised with ZEN. It runs on genomic signals from either BAMs or bigWigs. If using bigWigs that have been pre-normalised, then it is advisable to first remap them without normalisation, or to use ZEN-norm's ReverseNorm module.


Basic Example
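# Import path assumed by analogy with ZEN_norm.reverse_norm; check against the installed package
from ZEN_norm.zone_norm import ZoneNorm

# EITHER create bigWigs without normalisation from BAMs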
znorm = ZoneNorm(analysis_name = "Example_Analysis", # Name of output folder
                 bam_paths = ["path/to/bams/sample_A.bam", "path/to/bams/sample_B.bam"], # List or directory of BAM files
                 n_cores = 8, # Number of processors
                 extend_reads = True, # Whether to extend reads during BAM to bigWig mapping (False recommended for transcriptional assays)
                 filter_strand = False) # Whether to separate by strand (True recommended for transcriptional assays)

# OR set bigWigs directly
znorm = ZoneNorm(analysis_name = "Example_Analysis", # Name of output folder
                 bigwig_paths = ["path/to/raw_bigwigs/sample_A.bw", "path/to/raw_bigwigs/sample_B.bw"], # List or directory of bigWig files
                 n_cores = 8) # Number of processors

# Create smoothed signal
znorm.convolveSignals()
# Test Laplace distribution
znorm.testDistributions()
# Use distribution to predict signal zone coordinates
znorm.predictSignalZones()
# Create normalised bigWigs
znorm.normaliseSignal()


5. Evaluating Normalisation Method Performance

To quantify genome-wide performance across normalisation methods, Wasserstein distance plots or MA plots can be created.

Wasserstein Distance Plots

Within a Wasserstein distance plot, the min-max scaled pairwise sample Wasserstein distance (w) is measured over regions (e.g. peaks or zones) and plotted as violin and/or box plots per normalisation method. The normalisation method with the lowest average w therefore has the best alignment across the genome for the regions of interest. For example, in the plot below of erythroid ATAC-seq, ZEN has the lowest mean w, and it is significantly lower than that of all other normalisation methods according to a t-test comparing the distributions.
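As a rough illustration of the metric, the sketch below computes the 1D Wasserstein distance between every pair of samples over one region and then min-max scales the resulting distances. It is not the package's code: the function names are hypothetical, and scaling the pooled distances (rather than the signal itself) is one assumed reading of "min-max scaled".

```python
from bisect import bisect_right
from itertools import combinations


def wasserstein_1d(u, v):
    """1D Wasserstein distance: area between the two empirical CDFs."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    all_values = sorted(u_sorted + v_sorted)
    distance = 0.0
    for left, right in zip(all_values, all_values[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance


def pairwise_distances(samples):
    """Distance for every unordered pair of samples."""
    return [wasserstein_1d(a, b) for a, b in combinations(samples, 2)]


def min_max_scale(values):
    """Rescale values to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Toy signal over one region for three samples (made-up numbers).
region_signal = {
    "sample_A": [0.0, 1.0, 3.0],
    "sample_B": [5.0, 6.0, 8.0],
    "sample_C": [0.0, 1.0, 4.0],
}
distances = pairwise_distances(list(region_signal.values()))
scaled = min_max_scale(distances)
print(distances, scaled)
```

Well-aligned samples give small distances, so a method whose pairwise distances cluster near zero is aligning the samples more closely over the chosen regions.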

MA Plots

MA plots compare differences in total signal (M) relative to signal intensity (A) between pairs of samples across regions (e.g. peaks or zones). This is useful for assessing how effectively a normalisation method reduces bias. For example, in the HeLa TT-seq plot below, each point in a subplot represents the mean count of the signal over a region's coordinates for two samples after RPKM normalisation. The dotted red line is a reference: the closer the points fall to it, the closer the average counts are between the two samples.
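For reference, the conventional definition of the two axes is M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2 for the mean counts x and y of two samples over a region. The sketch below applies that standard definition. The function name and the pseudocount (added to avoid taking the log of zero) are assumptions and may differ from what ZEN-norm does internally.

```python
import math


def ma_values(sample_x, sample_y, pseudocount=1.0):
    """Return (M, A) lists for paired per-region mean counts of two samples."""
    m_values, a_values = [], []
    for x, y in zip(sample_x, sample_y):
        log_x = math.log2(x + pseudocount)
        log_y = math.log2(y + pseudocount)
        m_values.append(log_x - log_y)          # log ratio: difference in signal
        a_values.append(0.5 * (log_x + log_y))  # mean log intensity
    return m_values, a_values


# Toy per-region mean counts for two samples (made-up numbers).
sample_a = [3.0, 7.0, 15.0]
sample_b = [1.0, 7.0, 31.0]
m, a = ma_values(sample_a, sample_b)
print(m, a)
```

Regions with equal signal in both samples have M = 0, so after effective normalisation the points are expected to scatter around zero across the full range of A.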


6. Visualising Signal and Zones

When normalising genomic signal with ZEN, using bigWigs as inputs and outputs allows some steps in the process to be visualised, providing greater transparency than count-based normalisation methods. For example, bigWigs can be saved after reverse normalisation, after smoothing via convolution (useful for viewing thresholds against this signal) and after normalisation with ZEN. In addition, predicted signal zones (coordinates of consistent regions of signal) can be saved to BED files and visualised in the same way as peak calls. These signals can therefore be viewed using either the track plots included in the package or genome browsers.

Track Plots

When running ZEN-norm, regions of signal for one or more samples can be viewed as track plots. These are customisable, as demonstrated in the examples below:

Viewing Non-Normalised Signals

Signal from one or more samples can be overlaid to view alignment prior to normalisation.

Comparing Zone Thresholds Against Convolved Signal

After smoothing signal via convolution and distribution fitting, thresholds are derived from the distributions. These can be viewed against the convolved signal to see how a threshold will separate signal from background noise during signal zone prediction.
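The relationship between smoothing, distribution fitting and thresholds can be sketched as follows. This is a simplified illustration rather than ZEN-norm's implementation: the box kernel, the window size, the maximum-likelihood Laplace fit (median for location, mean absolute deviation for scale) and the 0.95 quantile are all assumptions chosen for the example.

```python
import math
import statistics


def smooth(signal, window=5):
    """Moving-average (box kernel) convolution, truncating the window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        start = max(0, i - half)
        end = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[start:end]) / (end - start))
    return smoothed


def fit_laplace(values):
    """Maximum-likelihood Laplace fit: location = median, scale = mean absolute deviation."""
    location = statistics.median(values)
    scale = sum(abs(value - location) for value in values) / len(values)
    return location, scale


def laplace_threshold(location, scale, upper_quantile=0.95):
    """Value below which the fitted Laplace puts `upper_quantile` of its mass (quantile > 0.5)."""
    return location - scale * math.log(2 * (1 - upper_quantile))


# Toy coverage: mostly background with one enriched stretch (made-up numbers).
coverage = [1, 0, 2, 1, 0, 1, 12, 15, 14, 13, 1, 0, 2, 1, 0]
smoothed = smooth(coverage, window=3)
location, scale = fit_laplace(smoothed)
threshold = laplace_threshold(location, scale)
print(threshold)
```

Plotting the threshold as a horizontal line over the smoothed track shows which stretches would be treated as signal and which as background.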

Viewing Signal Zones

After signal zone prediction, zones can be visualised as bars within the track plots.
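To make the notion of a zone concrete, the sketch below merges consecutive bins whose smoothed signal exceeds a threshold into intervals and formats them as BED lines (tab-separated, 0-based, half-open coordinates). The function names, bin size and chromosome are hypothetical, and ZEN-norm's actual zone prediction may apply further rules.

```python
def call_zones(smoothed, threshold, bin_size=1, chrom="chr19"):
    """Merge runs of consecutive bins above the threshold into (chrom, start, end) zones."""
    zones = []
    run_start = None
    for index, value in enumerate(smoothed):
        if value > threshold and run_start is None:
            run_start = index  # a new zone opens here
        elif value <= threshold and run_start is not None:
            zones.append((chrom, run_start * bin_size, index * bin_size))
            run_start = None
    if run_start is not None:  # close a zone that reaches the end of the signal
        zones.append((chrom, run_start * bin_size, len(smoothed) * bin_size))
    return zones


def to_bed_lines(zones):
    """Format zones as BED lines: chrom, start, end separated by tabs."""
    return ["\t".join(str(field) for field in zone) for zone in zones]


# Toy smoothed signal in 10 bp bins (made-up numbers).
smoothed_signal = [0.0, 2.0, 3.0, 0.5, 0.0, 4.0]
zones = call_zones(smoothed_signal, threshold=1.0, bin_size=10)
bed_lines = to_bed_lines(zones)
print("\n".join(bed_lines))
```

Intervals in this form can be drawn as bars under a signal track or loaded alongside peak calls in a genome browser.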

Viewing Normalised Signals

Signal from one or more samples can be overlaid to view alignment after normalisation.

Genome Browsers

After saving signals to bigWigs and zones to BED files, they can be uploaded to an interactive genome browser such as the UCSC Genome Browser or Multi Locus View (MLV).
