DRaCOoN (Differential Regulation and CO-expression Networks) is a data-driven tool optimized for effectively retrieving differential relationships between genes across two distinct conditions.
DRaCOoN: Differential Regulation and CO-expression Networks
Introduction
DRaCOoN offers a powerful, data-driven approach to uncover differential gene relationships across conditions, efficiently handling large datasets through various analysis modes. It generates networks by identifying gene pairs with changing associations.
Features
- Computes differential association and differential regulatory networks.
- Optimized for large datasets.
- Supports multiple working modes based on available data and analysis goals.
- Internal parallelization for faster computation.
- Utilizes Numba for just-in-time (JIT) compilation, accelerating the analysis process.
Requirements
- Python 3.x
- Additional Python libraries as specified in requirements.txt
Installation
Using pip
pip install -i https://test.pypi.org/simple/ dracoon
From source
git clone https://github.com/fmdelgado/DRaCOoNpy.git
cd DRaCOoNpy/app
pip install -r requirements.txt
Imports
# if installed from pip
import dracoon
# if installed from source
from app.dracoon import dracoon
Algorithmic Overview
The algorithm operates in several major steps:
- Data Input: Accepts an expression dataset (microarray or RNA-Seq) with multiple samples across two conditions. A minimum of 20 samples per condition is recommended for meaningful results.
- Background Model Estimation: Computes a permutation test-based background model for significance estimation.
- Differential Metrics Calculation: Calculates two differential metrics, absolute difference (Δr) and shift difference (s), for network edges.
- Significance Testing: Assigns p-values based on the background model and adjusts for multiple testing.
Algorithmic Details
DRaCOoN assesses the change in condition-specific correlations between pairs of genes. It supports several association metrics: Pearson's r, Spearman's ρ, or an entropy-based metric. It then computes differential metrics from these values.
Differential Metrics
- absdiff: Absolute difference in the association between two genes across two conditions, estimated as Δr = |r_A - r_B|, where r_A and r_B denote the association measured in each of the two conditions.
- shift: The relative change in association between two genes across two conditions with respect to their condition-agnostic association, estimated as:
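The following is a minimal sketch of the quantities behind these metrics, not DRaCOoN's own code. It assumes Pearson's r as the association measure and takes absdiff to be |r_A - r_B|, as the definition above suggests. The synthetic data, the `pearson` helper and all variable names are hypothetical. The condition-agnostic association used by shift is computed here as the correlation over all samples pooled; the shift formula itself is not given in this text, so it is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression values for two hypothetical genes, 30 samples per condition.
n = 30
gene_x_a = rng.normal(size=n)
gene_y_a = gene_x_a + rng.normal(scale=0.3, size=n)  # co-expressed in condition A
gene_x_b = rng.normal(size=n)
gene_y_b = rng.normal(size=n)                        # unrelated in condition B


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Condition-specific associations.
r_a = pearson(gene_x_a, gene_y_a)
r_b = pearson(gene_x_b, gene_y_b)

# Condition-agnostic association: all samples pooled, ignoring the condition label.
r_all = pearson(
    np.concatenate([gene_x_a, gene_x_b]),
    np.concatenate([gene_y_a, gene_y_b]),
)

# Assumed form of absdiff: absolute difference of the condition-specific values.
absdiff = abs(r_a - r_b)

print(f"r_A={r_a:.3f}  r_B={r_b:.3f}  r_all={r_all:.3f}  absdiff={absdiff:.3f}")
```

Because a correlation coefficient lies in [-1, 1], absdiff computed this way lies in [0, 2].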
P-value Estimation (pval_method)
- permutation: Only available when matrixform = False. For each evaluated gene pair, shuffles their values to create a distribution of n random values for both absdiff and shift.
- background: Uses a background distribution estimated from n random pairs of genes that have been shuffled randomly.
- fitted_background: Fits a set of known distributions to the previous background model and uses the best-fitting distribution to estimate p-values analytically.
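To make the background idea concrete, here is a rough sketch of an empirical p-value taken from a shuffled background distribution. It illustrates the general technique only and is not DRaCOoN's implementation. The function names, the synthetic data, the use of Pearson's r, the |r_A - r_B| form of absdiff and the add-one correction in the p-value are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)


def absdiff(x_a, y_a, x_b, y_b):
    """Assumed absdiff: |r_A - r_B| with Pearson's r in each condition."""
    r_a = np.corrcoef(x_a, y_a)[0, 1]
    r_b = np.corrcoef(x_b, y_b)[0, 1]
    return abs(r_a - r_b)


# Hypothetical expression matrices (genes x samples), one per condition.
n_genes, n_samples = 50, 25
expr_a = rng.normal(size=(n_genes, n_samples))
expr_b = rng.normal(size=(n_genes, n_samples))


def background_distribution(expr_a, expr_b, n_pairs, rng):
    """absdiff for random gene pairs after shuffling one gene's sample order."""
    values = np.empty(n_pairs)
    for k in range(n_pairs):
        i, j = rng.choice(expr_a.shape[0], size=2, replace=False)
        values[k] = absdiff(
            rng.permutation(expr_a[i]), expr_a[j],
            rng.permutation(expr_b[i]), expr_b[j],
        )
    return values


def empirical_pvalue(observed, background):
    """Share of background values at least as large as the observed one."""
    # The +1 terms keep the estimate strictly above zero.
    return (np.sum(background >= observed) + 1) / (background.size + 1)


background = background_distribution(expr_a, expr_b, n_pairs=2000, rng=rng)
p_small = empirical_pvalue(0.05, background)  # small change: unremarkable
p_large = empirical_pvalue(1.5, background)   # large change: rare under the null

print(f"p(absdiff=0.05)={p_small:.3f}  p(absdiff=1.5)={p_large:.4f}")
```

A single shared background is computed once and reused for every gene pair, whereas a per-pair permutation test repeats the shuffling for each pair. The fitted_background variant would replace `empirical_pvalue` with the survival function of a fitted parametric distribution.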
Then, p-values for both absdiff and shift are adjusted using one of the multiple-testing correction methods available in statsmodels. By default, the p-value adjustment method is Benjamini/Hochberg (fdr_bh).
The final output of DRaCOoN includes those relationships whose adjusted p-value for absdiff or shift is lower than a significance threshold, 0.05 by default.
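The adjustment and thresholding steps can be sketched as follows. This is a hand-written Benjamini-Hochberg procedure in NumPy for illustration, not the code DRaCOoN runs; in statsmodels the same correction is available through `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`. The edge list and p-values are made up for the example.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Hypothetical edges with raw p-values for each differential metric.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
p_absdiff = [0.001, 0.20, 0.03, 0.80]
p_shift = [0.04, 0.01, 0.50, 0.90]

adj_absdiff = benjamini_hochberg(p_absdiff)
adj_shift = benjamini_hochberg(p_shift)

# Keep an edge when either adjusted p-value falls below the threshold.
significance = 0.05
kept = [
    edge
    for edge, pa, ps in zip(edges, adj_absdiff, adj_shift)
    if pa < significance or ps < significance
]

print(kept)
```

Note that the g2-g3 edge has a raw absdiff p-value of 0.03 but is dropped, because its adjusted value (0.06) exceeds the threshold.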
Working Modes
- Mode 1: Differential Co-expression (DC) for all possible gene-gene associations. Produces an undirected network.
- Mode 2: User-defined associations for differential examination (Pathway-level DC or Differential Regulation). Produces a directed network.
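The difference between the two modes lies in which candidate edges are evaluated. The sketch below enumerates them in plain Python; the gene names are hypothetical, and the list of source/target records only mirrors the columns that associations_df is described as having, it is not passed to DRaCOoN here.

```python
from itertools import combinations

genes = ["TF1", "TF2", "geneA", "geneB"]

# Mode 1 (DC): every unordered gene pair is a candidate edge,
# so n genes give n * (n - 1) / 2 undirected pairs.
dc_pairs = list(combinations(genes, 2))

# Mode 2 (DR): only user-supplied source -> target pairs are evaluated,
# and the direction of each pair is preserved.
dr_associations = [
    {"source": "TF1", "target": "geneA"},
    {"source": "TF1", "target": "geneB"},
    {"source": "TF2", "target": "geneB"},
]
dr_pairs = [(row["source"], row["target"]) for row in dr_associations]

print(len(dc_pairs), "undirected candidate pairs in DC mode")
print(len(dr_pairs), "directed candidate pairs in DR mode")
```

A list of records like `dr_associations` can be turned into a dataframe with `pandas.DataFrame(dr_associations)`, which yields the source and target columns.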
For more detailed information on the algorithm, please refer to the academic paper (citation needed).
Parameters
- cond_data: The condition data frame.
- biom_data: The gene expression data frame.
- DRaCOoN_program: Either DR for differential regulation or DC for differential correlation.
- associations_df: Only for DR mode. A dataframe containing the set of source-target interactions to evaluate, with columns source and target.
- significance: The significance level to use as a threshold for adjusted p-values, 0.05 by default.
- association_measure: The association measure to use, either entropy, pearson or spearman. By default entropy.
- pval_method: Either permutation, background or fitted_background. By default fitted_background.
- distributions_to_fit: If using fitted_background, the distributions to fit. By default:
  - best_dists_absdiff = ['expon', 'logistic', 'rayleigh', 'norm', 'gumbel_r', 'pareto', 'laplace', 'kstwobign', 'moyal', 'halfnorm']
  - best_dists_shift = ['logistic', 'norm', 'laplace', 'gumbel_l', 'gumbel_r', 'uniform', 'expon', 'rayleigh', 'hypsecant']
  - See other available distributions at the fitter documentation.
- timeout_fitter: If using fitted_background, the maximum time to fit a distribution, by default 60 seconds.
- pvalue_adjustment_method: fdr_bh by default.
- iters: If running permutation tests, the number of iterations, by default 10 000.
- association_pvalue_filter: (Optional) filters conditional relationships based on the adjusted p-value of the condition-specific associations.
- matrixform: Recommended for small datasets (<1000 genes, <1000 samples). Default True.
- verbose: If True, shows the algorithmic progress. By default False.
Methods
The main method of DRaCOoN is run(). This method sequentially runs the following methods:
- preprocessing()
- estimate_background_model()
- calculate_correlations()
- threshold_results()
Contributing
If you find any bugs or wish to propose new features, please let us know.