DysRegNet

DysRegNet package

DysRegNet is a method for inferring patient-specific regulatory alterations (dysregulations) from gene expression profiles. DysRegNet uses linear models to account for confounders and residual-derived z-scores to assess significance.

Installation

To install the package from PyPI please run:

pip install dysregnet

or you can install it from git:

git clone https://github.com/biomedbigdata/DysRegNet_package.git  && cd DysRegNet_package
python setup.py install

Data input

The inputs of the package are the following Pandas DataFrame objects:

  • expression_data - Gene expression matrix with patients as rows (the first column holds the patient/sample IDs) and genes as columns.
  • GRN - Gene Regulatory Network (GRN) with two columns in the following order: ['TF', 'target'].
  • meta - Metadata with the first column containing the patient/sample IDs and the other columns holding the condition and the covariates.

The patient or sample IDs must be the same in the "expression_data" and "meta" DataFrames. Additionally, the gene names or IDs in "expression_data" must match the ones in the "GRN" DataFrame.

In the condition column of the meta DataFrame, the control samples should be encoded as 0 and case samples as 1.

The gene regulatory network should be provided by the user. You can either use an experimentally validated GRN or learn one from control samples. We recommend using software like arboreto, since its output can be passed directly to DysRegNet.
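The input layout described above can be sketched with a few toy DataFrames. All sample IDs, gene names, and values below are made up for illustration, and check_inputs is a hypothetical helper that is not part of the dysregnet package. It only encodes one reading of the requirements above (same IDs in both tables, GRN genes present as expression columns, condition encoded as 0/1).

```python
import pandas as pd

# Hypothetical toy data: names and values are invented for illustration.
expr = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],   # first column: patient/sample IDs
    "TF1": [1.0, 1.2, 0.9, 3.1],
    "GENE_A": [2.0, 2.3, 1.8, 0.4],
})

meta = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],   # first column: patient/sample IDs
    "condition": [0, 0, 0, 1],            # 0 = control, 1 = case
    "gender": ["f", "m", "f", "m"],       # categorical covariate
    "birth_days_to": [-20000, -18500, -21000, -19000],  # continuous covariate
})

grn = pd.DataFrame({"TF": ["TF1"], "target": ["GENE_A"]})


def check_inputs(expr, meta, grn, con_col="condition"):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    # Sample IDs are expected in the first column of both tables.
    if set(expr.iloc[:, 0]) != set(meta.iloc[:, 0]):
        problems.append("sample IDs differ between expression_data and meta")
    # Every gene named in the GRN should be a column of the expression matrix.
    genes = set(expr.columns[1:])
    missing = (set(grn["TF"]) | set(grn["target"])) - genes
    if missing:
        problems.append(f"GRN genes missing from expression_data: {sorted(missing)}")
    # Controls must be encoded as 0 and cases as 1.
    if not set(meta[con_col].unique()) <= {0, 1}:
        problems.append("condition column must be encoded as 0/1")
    return problems


print(check_inputs(expr, meta, grn))
```

Running a check like this before calling the package can make mismatched identifiers easier to spot than a failure deep inside the model fitting.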

Parameters

Additionally, you can provide the following parameters:

  • conCol: Column name for the condition in the meta DataFrame.

  • CatCov: List of categorical covariates. The names should match their column names in the meta DataFrame.

  • ConCov: List of continuous covariates. The names should match their column names in the meta DataFrame.

  • zscoring: If True, DysRegNet will scale the expression of each gene and all continuous confounders based on their mean and standard deviation in the control samples.

  • bonferroni_alpha: P-value threshold for multiple testing correction.

  • normaltest: If True, DysRegNet runs a normality test ("scipy.stats.normaltest") on the residuals. If the residuals are not normally distributed, the edge will not be considered in the analysis.

  • normaltest_alpha: P-value threshold for normaltest (if True).

  • R2_threshold: R-squared (R2) threshold from 0 to 1 (optional). If the R2 of an edge's model is below the threshold, the edge will not be considered in the analysis.

  • direction_condition: If True, DysRegNet will only consider case samples with positive residuals (target gene overexpressed) for models with a negative TF coefficient as potentially dysregulated. Similarly, for positive TF coefficients, only case samples with negative residuals are considered. Please check the paper for more details.

The parameters are also documented in the docstrings, which provide more details.
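To make the roles of bonferroni_alpha and direction_condition more concrete, here is a minimal NumPy sketch of the general idea for a single TF-target edge: fit a linear model on control samples, then express the residuals of case samples as z-scores. This is an illustration on synthetic data, not the package's implementation. The variable names are hypothetical, and applying the Bonferroni correction across the case samples of one edge is an assumption made for this sketch.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic control samples for one TF -> target edge with one confounder.
n_ctrl = 50
tf_ctrl = rng.normal(size=n_ctrl)
age_ctrl = rng.normal(size=n_ctrl)  # stands in for a continuous covariate
target_ctrl = 2.0 * tf_ctrl + 0.5 * age_ctrl + rng.normal(scale=0.3, size=n_ctrl)

# Fit target ~ intercept + TF + covariate on control samples only.
X_ctrl = np.column_stack([np.ones(n_ctrl), tf_ctrl, age_ctrl])
beta, *_ = np.linalg.lstsq(X_ctrl, target_ctrl, rcond=None)
resid_ctrl = target_ctrl - X_ctrl @ beta

# Three synthetic case samples; the last one breaks the TF-target relationship.
tf_case = np.array([0.5, -1.0, 1.5])
age_case = np.array([0.0, 0.3, -0.2])
target_case = 2.0 * tf_case + 0.5 * age_case
target_case[2] -= 3.0  # target much lower than the TF level predicts

X_case = np.column_stack([np.ones(tf_case.size), tf_case, age_case])
resid_case = target_case - X_case @ beta

# z-scores of the case residuals relative to the control residuals.
z = (resid_case - resid_ctrl.mean()) / resid_ctrl.std(ddof=1)

# Two-sided p-values under a standard normal; math.erfc avoids a SciPy dependency.
p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

# Bonferroni correction (assumption: corrected over the case samples of this edge).
bonferroni_alpha = 0.01
significant = p < bonferroni_alpha / p.size

# Direction condition: for a positive TF coefficient keep only negative residuals,
# for a negative TF coefficient keep only positive residuals.
tf_coef = beta[1]
direction_ok = resid_case < 0 if tf_coef > 0 else resid_case > 0

# Report 0 for edges that are not dysregulated, otherwise the z-score.
z_out = np.where(significant & direction_ok, z, 0.0)
print(z_out)
```

In this toy setup only the third case sample deviates strongly from the control model, so it is the only one that keeps a non-zero z-score.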

Get Started

Import the package and pandas:

import dysregnet
import pandas as pd

Define the confounding variables or the design matrix:

# define condition column (0 indicates control, 1 indicates case)
conCol='condition'

# define categorical confounder columns in meta dataframe 
CatCov=['race','gender']  

# define continuous confounder columns in meta dataframe.
ConCov=['birth_days_to']

Run DysRegNet

# expr, meta, and grn are the pandas DataFrames described under "Data input"
data=dysregnet.run(expression_data=expr,
                   meta=meta,
                   GRN=grn,
                   conCol=conCol,
                   CatCov=CatCov,
                   ConCov=ConCov,
                   direction_condition=True,
                   normaltest=True,
                   R2_threshold=.2)

# get the patient-specific dysregulated networks
data.get_results()

# or with binary edges
data.get_results_binary()

# get R2 values, coefficients, and coefficient p-values for all models/edges
data.get_model_stats()

The output

The package outputs a data frame that represents patient-specific dysregulated edges. The columns represent edges, and the rows are patient IDs.

In the result table, a value of 0 means that the edge is not significantly dysregulated (different from control samples). Otherwise, the z-score is reported.

The method "get_results_binary()" outputs binarized dysregulations instead of z-scores.

"get_model_stats()" outputs R2 values, coefficients, and coefficient p-values for all models/edges.
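A result table of this shape can be summarized with ordinary pandas operations. The sketch below uses a small hand-written table in place of real output. The patient IDs, the edge labels, and the z-scores are invented, and the exact column labels produced by the package may differ.

```python
import pandas as pd

# Hypothetical result table: rows are patients, columns are edges,
# 0 means "not significantly dysregulated", otherwise a z-score.
results = pd.DataFrame(
    {
        "TF1->GENE_A": [0.0, -4.2, 0.0],
        "TF2->GENE_B": [3.8, -5.1, 0.0],
    },
    index=["p1", "p2", "p3"],
)

# Binarize: 1 where an edge is dysregulated, 0 otherwise.
binary = (results != 0).astype(int)

# Number of dysregulated edges per patient.
edges_per_patient = binary.sum(axis=1)

# Number of patients in which each edge is dysregulated.
patients_per_edge = binary.sum(axis=0)

print(edges_per_patient)
print(patients_per_edge)
```

Counts like these give a quick overview of which patients carry many dysregulations and which edges are recurrently affected.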

Example

A simple example for running DysRegNet: (Notebook/Google Colab).

You will need to download the demo dataset and extract the files into test dataset/

Link for the demo dataset: https://figshare.com/ndownloader/files/35142652

Cite

"DysRegNet: Patient-specific and confounder-aware dysregulated network inference" Johannes Kersting*, Olga Lazareva*, Zakaria Louadi*, David B. Blumenthal, Jan Baumbach, Markus List. bioRxiv 2022.04.29.490015; doi: https://doi.org/10.1101/2022.04.29.490015. * equal first-authors
