Generalizable Gene Self-Expressive Networks

GXN: Generalizable Gene Self-Expressive Networks

Description

In this work we introduce Generalizable Gene Self-Expressive Networks, a new simple, interpretable, and predictive formalism for modeling gene networks. This package contains two methods, based respectively on the ElasticNet and Orthogonal Matching Pursuit regression algorithms, for inferring, assessing, and tuning Generalizable Gene Self-Expressive Networks. It also contains several tutorials that help evaluate the generalization capabilities of these approaches using a new internal measure on three RNA-seq datasets from complex eukaryotes, namely C. familiaris, R. norvegicus and H. sapiens.

GXN•OMP

GXN•OMP relies on the well-known Orthogonal Matching Pursuit algorithm, which solves a linear regression task subject to a sparsity constraint ensuring that at most $d_0$ nonzero coefficients are used. More formally, GXN•OMP aims at solving the following optimization problem:

$$C_{\star,g}^* = \operatorname{ArgMin}_{C_{\star,g}} \| X_{\star,g} - X\cdot C_{\star,g} \|^2_2$$

Subject to:

$$\|C_{\star,g}\|_0 \leq d_0,$$

$$C_{g,g} = 0 \quad \forall g \in \{1, \dots, N\},$$

$$C_{j,g} = 0 \quad \forall j \notin \Psi$$

To solve this task, OMP relies on a greedy forward feature selection method. At each step, the method selects the feature with the highest correlation with the current residual, then updates the regression coefficients and recomputes the residual using an orthogonal projection onto the subspace spanned by the previously selected features. Moreover, an inner cross-validation step is used to select the parameter $d_0$ in a range between 0 and the hyper-parameter $d_0^{max}$, which defines the maximal number of features. In practice, the hyper-parameter $d_0^{max} = \min(\delta \times |\Psi|, \operatorname{rank}(X_{\star,\Psi}))$ is set as a fraction $\delta$ of the number of regulators $|\Psi|$ (or as the rank of the matrix $X_{\star,\Psi}$, whenever this value is lower). Here we set $d_0^{max}=30$.
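The greedy procedure described above can be sketched for a single target gene as follows. This is a minimal NumPy illustration, not the package's implementation: the function names `max_features` and `omp_single_gene` are hypothetical, the inner cross-validation over $d_0$ is omitted, truncating $\delta \times |\Psi|$ to an integer is an assumption, and the columns of $X$ are assumed to be on a comparable scale so that inner products with the residual act as correlations.

```python
import numpy as np

def max_features(X, regulators, delta):
    """d0_max = min(delta * |Psi|, rank(X[:, Psi])); integer truncation is an assumption."""
    n_frac = int(delta * len(regulators))
    rank = int(np.linalg.matrix_rank(X[:, regulators]))
    return min(n_frac, rank)

def omp_single_gene(X, g, regulators, d0):
    """Greedy OMP for target gene g; returns one coefficient column of length N."""
    y = X[:, g]
    # Only regulators may be used, and never the target itself (C[g, g] = 0).
    candidates = [j for j in regulators if j != g]
    selected, beta = [], np.zeros(0)
    residual = y.copy()
    for _ in range(min(d0, len(candidates))):
        remaining = [j for j in candidates if j not in selected]
        # Score each remaining feature by |inner product| with the residual.
        scores = np.abs(X[:, remaining].T @ residual)
        if scores.max() <= 1e-12:
            break  # residual is orthogonal to every remaining feature
        selected.append(remaining[int(np.argmax(scores))])
        # Refit all selected coefficients by least squares, i.e. project y
        # orthogonally onto the span of the selected features.
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ beta
    coef = np.zeros(X.shape[1])
    if selected:
        coef[selected] = beta
    return coef

# Synthetic example: gene 0 is driven by genes 1 and 2; gene 5 is not a regulator.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X[:, 0] = 2.0 * X[:, 1] - 3.0 * X[:, 2] + 0.01 * rng.standard_normal(200)
psi = [0, 1, 2, 3, 4]
d0_max = max_features(X, psi, delta=0.5)
coef = omp_single_gene(X, g=0, regulators=psi, d0=d0_max)
print(np.flatnonzero(coef), coef[[1, 2]])
```

Running the function once per gene and stacking the resulting columns would give the full coefficient matrix $C$.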

GXN•EN

GXN•EN relies on the ElasticNet regression technique, which addresses the linear regression task using $\ell_1$ and $\ell_2$ regularization simultaneously. More formally, GXN•EN minimizes the following objective function:

$$C_{\star,g}^* = \operatorname{ArgMin}_{C_{\star,g}} \frac{1}{2D} \| X_{\star,g} - X\cdot C_{\star,g} \|^2_2 + \alpha \rho \| C_{\star,g} \|_1 + \frac{\alpha (1-\rho)}{2} \| C_{\star,g} \|^2_2$$

Subject to:

$$C_{g,g} = 0 \quad \forall g \in \{1, \dots, N\},$$

$$C_{j,g} = 0 \quad \forall j \notin \Psi$$

  • $X$ simply denotes the gene expression matrix, and $D$ the number of samples
  • Internally the method evaluates $\rho \in \{0.8, 0.9, 0.99, 1\}$
  • $1/\epsilon=K_{\alpha}$ defines the number of $\alpha$ values tested between $\alpha_{max} = \frac{\max_{i\neq j} (| X_{\star,i}^\intercal \cdot X_{\star,j}| )}{D\rho}$ (for which the coefficient vector is null) and $\alpha_{min} = \epsilon \alpha_{max}$, with $0<\epsilon<1$.
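The $\alpha$ grid and the objective can be illustrated with a short NumPy sketch. It is not the package's code: the function names are hypothetical, the denominator of $\alpha_{max}$ is taken to be the number of samples $D$, and the logarithmic spacing of the grid is an assumption (the text only states that $K_{\alpha}$ values lie between $\alpha_{min}$ and $\alpha_{max}$). The helper `zero_is_optimal` checks the standard optimality condition for the all-zero coefficient vector, $\max_{j \neq g} |X_{\star,j}^\intercal X_{\star,g}| / D \leq \alpha\rho$, and ignores the regulator set $\Psi$ for simplicity.

```python
import numpy as np

def alpha_grid(X, rho, eps):
    """K_alpha = 1/eps values from alpha_max down to eps * alpha_max (log-spaced by assumption)."""
    D = X.shape[0]
    gram = np.abs(X.T @ X)
    np.fill_diagonal(gram, 0.0)          # restrict the maximum to pairs i != j
    alpha_max = gram.max() / (D * rho)
    k_alpha = int(round(1.0 / eps))
    return np.geomspace(alpha_max, eps * alpha_max, num=k_alpha)

def en_objective(X, g, c, alpha, rho):
    """Elastic-net objective for target gene g and coefficient column c."""
    D = X.shape[0]
    r = X[:, g] - X @ c
    return (r @ r) / (2 * D) + alpha * rho * np.abs(c).sum() \
        + 0.5 * alpha * (1 - rho) * (c @ c)

def zero_is_optimal(X, g, alpha, rho):
    """True if the all-zero column minimizes the objective for gene g."""
    D = X.shape[0]
    others = np.arange(X.shape[1]) != g  # enforce C[g, g] = 0
    grad = np.abs(X[:, others].T @ X[:, g]) / D
    return bool(np.all(grad <= alpha * rho + 1e-12))

# Synthetic example: gene 0 depends on genes 1 and 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 0] = X[:, 1] - X[:, 2] + 0.1 * rng.standard_normal(100)

rho, eps = 0.9, 0.01
alphas = alpha_grid(X, rho, eps)
print(len(alphas), zero_is_optimal(X, 0, alphas[0], rho),
      zero_is_optimal(X, 0, alphas[-1], rho))
```

At the largest $\alpha$ the penalty dominates and every gene's coefficient column is null, whereas smaller values of $\alpha$ let coefficients enter the model.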

Installation

pip install GXN

Authors

Sergio Peignier

License

MIT License
