DEXOM implementation in python using cobrapy
DEXOM in python
This is a Python implementation of DEXOM (Diversity-based enumeration of optimal context-specific metabolic networks).
The original project, which was developed in MATLAB, can be found here: https://github.com/MetExplore/dexom
Parts of the imat code were taken from the driven package for data-driven constraint-based analysis: https://github.com/opencobra/driven
API documentation is available here: https://dexom-python.readthedocs.io/en/latest/
Requirements
- Python 3.7+
- CPLEX 12.10+
Installing CPLEX
Free license (Trial version): this version is limited to 1000 variables and 1000 constraints, and is therefore not usable on larger models.
Academic license: for this, you must sign up using an academic email address.
- after logging in, you can access the download for "ILOG CPLEX Optimization Studio"
- download version 12.10 or higher of the appropriate installer for your operating system
- install the solver
- update the PYTHONPATH environment variable by adding the directory containing the setup.py file appropriate for your OS and python version
- alternatively, run python "C:\Program Files\IBM\ILOG\CPLEX_Studio1210\python\setup.py" install
Functions
These are the different functions which are available for context-specific metabolic subnetwork extraction
apply_gpr
The gpr_rules.py script can be used to transform gene expression data into reaction weights, for a limited selection of models.
It uses the gene identifiers and gene-protein-reaction rules present in the model to connect the genes and reactions.
By default, continuous gene expression values/weights will be transformed into continuous reaction weights.
Using the --convert flag will instead create semi-quantitative reaction weights with values in {-1, 0, 1}. By default, the proportion of these three weights will be {25%, 50%, 25%}.
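The effect of the --convert flag can be pictured as a rank-based split. The sketch below is only an illustration and not the code in gpr_rules.py: the helper name discretize_weights and its arguments are hypothetical, and it assumes that the lowest 25% of weights become -1 and the highest 25% become 1. Ties are broken by sort order.

```python
def discretize_weights(weights, low_fraction=0.25, high_fraction=0.25):
    """Map continuous weights to -1, 0 or 1 by rank (hypothetical helper)."""
    ranked = sorted(weights, key=weights.get)  # reaction IDs, lowest weight first
    n_low = int(len(ranked) * low_fraction)
    n_high = int(len(ranked) * high_fraction)
    discrete = dict.fromkeys(ranked, 0)  # everything starts in the middle class
    for rid in ranked[:n_low]:
        discrete[rid] = -1
    for rid in ranked[len(ranked) - n_high:]:
        discrete[rid] = 1
    return discrete


# Made-up continuous reaction weights for eight reactions
continuous = {"R1": -2.3, "R2": -0.4, "R3": 0.0, "R4": 0.1,
              "R5": 0.2, "R6": 0.9, "R7": 1.5, "R8": 3.1}
print(discretize_weights(continuous))
```

With eight reactions, two end up at -1, four at 0 and two at 1, which matches the default {25%, 50%, 25%} proportions.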
iMAT
imat.py contains a modified version of the iMAT algorithm as defined by (Shlomi et al. 2008).
The main inputs of this algorithm are a model file, which must be supplied in a cobrapy-compatible format (SBML, JSON or MAT), and a reaction_weight file in which each reaction is attributed a score.
These reaction weights must be determined prior to launching imat, using the GPR rules present in the metabolic model.
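As a rough illustration of the three supported model formats, the sketch below picks a cobrapy reader from the file extension. The helper load_model and the READERS table are hypothetical names introduced here, and the extension mapping is an assumption; only the cobra.io functions themselves belong to cobrapy.

```python
from pathlib import Path

# File extension -> name of the cobrapy reader in cobra.io (assumed mapping)
READERS = {
    ".xml": "read_sbml_model",
    ".sbml": "read_sbml_model",
    ".json": "load_json_model",
    ".mat": "load_matlab_model",
}


def load_model(path):
    """Load a metabolic model with the cobrapy reader matching its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported model format: {suffix!r}")
    import cobra.io  # imported lazily, only needed once a model is actually read
    return getattr(cobra.io, READERS[suffix])(str(path))
```

For example, load_model("recon2v2/recon2v2_corrected.json") would go through cobra.io.load_json_model.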
The remaining inputs of imat are:
- epsilon: the activation threshold of reactions with weight > 0
- threshold: the activation threshold for unweighted reactions
- timelimit: the solver time limit
- feasibility: the solver feasibility tolerance
- mipgaptol: the solver MIP gap tolerance
- full: a bool parameter for switching between the partial & full-DEXOM implementation
Note: the feasibility determines the solver's capacity to return correct results. In particular, the relation epsilon > threshold > ub*feasibility is required (where ub is the maximal upper bound for reaction flux in the model).
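The relation between the three parameters can be checked before launching the solver. This is a minimal sketch: the function name tolerances_are_consistent is hypothetical, and the numbers are illustrative values rather than the package defaults. With a cobrapy model, the upper bounds could be collected from the upper_bound attribute of each reaction.

```python
def tolerances_are_consistent(epsilon, threshold, feasibility, upper_bounds):
    """Check epsilon > threshold > ub * feasibility, with ub the largest upper bound."""
    ub = max(upper_bounds)
    return epsilon > threshold > ub * feasibility


# Illustrative flux upper bounds, not taken from a real model
bounds = [1000.0, 1000.0, 10.0]

# ub * feasibility is about 1e-3, so a threshold of 1e-2 is large enough
print(tolerances_are_consistent(1.0, 1e-2, 1e-6, bounds))

# a threshold of 1e-4 falls below ub * feasibility, so the check fails
print(tolerances_are_consistent(1e-2, 1e-4, 1e-6, bounds))
```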
By default, imat uses the create_new_partial_variables function. In this version, binary flux indicator variables are created only for reactions with a non-zero weight.
In the full-DEXOM implementation, binary flux indicator variables are created for every reaction in the model. This does not change the result of the imat function, but can be used for some of the enumeration methods below.
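The difference between the two variants comes down to which reactions receive a binary indicator. The sketch below only illustrates that selection; the function reactions_needing_indicators is a hypothetical name and does not create any solver variables.

```python
def reactions_needing_indicators(reaction_ids, reaction_weights, full=False):
    """Return the reactions that would get a binary flux indicator variable."""
    if full:
        # full-DEXOM: every reaction in the model
        return list(reaction_ids)
    # partial: only reactions with a non-zero weight
    return [rid for rid in reaction_ids if reaction_weights.get(rid, 0) != 0]


# Made-up reaction IDs and weights
reactions = ["R1", "R2", "R3", "R4"]
weights = {"R1": 1.0, "R2": 0.0, "R4": -1.0}

print(reactions_needing_indicators(reactions, weights))
print(reactions_needing_indicators(reactions, weights, full=True))
```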
enum_functions
Four methods for enumerating context-specific networks are available:
- rxn-enum.py contains reaction-enumeration
- icut.py contains integer-cut
- maxdist.py contains distance-maximization
- div-enum.py contains diversity-enumeration
An explanation of these methods can be found in (Rodriguez-Mier et al. 2021).
Each of these methods can be used on its own. The same model and reaction_weights inputs must be provided as for the imat function.
Additional parameters for all 4 methods are:
- prev_sol: a starting imat solution (if none is provided, a new one will be computed)
- obj_tol: a relative tolerance on the imat objective value for the optimality of the solutions
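One plausible reading of obj_tol is sketched below: a solution counts as optimal if its objective value lies within a relative margin of the best imat objective. This assumes a maximization objective; the function name is_near_optimal is hypothetical and the exact constraint used by the package may differ.

```python
def is_near_optimal(objective_value, optimal_value, obj_tol):
    """True if objective_value is within a relative obj_tol of optimal_value."""
    return objective_value >= optimal_value - obj_tol * abs(optimal_value)


# Illustrative objective values, assuming the best imat objective is 100
print(is_near_optimal(99.95, 100.0, obj_tol=1e-3))
print(is_near_optimal(99.0, 100.0, obj_tol=1e-3))
```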
icut, maxdist, and div-enum also have two additional parameters:
- maxiter: the maximum number of iterations to run
- full: set to True to use the full-DEXOM implementation
As previously explained, the full-DEXOM implementation defines binary indicator variables for all reactions in the model. Although only the reactions with non-zero weights have an impact on the imat objective function, the distance maximization function which is used in maxdist and div-enum can utilize the binary indicators for all reactions. This increases the distance between the solutions, but requires significantly more computation time.
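To see why the full implementation can separate solutions further, consider the Hamming distance between two binary solution vectors. Hamming distance is used here as an assumed stand-in for the distance function, and the vectors and index list are made up for illustration.

```python
def hamming_distance(solution_a, solution_b):
    """Count the positions at which two binary solution vectors differ."""
    if len(solution_a) != len(solution_b):
        raise ValueError("solutions must have the same length")
    return sum(a != b for a, b in zip(solution_a, solution_b))


# Binary indicators for six reactions; only the first three carry a weight
weighted_positions = [0, 1, 2]
solution_a = [1, 0, 1, 1, 0, 1]
solution_b = [1, 0, 1, 0, 1, 1]

# Partial view: compare only the weighted reactions
partial_a = [solution_a[i] for i in weighted_positions]
partial_b = [solution_b[i] for i in weighted_positions]

print(hamming_distance(partial_a, partial_b))    # weighted reactions only
print(hamming_distance(solution_a, solution_b))  # all reactions
```

The two solutions are identical on the weighted reactions but differ on two unweighted ones, so only the comparison over all reactions can tell them apart.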
maxdist and div-enum also have one additional parameter:
- icut: if True, an icut constraint will be applied to prevent duplicate solutions
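An integer cut excludes one previously found binary solution from the feasible space. The sketch below builds the textbook form of such a cut as plain coefficients; it is an illustration of the technique and not necessarily the formulation used in icut.py. The function names are hypothetical.

```python
def integer_cut(previous_solution):
    """Return (coefficients, rhs) of the cut sum(c_i * x_i) <= rhs.

    Coefficients are +1 where the previous solution is 1 and -1 where it is 0,
    so only the previous solution itself reaches the value rhs + 1.
    """
    coefficients = [1 if value == 1 else -1 for value in previous_solution]
    rhs = sum(previous_solution) - 1
    return coefficients, rhs


def satisfies_cut(candidate, coefficients, rhs):
    """Check whether a binary candidate solution respects the cut."""
    return sum(c * x for c, x in zip(coefficients, candidate)) <= rhs


previous = [1, 0, 1]
coefficients, rhs = integer_cut(previous)

print(satisfies_cut([1, 0, 1], coefficients, rhs))  # the duplicate is excluded
print(satisfies_cut([1, 1, 1], coefficients, rhs))  # a different solution passes
```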
Parallelized DEXOM
The DEXOM algorithm is a combination of several network enumeration methods.
enumeration.py contains the write_batch_script1 function, which is used for creating a parallelization of DEXOM on a slurm computation cluster.
The main inputs of this function are:
- filenums: the number of parallel batches which should be launched on slurm
- iters: the number of div-enum iterations per batch
Other inputs are used for personalizing the directories and filenames on the cluster.
After executing the script, the target directory should contain several bash files named file_0.sh, file_1.sh etc., depending on the filenums parameter that was provided.
In addition, there should be one runfiles.sh file. This file contains the commands to submit the other files as job batches on the slurm cluster.
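The file layout described above can be reproduced with a few lines of Python. This is a simplified sketch and not the write_batch_script1 function: the helper write_batch_files is hypothetical, the body of each batch file is a placeholder, and the assumption is that runfiles.sh holds one sbatch command per batch file.

```python
import tempfile
from pathlib import Path


def write_batch_files(directory, filenums, iters):
    """Write file_<i>.sh for each batch plus a runfiles.sh that submits them."""
    directory = Path(directory)
    submit_lines = ["#!/bin/bash"]
    for i in range(filenums):
        name = f"file_{i}.sh"
        # newline="\n" keeps Unix line endings even when run on Windows
        with open(directory / name, "w", newline="\n") as handle:
            handle.write("#!/bin/bash\n")
            # Placeholder body: the real scripts would call the enumeration code
            handle.write(f"echo 'batch {i}: {iters} div-enum iterations'\n")
        submit_lines.append(f"sbatch {name}")
    with open(directory / "runfiles.sh", "w", newline="\n") as handle:
        handle.write("\n".join(submit_lines) + "\n")


# Write three batches into a temporary directory and list the created files
with tempfile.TemporaryDirectory() as target:
    write_batch_files(target, filenums=3, iters=100)
    print(sorted(p.name for p in Path(target).iterdir()))
```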
The results of a DEXOM run can be evaluated with the following scripts:
- dexom_cluster_results.py compiles and removes duplicate solutions from the results of a parallel DEXOM run
- pathway_enrichment.py can be used to perform a pathway enrichment analysis using a one-sided hypergeometric test
- result_functions.py contains the plot_pca function, which performs Principal Component Analysis on the enumeration solutions
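For reference, the p-value of a one-sided hypergeometric test can be computed directly from binomial coefficients. The sketch below shows the general formula for over-representation and is not taken from pathway_enrichment.py; the function and argument names are hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb


def hypergeom_pvalue(k, n_drawn, n_success, n_total):
    """P(X >= k) when drawing n_drawn items out of n_total, n_success of which
    belong to the category of interest (for example, one pathway)."""
    upper = min(n_drawn, n_success)
    favourable = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


# Made-up numbers: a model with 100 reactions, a pathway with 10 reactions,
# and a solution with 20 active reactions, 5 of which are in the pathway
print(hypergeom_pvalue(k=5, n_drawn=20, n_success=10, n_total=100))
```

A small p-value indicates that the pathway contains more active reactions than expected from a random draw of the same size.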
Examples
Toy models
The toy_models.py script contains code for generating some small metabolic models and reaction weights.
The toy_models folder contains some ready-to-use models and reaction weight files.
The main.py script contains a simple example of the DEXOM workflow using one of the toy models.
Recon 2.2
The example_data folder contains the model and the differential gene expression data which was used to test this new implementation.
In order to produce reaction weights, you can call the gpr_rules script from the command line.
This will create a file named "pval_0-01_reactionweights.csv" in the recon2v2 folder:
python dexom_python/gpr_rules -m recon2v2/recon2v2_corrected.json -g recon2v2/pval_0-01_geneweights.csv -o recon2v2/pval_0-01_reactionweights
Then, call imat to produce a first context-specific subnetwork. This will create a file named "imat_solution.csv" in the recon2v2 folder:
python dexom_python/imat -m recon2v2/recon2v2_corrected.json -r recon2v2/pval_0-01_reactionweights.csv -o recon2v2/imat_solution
To run DEXOM on a slurm cluster, call the enumeration.py script to create the necessary batch files (here: 100 batches with 100 iterations).
Make sure to pass the path to your installation of the CPLEX solver as the -c argument.
This script assumes that you have cloned the dexom-python project on the cluster, and that the dexom_python folder and the recon2v2 folder are in the same directory.
Note that this step creates a file called "recon2v2_reactions_shuffled.csv", which shows the order in which rxn-enum will call the reactions from the model.
python dexom_python/enum_functions/enumeration -m recon2v2/recon2v2_corrected.json -r recon2v2/pval_0-01_reactionweights.csv -p recon2v2/imat_solution.csv -o recon2v2/ -n 100 -i 100 -c /home/mstingl/save/CPLEX_Studio1210/cplex/python/3.7/x86-64_linux
Then, submit the job to the slurm cluster.
Note that if you created the files on a Windows pc, you must use the command dos2unix runfiles.sh before sbatch runfiles.sh:
cd example_data/
sbatch runfiles.sh
cd ..
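If dos2unix is not available, the same line-ending conversion can be done in Python before submitting. This is a small sketch with a hypothetical helper name; it rewrites the file in place, and the demonstration uses a temporary file rather than a real batch script.

```python
import tempfile
from pathlib import Path


def to_unix_line_endings(path):
    """Replace Windows (CRLF) line endings with Unix (LF) ones, in place."""
    path = Path(path)
    data = path.read_bytes()
    path.write_bytes(data.replace(b"\r\n", b"\n"))


# Demonstration on a temporary file with Windows line endings
with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "runfiles.sh"
    script.write_bytes(b"#!/bin/bash\r\nsbatch file_0.sh\r\n")
    to_unix_line_endings(script)
    print(script.read_bytes())
```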
After all jobs are completed, you can analyze the results using the following scripts:
python dexom_python/dexom_cluster_results -i recon2v2/ -o recon2v2/ -n 100
python dexom_python/pathway_enrichment -s recon2v2/all_dexom_sols.csv -m recon2v2/recon2v2_corrected.json -o recon2v2/
python dexom_python/result_functions -s recon2v2/all_dexom_sols.csv -o recon2v2/
The file all_dexom_sols.csv contains all unique solutions enumerated with DEXOM.
The file output.txt contains the average computation time per iteration and the proportion of duplicate solutions.
The .png files contain boxplots of the pathway enrichment tests as well as a 2D PCA plot of the binary solution vectors.
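The idea behind the unique solutions and the duplicate proportion can be sketched as follows. This is an illustration on made-up binary vectors, not the code of dexom_cluster_results.py, and the function name unique_solutions is hypothetical.

```python
def unique_solutions(solutions):
    """Return the unique binary solutions (first occurrence kept) and the
    proportion of solutions that were duplicates."""
    seen = set()
    unique = []
    for solution in solutions:
        key = tuple(solution)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(list(solution))
    duplicate_fraction = 1 - len(unique) / len(solutions) if solutions else 0.0
    return unique, duplicate_fraction


# Four made-up solutions, one of which repeats the first
enumerated = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
unique, duplicate_fraction = unique_solutions(enumerated)
print(unique)
print(duplicate_fraction)
```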