Skip to main content

Python implementation of the Integrative Metabolic Analysis Tool (iMAT) algorithm for context specific metabolic modeling.

Project description

Readme

logo

Documentation Status Tests

About

iMATpy is a python package implementing the iMAT algorithm for integrating gene expression information with a genome scale metabolic model, described in (Shlomi T, et al. Network-based prediction of human tissue-specific metabolism, Nat. Biotechnol., 2008, vol. 26 (pg. 1003-1010)).

Background

The iMAT algorithm takes a genome scale metabolic model (GSM), and gene expression weights, then tries to find a feasible flux distribution which maximizes the sum of reactions with high expression which have a flux above epsilon, and reactions with low expression which have a flux below threshold.

Installation

This package can be installed via pip

pip install imatpy

Solvers

In order to solve the optimization problems associated with iMAT, a LP/MILP solver is required. By default this package uses the GLPK solver which is installed automatically by COBRApy, but Gurobi and CPLEX can also be used. Both Gurobi and CPLEX have free academic licenses available.

Workflow

The general workflow for using this package is as follows:

  1. Normalize gene expression data for sequencing depth and gene length (TPM or RPKM both work well)
  2. Quantile normalize the gene expression data (for example with the qnorm package) (This is optional, but can help make the results less sensitive to parameters)
  3. Combine gene expression values accross samples, for example by taking the mean or median
  4. Convert the gene expression data into qualitative weights, of -1, 0, or 1, where -1 represents low expression, 1 represents high expression, and 0 represents all other genes. This can be done by setting the bottom 15% of genes to -1, the top 15% of genes to 1, and all other genes to 0. Other percentiles can be used, but 15% is a decent starting point.
  5. Using the parse_gpr submodule, specifically the gene_to_rxn_weights function, convert the gene weights into reaction weights using the GPR rules in the model.
  6. Either generate a flux distribution with the iMAT algorithm, or construct a new model with restrictions based on iMAT results.
    • To generate a flux distribution, use the imatpy.imat submodule, specifically the imat function. This function takes a model, and a series of reaction weights, and returns a flux distribution which maximizes the sum of reactions with high expression which have a flux above epsilon, and reactions with low expression which have a flux below threshold.
    • To generate a model with restrictions based on the iMAT results, use the model_creation submodule. It includes a variety of different methods for generating models based on iMAT, with a wrapper function generate_model for convinience.

Parse GPR

The parse_gpr submodule has functions for applying the gene-protien-reaction (GPR) rules in a model to gene weights in order to convert gene weights into reaction weights which can act as input to the iMAT algorithm.

The input to this method is a pandas Series with the index being the gene identifiers, and the values being the gene weights. The output is a pandas Series with the index being the reaction identifiers, and the values being the reaction weights. This method only handles a single series, so the gene expression data must be processed into single weights (with -1 representing lowly expressed genes, 1 representing highly expressed genes, and 0 for all other genes).

Here is an example of how to use this method:

# External imports
import pandas as pd

# iMATpy imports
from imatpy.model_utils import read_model
from imatpy.parse_gpr import gene_to_rxn_weights

# Read in the model
model = read_model("./tests/data/test_model.xml")

# Create a pandas Series representing gene expression weights
model_weights = pd.Series({
            "g_A_imp": 1,
            "g_B_imp": -1,
            "g_C_imp": -1,
            "g_F_exp": 0,
            "g_G_exp": -1,
            "g_H_exp": 0,
            "g_A_B_D_E": 0,
            "g_C_E_F": -1,
            "g_C_H": 0,
            "g_D_G": 1,
        })

# Convert the gene weights into reaction weights
reaction_weights = gene_to_rxn_weights(model=model, gene_weights=model_weights)

# Print the reaction weights
print(reaction_weights)
R_A_e_ex     0.0
R_B_e_ex     0.0
R_C_e_ex     0.0
R_F_e_ex     0.0
R_G_e_ex     0.0
R_H_e_ex     0.0
R_A_imp      1.0
R_B_imp     -1.0
R_C_imp     -1.0
R_F_exp      0.0
R_G_exp     -1.0
R_H_exp      0.0
r_A_B_D_E    0.0
r_C_E_F     -1.0
r_C_H        0.0
r_D_G        1.0
dtype: float64

iMAT Methods

The imat submodule contains functions for running the iMAT algorithm. The main function is imat, which takes a model, and a series of reaction weights, and returns a flux distribution which maximizes the sum of reactions with high expression which have a flux above epsilon, and reactions with low expression which have a flux below threshold.

Here is an example of how to use this method:

# External imports
import pandas as pd

# iMATpy imports
from imatpy.model_utils import read_model
from imatpy.imat import imat

# Read in the model
model = read_model("./tests/data/test_model.xml")

# Read in the reaction weights
rxn_weights = pd.read_csv("./tests/data/test_model_reaction_weights.csv", index_col=0).squeeze()

# Run iMAT
imat_results = imat(model=model, rxn_weights=rxn_weights, epsilon=1, threshold=0.01)

# Print the imat objective
print(f"iMAT Objective: {imat_results.objective_value}")

# Print the imat flux distribution
print(f"iMAT Flux Distribution: \n{imat_results.fluxes}")
iMAT Objective: 3.0
iMAT Flux Distribution:
R_A_e_ex    -1.0
R_B_e_ex    -1.0
R_C_e_ex    -1.0
R_F_e_ex     1.0
R_G_e_ex     1.0
R_H_e_ex    -0.0
R_A_imp      1.0
R_B_imp      1.0
R_C_imp      1.0
R_F_exp      1.0
R_G_exp      1.0
R_H_exp      0.0
r_A_B_D_E    1.0
r_C_E_F      1.0
r_C_H        0.0
r_D_G        1.0
Name: fluxes, dtype: float64

Model Creation

The model_creation submodule contains functions for creating new models based on the results of iMAT. The main function is generate_model, which takes a model, and a series of reaction weights, and returns a new model with restrictions based on the iMAT results.

The available methods for creating a model based on an iMAT flux distribution is:

  • imat_restrictions Adds the binary variables and constraints used in the iMAT algorithm, as well as an additional constraint ensuring that the flux distribution is within tolerance of the optimal iMAT objective value. This method stays closest to the iMAT objective, but the included indicator (binary) variables mean that is unsuitable for sampling.

  • simple_bounds Adds bounds on the reactions found to be “on”, and “off” in iMAT. For all the highly expressed reactions found to be “on”, the flux is constrained to be at least epsilon. For all the lowly expressed reactions found to be “off”, the flux is constrained to be below threshold.

  • subset Removes reactions from the model which are found to be “off”. For all the lowly expressed reactions found to be off, they are constrained to have a flux below threshold.

  • fva Finds bounds using an FVA like approach. A temporary model is created in a simmilar way to the imat_restrictions method above, which includes the imat variables, constraints, and which also constrains the flux distribution to be near optimal for iMAT. The maximum and minimum fluxes allowed through each reaction (while still maintaining the optimal iMAT objective) is found. These values are used as the new reaction bounds. It should be noted, that although the individual upper and lower bounds for the reaction are achievable for each reation while being consistant with the optimal iMAT objective, this doesn’t guarantee that the flux distribution overall is consistant with the optimal iMAT objective.

  • milp Uses a set of mixed integer linear programs to find whether a reaction should be forced off, forward, or reverse. Each reaction in turn is forced to be off, active in the forward direction, and active in the reverse direction, and the iMAT objective is maximized. Whether a reaction should be forced off, or active in either the forward or reverse direction is then determined by which direction maximizes the iMAT objective. Again, it should be noted that this doesn’t guarantee that the iMAT objective is overall maximized by solutions to this model.

Below is an example of how to generate a model with iMAT restrictions:

# External imports
import pandas as pd

# iMATpy imports
from imatpy.model_utils import read_model
from imatpy.model_creation import generate_model

# Read in the model
model = read_model("./tests/data/test_model.xml")

# Read in the reaction weights
rxn_weights = pd.read_csv("./tests/data/test_model_reaction_weights.csv", index_col=0).squeeze()

# Generate a model with iMAT restrictions
imat_model = generate_model(model=model, rxn_weights=rxn_weights, method="fva", epsilon=1, threshold=0.01)

# This model can be used for sampling, finding optimal biomass generation, finding essential genes, etc.
optimal_biomass = imat_model.slim_optimize()

print(f"Optimal Biomass: {optimal_biomass}")
Optimal Biomass: 50.0

Model Utils

The model_utils submodule contains several utility functions for working with COBRApy models. Specifically, it contains functions for: - Reading and writing models in various formats with a single function, specifically read_model/write_model. - Determining if two models are equivalent, using model_eq.

Here is an example of how to use the model IO methods:

# iMATpy imports
from imatpy.model_utils import read_model, write_model, model_eq

# You can read in a model from a file
model = read_model("./tests/data/test_model.xml") # in SBML
model = read_model("./tests/data/test_model.json") # in JSON
model = read_model("./tests/data/test_model.yml") # in YAML
model = read_model("./tests/data/test_model.mat") # in Matlab

# You can also write a model to a file
write_model(model, "./tests/data/test_model.xml") # in SBML
write_model(model, "./tests/data/test_model.json") # in JSON
write_model(model, "./tests/data/test_model.yml") # in YAML
write_model(model, "./tests/data/test_model.mat") # in Matlab

Here is an example of using the model comparison method:

# iMATpy imports
from imatpy.model_utils import read_model, model_eq

# Read a model
model = read_model("./tests/data/test_model.xml")

# Create a copy of the model
model_copy = model.copy()

# Check that the models are equivalent
print(f"Models are equivalent: {model_eq(model, model_copy)}")

# Change the copy model
model_copy.reactions.get_by_id("r_A_B_D_E").lower_bound = -314

# Check that the models are no longer equivalent
print(f"Models are equivalent: {model_eq(model, model_copy)}")
Models are equivalent: True
Models are equivalent: False

License

This package itself is relased under an MIT license. It makes use of several libraries which are listed below:

  • COBRApy: Released under the LGPL-2.0-or-later license see here
  • Numpy: Released under the BSD-3-Clause license see here
  • Optlang: Released under the Apache-2.0 lisence see here
  • Pandas: Released under the BSD-3-Clause license see here
  • Scipy: Released under the BSD-3-Clause license see here
  • Sympy: Released under the BSD-3-Clause license see here

In addition, these libraries were used during developement:

  • black: Used for autoformating
  • jupyter: Used for rendering quarto documents - flake8: Used for linting - pytest: Used for running unittests
  • Sphinx: Used for generating documentation
  • sphinx-rtd-theme: Used for formating the documentation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

imatpy-0.2.0.tar.gz (237.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

imatpy-0.2.0-py3-none-any.whl (16.5 kB view details)

Uploaded Python 3

File details

Details for the file imatpy-0.2.0.tar.gz.

File metadata

  • Download URL: imatpy-0.2.0.tar.gz
  • Upload date:
  • Size: 237.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for imatpy-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4c89357fc12ca2e50f90608ed652f5b96da784dc32a0b04e4c8b1dc5ea5f7909
MD5 221f9f1c5c43676534fde5cd7583cbd5
BLAKE2b-256 bcccb62ca71594833c902f837af4a6a01ee8d8690ffe94a12b55244dbeaeafab

See more details on using hashes here.

Provenance

The following attestation bundles were made for imatpy-0.2.0.tar.gz:

Publisher: build.yml on Braden-Griebel/imatpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file imatpy-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: imatpy-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for imatpy-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 03ade2e02b07692a1bd095cb855980afb2004aef84c30488c8e67ff813b0ea77
MD5 e02023427e51fae891c422d75a41fdb7
BLAKE2b-256 df4104630762c042cc30ebe38f636755359f30e0f60ed964b882cad0c0dd8ed2

See more details on using hashes here.

Provenance

The following attestation bundles were made for imatpy-0.2.0-py3-none-any.whl:

Publisher: build.yml on Braden-Griebel/imatpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page