
Ab initio pathway inference

Project description


PathModel: metabolic pathway drift prototype

PathModel is a prototype to infer new biochemical reactions and new metabolite structures. The biological motivation for developing it is described in a preprint, now in revision at iScience.

There is no guarantee that this script will work; it is a work in progress at an early stage.

Description

Metabolic Pathway Drift Hypothesis

Metabolic Pathway Drift hypothesizes that metabolic pathways can be conserved even if their biochemical reactions undergo variations. These variations can be non-orthologous gene displacements or changes in enzyme order.

images/metabolic_pathway_drift.jpg

Metabolic Pathway Drift Hypothesis

To test this hypothesis, we developed PathModel to infer possible enzyme order changes in metabolic pathways.

Program

PathModel is developed in ASP using the clingo grounder and solver. It is divided into three ASP scripts.

The first one, ReactionSiteExtraction.lp, creates biochemical transformations from reactions. The biochemical transformation of a reaction corresponds to the changes in atoms and bonds between the reactant and the product of the reaction.

When a reaction occurs between two molecules, the script compares the atoms and bonds of the reactant and the product and extracts a reaction site before the reaction (atoms and bonds present in the reactant but absent in the product) and a reaction site after the reaction (atoms and bonds present in the product but absent in the reactant).

ReactionSiteExtraction produces two sites for each reaction (one before and one after the reaction). This corresponds to the biochemical transformation induced by the reaction.
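The extraction can be pictured as two set differences. The Python sketch below is not ReactionSiteExtraction.lp itself: it assumes each molecule is given as a pair of sets mirroring the atom and bond facts of the Input section, and that reactant and product share the same atom numbering (as they do in the Tutorial). The function name reaction_sites is hypothetical.

```python
# Illustrative sketch only, not the ASP script ReactionSiteExtraction.lp.
# ASSUMPTION: a molecule is (atoms, bonds) with atoms = {(index, element)}
# and bonds = {(bond_type, i, j)}, numbered identically in reactant and product.


def reaction_sites(reactant, product):
    """Return (site_before, site_after), each as (atoms, bonds) sorted lists.

    site_before: atoms/bonds present in the reactant but absent in the product.
    site_after:  atoms/bonds present in the product but absent in the reactant.
    """
    r_atoms, r_bonds = reactant
    p_atoms, p_bonds = product
    site_before = (sorted(r_atoms - p_atoms), sorted(r_bonds - p_bonds))
    site_after = (sorted(p_atoms - r_atoms), sorted(p_bonds - r_bonds))
    return site_before, site_after


# molecule_1 and molecule_2 from the Tutorial (four carbons each).
molecule_1 = (
    {(i, "carb") for i in range(1, 5)},
    {("single", 1, 2), ("single", 1, 3), ("single", 2, 3), ("single", 2, 4)},
)
molecule_2 = (
    {(i, "carb") for i in range(1, 5)},
    {("single", 1, 2), ("single", 1, 3), ("single", 2, 3), ("double", 2, 4)},
)

before, after = reaction_sites(molecule_1, molecule_2)
print(before)  # ([], [('single', 2, 4)])
print(after)   # ([], [('double', 2, 4)])
```

The two sites match the reduction transformation reported in the Tutorial (a single bond between atoms 2 and 4 becoming a double bond).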

A second script, MZComputation.lp will compute the MZ for each known molecule. It also computes the MZ changes between the reactant and the product of a reaction.
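A rough Python equivalent of the M/Z computation, assuming all-carbon molecules whose free valences are filled with implicit hydrogens (hydrogens are not modelled, see Input). The masses 12.0107 (carbon) and 1.0074 (hydrogen) are assumptions: they are chosen because they reproduce the M/Z of 94.1489 quoted for molecule_5 in the Tutorial, but MZComputation.lp may use other values and handle other elements. Values are kept as integers in units of 0.0001, following the x 10 000 convention used for Clingo; the function names are hypothetical.

```python
# Sketch in the spirit of MZComputation.lp (not the real script).
# ASSUMPTIONS: carbon-only molecules ("carb", valence 4), implicit hydrogens,
# masses in units of 0.0001 chosen to reproduce the Tutorial values.
MASS = {"carb": 120107, "hydrogen": 10074}  # assumed: 12.0107 and 1.0074
VALENCE = {"carb": 4}
BOND_ORDER = {"single": 1, "double": 2}


def molecule_mz(atoms, bonds):
    """Return M/Z x 10 000 for a molecule given as atom and bond tuples."""
    used = {index: 0 for index, _ in atoms}
    for kind, i, j in bonds:
        used[i] += BOND_ORDER[kind]
        used[j] += BOND_ORDER[kind]
    # Each free valence on a heavy atom is taken to carry one hydrogen.
    hydrogens = sum(VALENCE[element] - used[index] for index, element in atoms)
    heavy = sum(MASS[element] for _, element in atoms)
    return heavy + hydrogens * MASS["hydrogen"]


def reaction_mz(reactant, product):
    """M/Z change between reactant and product, each an (atoms, bonds) pair."""
    return molecule_mz(*product) - molecule_mz(*reactant)


def carbons(n):
    """Atoms 1..n, all carbon, as in the Tutorial data."""
    return [(i, "carb") for i in range(1, n + 1)]


molecule_1 = (carbons(4), [("single", 1, 2), ("single", 1, 3),
                           ("single", 2, 3), ("single", 2, 4)])
molecule_2 = (carbons(4), [("single", 1, 2), ("single", 1, 3),
                           ("single", 2, 3), ("double", 2, 4)])
molecule_5 = (carbons(7), [("single", 1, 2), ("single", 1, 3), ("single", 1, 6),
                           ("single", 1, 7), ("single", 2, 3), ("single", 2, 4),
                           ("double", 3, 6), ("single", 5, 6)])

print(molecule_mz(*molecule_5))             # 941489, i.e. 94.1489
print(reaction_mz(molecule_1, molecule_2))  # -20148
```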

These data will be used by the third script: PathModel.lp.

PathModel uses the incremental mode of Clingo. Starting from a source molecule, it applies two inference methods (deductive and analogical reasoning, see Tutorial) until it reaches a goal (another molecule).

Installation

Requirements

PathModel is a Python 3 package using Answer Set Programming (ASP) to infer new biochemical reactions and new metabolite structures. It is divided into two parts:

PathModel requires:

Using Singularity and Singularity Hub

You can use the container from Singularity Hub.

# Choose your preference to pull the container from Singularity Hub (once)
singularity pull shub://pathmodel/pathmodel-singularity

# Enter it
singularity run pathmodel-singularity_latest.sif
pathmodel test -o output_folder
pathmodel_plot -i output_folder/MAA
pathmodel_plot -i output_folder/sterol

# Or use as a command line
singularity exec pathmodel-singularity_latest.sif pathmodel test -o output_folder
singularity exec pathmodel-singularity_latest.sif pathmodel_plot -i output_folder/MAA
singularity exec pathmodel-singularity_latest.sif pathmodel_plot -i output_folder/sterol

This container is built from this Singularity recipe. If you prefer, you can build the container yourself from the recipe:

singularity build pathmodel.sif Singularity

Using docker

A docker image of pathmodel is available at dockerhub. This image is based on the pathmodel Dockerfile.

docker run -ti -v /path/shared/container:/shared --name="mycontainer" pathmodel/pathmodel bash

This command will download the image and create a container with a shared path. It will launch a bash terminal where you can use the command pathmodel (see Command and Python call and Tutorial).

Using git

The package can be installed either with python setup.py or with pip (see below):

git clone https://github.com/pathmodel/pathmodel.git

cd pathmodel

python setup.py install

Using pip

If you have all the dependencies on your system, you can simply install PathModel using pip.

pip install pathmodel

Using conda environment (to install all dependencies)

Because the PathModel scripts require many dependencies, we created a conda environment file that contains all of them.

First you need Conda. To avoid conflict between the conda python and your system python, you could use a conda environment and Miniconda.

If you want to test this, the first thing is to install miniconda:

# Download Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Give the permission to the installer.
chmod +x Miniconda3-latest-Linux-x86_64.sh

# Install it at the path that you choose.
./Miniconda3-latest-Linux-x86_64.sh -p /path/where/miniconda/will/be/installed/ -b

# Delete installer.
rm Miniconda3-latest-Linux-x86_64.sh

# Add conda path to your bash settings.
echo '. /path/where/miniconda/is/installed/etc/profile.d/conda.sh' >> ~/.bashrc
# Will activate the environment.
# For more information: https://github.com/conda/conda/blob/master/CHANGELOG.md#440-2017-12-20
echo 'conda activate base' >> ~/.bashrc

After this you need to restart your terminal or use: source ~/.bashrc

Then download our conda environment file:

# Download our conda environment file from Pathmodel github page.
wget https://raw.githubusercontent.com/pathmodel/pathmodel/master/conda/pathmodel_env.yaml

# Use the file to create the environment and install all dependencies.
conda env create -f pathmodel_env.yaml

If no error occurs, you can now access a conda environment with pathmodel:

# Activate the environment.
conda activate pathmodel

# Launch the help of Pathmodel.
(pathmodel) pathmodel -h

You can exit the environment with:

# Deactivate the environment.
conda deactivate

Input

Molecules are modelled with atoms (hydrogen excluded) and bonds (single and double).

atom("Molecule1",1,carb). atom("Molecule1",2,carb).
bond("Molecule1",single,1,2).

atom("Molecule2",1,carb). atom("Molecule2",2,carb). atom("Molecule2",3,carb).
bond("Molecule2",single,1,2). bond("Molecule2",single,2,3).

Reactions between molecules are represented as a named link between two molecules (reactant first, then product):

reaction(reaction1,"Molecule1","Molecule2").

A common domain is needed to find which molecules share structure with others:

atomDomain(commonDomainName,1,carb). atomDomain(commonDomainName,2,carb).
bondDomain(commonDomainName,single,1,2).

A source molecule is defined:

source("Molecule1").

Initiation and goal of the incremental grounding must be defined:

init(pathway("Molecule1","Molecule2")).
goal(pathway("Molecule1","Molecule3")).

An M/Z ratio can be added to check whether a metabolite with this ratio can be predicted. The M/Z ratio must be multiplied by 10 000 because Clingo does not handle decimals.

mzfiltering(2702720).
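Converting a measured M/Z to the integer expected by mzfiltering can be scripted. This hypothetical helper (mz_to_clingo is not part of PathModel) uses decimal arithmetic to avoid binary floating-point rounding, and also accepts a comma as decimal separator.

```python
from decimal import ROUND_HALF_UP, Decimal


def mz_to_clingo(mz):
    """Return the M/Z ratio multiplied by 10 000 as an integer for mzfiltering/1.

    Hypothetical helper. Working on the decimal string (not the float) avoids
    binary floating-point rounding; "92,1341" is accepted as well as "92.1341".
    """
    scaled = Decimal(str(mz).replace(",", ".")) * 10000
    return int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP))


print(f"mzfiltering({mz_to_clingo(270.272)}).")    # mzfiltering(2702720).
print(f"mzfiltering({mz_to_clingo('92,1341')}).")  # mzfiltering(921341).
```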

Molecules that are absent from the organism of study can be added. They will not be targeted by the inference methods.

absentmolecules("Molecule1").
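If the input data come from another program, the facts can be written from Python. This is a minimal sketch that assembles the examples above into a data.lp file for pathmodel infer -i data.lp; the helper names molecule_facts and reaction_fact are hypothetical and not part of the PathModel package.

```python
# Hypothetical helpers producing PathModel input facts (formats from the Input section).


def molecule_facts(name, atoms, bonds):
    """atom/3 and bond/4 facts for one molecule, one fact per line."""
    lines = [f'atom("{name}",{index},{element}).' for index, element in atoms]
    lines += [f'bond("{name}",{kind},{i},{j}).' for kind, i, j in bonds]
    return "\n".join(lines)


def reaction_fact(reaction_id, reactant, product):
    """reaction/3 fact: reaction name, then reactant, then product."""
    return f'reaction({reaction_id},"{reactant}","{product}").'


facts = [
    molecule_facts("Molecule1", [(1, "carb"), (2, "carb")], [("single", 1, 2)]),
    molecule_facts("Molecule2", [(1, "carb"), (2, "carb"), (3, "carb")],
                   [("single", 1, 2), ("single", 2, 3)]),
    reaction_fact("reaction1", "Molecule1", "Molecule2"),
    "atomDomain(commonDomainName,1,carb). atomDomain(commonDomainName,2,carb).",
    "bondDomain(commonDomainName,single,1,2).",
    'source("Molecule1").',
    'init(pathway("Molecule1","Molecule2")).',
    'goal(pathway("Molecule1","Molecule3")).',
    "mzfiltering(2702720).",  # M/Z 270.272 multiplied by 10 000
]
data = "\n".join(facts) + "\n"

# Write the file passed to: pathmodel infer -i data.lp -o output_folder
with open("data.lp", "w") as handle:
    handle.write(data)
```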

Command and Python call

Command-line:

pathmodel infer -i data.lp -o output_folder
pathmodel_plot -i output_folder_from_pathmodel

In Python (pathmodel_plot is not available from the Python import):

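import pathmodel

# Run the inference on the ASP data file and write the results to output_folder.
output_folder = 'output_folder'
pathmodel.pathmodel_analysis('data.lp', output_folder)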

Output

With the infer command, pathmodel will use the data file and try to create an output folder:

output_folder
├── data_pathmodel.lp
├── pathmodel_data_transformations.tsv
├── pathmodel_incremental_inference.tsv
├── pathmodel_output.lp

data_pathmodel.lp is an intermediary file for PathModel. Specifically, it contains the input data and the results of ReactionSiteExtraction.lp (diffAtomBeforeReaction, diffAtomAfterReaction, diffBondBeforeReaction, diffBondAfterReaction, siteBeforeReaction, siteAfterReaction) and of MZComputation.lp (domain, moleculeComposition, moleculeNbAtoms, numberTotalBonds, moleculeMZ, reactionMZ). The Python wrapper gives this file to PathModel.lp as input.

pathmodel_data_transformations.tsv contains all the transformations inferred from the input data by the ReactionSiteExtraction.lp script.

pathmodel_incremental_inference.tsv shows, for each new reaction inferred from a known transformation, the step of Clingo's incremental mode at which it was inferred.

pathmodel_output.lp is the output lp file of PathModel.lp.
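To post-process the inferences in Python, the TSV can be read with the standard csv module. This sketch assumes the file is tab-separated with a header row holding the column names shown in the Tutorial (infer_turn, new_reaction, reactant, product); read_inferred_reactions is a hypothetical helper, and the demo writes a small sample file with the Tutorial rows instead of reading a real PathModel output.

```python
import csv
import os
import tempfile


def read_inferred_reactions(path):
    """Return (infer_turn, new_reaction, reactant, product) tuples.

    ASSUMPTION: tab-separated file with a header row; molecule names may be
    wrapped in double quotes, which are removed.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (int(row["infer_turn"]), row["new_reaction"],
             row["reactant"].strip('"'), row["product"].strip('"'))
            for row in reader
        ]


# Demo on a sample file holding the rows shown in the Tutorial.
sample = (
    "infer_turn\tnew_reaction\treactant\tproduct\n"
    '2\treduction\t"molecule_3"\t"molecule_4"\n'
    '2\treduction\t"molecule_5"\t"Prediction_921341_reduction"\n'
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pathmodel_incremental_inference.tsv")
    with open(path, "w", newline="") as handle:
        handle.write(sample)
    rows = read_inferred_reactions(path)

for turn, reaction, reactant, product in rows:
    print(f"step {turn}: {reaction} {reactant} -> {product}")
```

On a real run, pass output_folder/pathmodel_incremental_inference.tsv to read_inferred_reactions, and check the actual file layout first since it is assumed here.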

Then if you use the pathmodel_plot command on the output_folder, pathmodel will create the following structure:

output_folder
├── ...
├── molecules
        ├── Molecule1
        ├── Molecule2
        ├── ...
├── newmolecules_from_mz
        ├── Prediction_...
        ├── Prediction_...
        ├── ...
├── pathmodel_output.svg

molecules contains the structure of each molecule in the input data file.

newmolecules_from_mz contains the structures of inferred molecules using the MZ. It will be empty if no MZ were given or if no molecules were inferred.

pathmodel_output.svg shows the pathway containing the molecules and the reactions (in green) from the input files and the newly inferred molecules and reactions (in blue).

Tutorial

For this tutorial, we have created fictitious data available at test/pathmodel_test_data.lp.

In this file there are five molecules:

images/molecule_1.svg

atom("molecule_1",1..4,carb). bond("molecule_1",single,1,2). bond("molecule_1",single,1,3). bond("molecule_1",single,2,3). bond("molecule_1",single,2,4).

images/molecule_2.svg

atom("molecule_2",1..4,carb). bond("molecule_2",single,1,2). bond("molecule_2",single,1,3). bond("molecule_2",single,2,3). bond("molecule_2",double,2,4).

images/molecule_3.svg

atom("molecule_3",1..6,carb). bond("molecule_3",single,1,2). bond("molecule_3",single,1,3). bond("molecule_3",single,1,6). bond("molecule_3",single,2,3). bond("molecule_3",single,2,4). bond("molecule_3",single,3,6). bond("molecule_3",single,5,6).

images/molecule_4.svg

atom("molecule_4",1..6,carb). bond("molecule_4",single,1,2). bond("molecule_4",single,1,3). bond("molecule_4",single,1,6). bond("molecule_4",single,2,3). bond("molecule_4",double,2,4). bond("molecule_4",single,3,6). bond("molecule_4",single,5,6).

images/molecule_5.svg

atom("molecule_5",1..7,carb). bond("molecule_5",single,1,2). bond("molecule_5",single,1,3). bond("molecule_5",single,1,6). bond("molecule_5",single,1,7). bond("molecule_5",single,2,3). bond("molecule_5",single,2,4). bond("molecule_5",double,3,6). bond("molecule_5",single,5,6).

One reaction:

images/reduction_reaction.svg

reaction(reduction, "molecule_1", "molecule_2").

One known MZ:

92.1341 (so 921341 for Clingo)

mzfiltering(921341).

By calling the command:

pathmodel infer -i pathmodel_test_data.lp -o output_folder

Pathmodel will create output files:

output_folder
├── data_pathmodel.lp
├── pathmodel_data_transformations.tsv
├── pathmodel_incremental_inference.tsv
├── pathmodel_output.lp

As explained in Output, data_pathmodel.lp is an intermediary file for Pathmodel.

pathmodel_data_transformations.tsv contains the transformations inferred from the known reactions, here:

reaction_id    reactant_substructure     product_substructure
reduction      [('single', '2', '4')]    [('double', '2', '4')]

This means that the reduction transforms a single bond between atoms 2 and 4 into a double bond. These transformations are used by the deductive and analogical reasoning of PathModel.
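Deductive reasoning with this transformation can be sketched in a few lines of Python: if a molecule contains the reactant substructure, replace it with the product substructure. This is only an illustration, not PathModel.lp: it matches atom indices literally, which works here because molecule_3 and molecule_4 share the numbering of molecule_1 and molecule_2, whereas PathModel performs the matching in ASP. The function apply_transformation is hypothetical.

```python
def apply_transformation(bonds, before, after):
    """Apply a transformation (reactant site -> product site) to a bond set.

    Returns the new bond set, or None if the molecule lacks the reactant site.
    ASSUMPTION: atom indices in the sites match the molecule's own numbering.
    """
    bonds = set(bonds)
    if not set(before) <= bonds:
        return None
    return (bonds - set(before)) | set(after)


# The reduction transformation from pathmodel_data_transformations.tsv.
reduction_before = [("single", 2, 4)]
reduction_after = [("double", 2, 4)]

molecule_3 = {("single", 1, 2), ("single", 1, 3), ("single", 1, 6), ("single", 2, 3),
              ("single", 2, 4), ("single", 3, 6), ("single", 5, 6)}
molecule_4 = {("single", 1, 2), ("single", 1, 3), ("single", 1, 6), ("single", 2, 3),
              ("double", 2, 4), ("single", 3, 6), ("single", 5, 6)}

result = apply_transformation(molecule_3, reduction_before, reduction_after)
print(result == molecule_4)  # True: a reduction variant molecule_3 -> molecule_4
```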

pathmodel_incremental_inference.tsv shows the new reactions inferred by PathModel and the step in Clingo incremental mode when the new reaction has been inferred.

infer_turn    new_reaction    reactant        product
2             reduction       "molecule_3"    "molecule_4"
2             reduction       "molecule_5"    "Prediction_921341_reduction"

Two new reduction variant reactions have been inferred at step two of the incremental mode:

  • one between molecule_3 and molecule_4, inferred from the reduction between molecule_1 and molecule_2. This is a demonstration of the deductive reasoning of PathModel:

images/deductive_reasoning.svg
  • one between molecule_5 and a newly inferred metabolite with an M/Z of 92.1341. To find it, PathModel computes the M/Z of molecule_5 (94.1489). It then applies each transformation from its knowledge base (here, reduction) to each molecule from the knowledge base, computes the M/Z of the resulting hypothetical molecules and compares them with the M/Z given by the user (here 92.1341). If a match is found, the reaction and the molecule are inferred. This is an example of analogical reasoning:

images/analogical_reasoning.svg
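The analogical search described in the second bullet can be sketched as a loop over molecules and transformations. Assumptions: all molecules are carbon-only with implicit hydrogens, the masses (12.0107 for C, 1.0074 for H) are those that reproduce the Tutorial values 94.1489 and 92.1341, sites are matched by literal atom indices, and the prediction name only imitates PathModel's output naming. The helper names are hypothetical; PathModel.lp does this in ASP.

```python
# Illustrative sketch of analogical reasoning, not PathModel.lp.
# ASSUMED masses in units of 0.0001, reproducing the Tutorial M/Z values.
MASS_C, MASS_H = 120107, 10074
BOND_ORDER = {"single": 1, "double": 2}
S, D = "single", "double"


def mz(atom_count, bonds):
    """M/Z x 10 000 of an all-carbon molecule with implicit hydrogens."""
    used = [0] * (atom_count + 1)
    for kind, i, j in bonds:
        used[i] += BOND_ORDER[kind]
        used[j] += BOND_ORDER[kind]
    hydrogens = sum(4 - used[i] for i in range(1, atom_count + 1))
    return atom_count * MASS_C + hydrogens * MASS_H


def analogical_candidates(molecules, transformations, mz_targets):
    """Yield (reaction_id, reactant, predicted_name) for every M/Z match."""
    for name, (atom_count, bonds) in molecules.items():
        for reaction_id, (before, after) in transformations.items():
            if not set(before) <= bonds:
                continue  # the transformation site is absent from this molecule
            value = mz(atom_count, (bonds - set(before)) | set(after))
            if value in mz_targets:
                yield reaction_id, name, f"Prediction_{value}_{reaction_id}"


# Tutorial data: (number of carbons, bonds) for each molecule.
molecules = {
    "molecule_1": (4, {(S, 1, 2), (S, 1, 3), (S, 2, 3), (S, 2, 4)}),
    "molecule_2": (4, {(S, 1, 2), (S, 1, 3), (S, 2, 3), (D, 2, 4)}),
    "molecule_3": (6, {(S, 1, 2), (S, 1, 3), (S, 1, 6), (S, 2, 3),
                       (S, 2, 4), (S, 3, 6), (S, 5, 6)}),
    "molecule_4": (6, {(S, 1, 2), (S, 1, 3), (S, 1, 6), (S, 2, 3),
                       (D, 2, 4), (S, 3, 6), (S, 5, 6)}),
    "molecule_5": (7, {(S, 1, 2), (S, 1, 3), (S, 1, 6), (S, 1, 7),
                       (S, 2, 3), (S, 2, 4), (D, 3, 6), (S, 5, 6)}),
}
transformations = {"reduction": ([(S, 2, 4)], [(D, 2, 4)])}

candidates = list(analogical_candidates(molecules, transformations, {921341}))
print(candidates)  # [('reduction', 'molecule_5', 'Prediction_921341_reduction')]
```

Only molecule_5 yields a product whose M/Z (921341) matches the mzfiltering value; applying the reduction to molecule_1 or molecule_3 gives other masses, and molecule_2 and molecule_4 lack the single bond between atoms 2 and 4.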

Graphic representations of the molecules and reactions can then be generated:

pathmodel_plot -i output_folder
output_folder
├── ...
├── molecules
        ├── molecule_1.svg
        ├── molecule_2.svg
        ├── molecule_3.svg
        ├── molecule_4.svg
        ├── molecule_5.svg
├── newmolecules_from_mz
        ├── Prediction_921341_reduction.svg
├── pathmodel_output.svg

Here is the structure inferred by PathModel for the M/Z 92.1341:

images/Prediction_921341_reduction.svg

PathModel also creates a picture showing all the reactions (known reactions in green, inferred reaction variants in blue):

images/pathmodel_output.svg



Download files


Source Distribution

pathmodel-0.1.8.tar.gz (31.7 kB view details)

Uploaded Source

File details

Details for the file pathmodel-0.1.8.tar.gz.

File metadata

  • Download URL: pathmodel-0.1.8.tar.gz
  • Upload date:
  • Size: 31.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.9.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.19.5 CPython/3.6.8

File hashes

Hashes for pathmodel-0.1.8.tar.gz
Algorithm Hash digest
SHA256 d44d3bf97300b2573b7d938a833e119b0b01381f686f111c4947087329aab18f
MD5 7f596e7de758d89ab9561f0939289eae
BLAKE2b-256 3ac095498fd5c1642fa74e47cf19117f2b0718e49f58bc0607d6d306a6ba09fb

