BioCatalyzer: a rule-based tool to predict compound metabolism

BioCatalyzer

BioCatalyzer is a Python tool that predicts enzymatic metabolism products using a rule-based approach.

BioCatalyzer is implemented as a Command Line Interface that takes as input a set of compounds represented as SMILES strings and outputs a set of predicted metabolic products and associated enzymes.

These metabolic products can then be matched against experimental MS data using the same tool.

Installation

Installing from the PyPI package repository:

pip install biocatalyzer

Installing from GitHub:

Clone the repository: git clone https://github.com/jcorreia11/BioCatalyzer.git
From inside the cloned repository, run: python setup.py install

Command Line Interface

biocatalyzer_cli <PATH_TO_COMPOUNDS> <OUTPUT_DIRECTORY> [--neutralize=<BOOL>] [--reaction_rules=<FILE_PATH>] [--organisms=<FILE_PATH>] [--patterns_to_remove=<FILE_PATH>] [--molecules_to_remove=<FILE_PATH>] [--min_atom_count=<INT>] [--match_ms_data=<BOOL>] [--ms_data_path=<FILE_PATH>] [--tolerance=<FLOAT>] [--n_jobs=<INT>]

Argument	Example	Description	Default
compounds <PATH_TO_COMPOUNDS>	`file.tsv` or `"smiles1;smiles2;smiles3;etc"`	The path to the file containing the compounds to use as reactants, or ;-separated SMILES strings.¹
output_directory <OUTPUT_DIRECTORY>	`output/directory/`	The path to the directory where the results are saved.
neutralize	`True` or `False`	Whether to neutralize the compounds before predicting the products. In this case the new products will also be neutralized.	`False`
reaction_rules	`file.tsv` or `None`	The path to the file containing the reaction rules to use.²	all_reaction_rules_forward_no_smarts_duplicates_sample.tsv
organisms	`file.tsv` or `"org_id1;org_id2;org_id3;etc"` or `None`	The path to the file containing the organisms to use, or ;-separated organism identifiers. Reaction rules are selected accordingly (only rules associated with enzymes encoded by genes from these organisms are kept).³	All reaction rules are used.
patterns_to_remove	`patterns.tsv` or `None`	The path to the file containing the patterns to remove from the products. ⁴	patterns.tsv
molecules_to_remove	`molecules.tsv` or `None`	The path to the file containing the molecules to remove from the products. ⁵	byproducts.tsv
min_atom_count	`4`	The minimum number of heavy atoms a product must have.	`5`
match_ms_data	`True` or `False`	Whether to match the predicted products to the MS data.	`False`
ms_data_path	`ms_data.tsv`	The path to the file containing the MS data. ⁶	`None`
tolerance	`0.02`	The mass tolerance to use when matching masses.	`0.02`
n_jobs	`6`	The number of jobs to run in parallel (-1 uses all).	`1`

Compounds

See drugs.csv¹ for an example.

The file must be tab-separated and contain the following columns:

smiles - the SMILES representation of the compounds;
compound_id - the compounds identifiers.

Alternatively, the compounds can be passed as a ;-separated string of SMILES representations.
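The two input forms described above can be produced with the standard library alone. The following is a minimal sketch, not part of BioCatalyzer: the helper names and the example compound identifiers (cpd_1, cpd_2) are hypothetical, and only the column names smiles and compound_id come from this README.

```python
import csv
import io


def write_compounds_tsv(handle, compounds):
    """Write (smiles, compound_id) pairs as a tab-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Column names required by the README.
    writer.writerow(["smiles", "compound_id"])
    writer.writerows(compounds)


def to_inline_argument(compounds):
    """Join the SMILES strings with ';' for direct use on the command line."""
    return ";".join(smiles for smiles, _ in compounds)


# Hypothetical example compounds (ethanol and acetic acid).
example = [("CCO", "cpd_1"), ("CC(=O)O", "cpd_2")]

buffer = io.StringIO()
write_compounds_tsv(buffer, example)
print(buffer.getvalue())            # content to save as the compounds file
print(to_inline_argument(example))  # alternative ;-separated form
```

To create a real file, pass an object opened with open(path, "w", newline="") instead of the in-memory buffer.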

Output directory

The output path must be a directory. The results will be saved in the following files:

new_compounds.tsv - the predicted products;
matches.tsv (if match_ms_data is set to True) - the matches between the predicted products and the MS data.

Neutralize

If set to True, the compounds will be neutralized before predicting the products. In this case the new products will also be neutralized.

Reaction Rules

See all_reaction_rules_forward_no_smarts_duplicates_sample.tsv² for an example.

The file must be tab-separated and contain the following columns:

InternalID - The ID of the Reaction Rule.
Reactants - The Reactants of the Reaction Rule. Coreactants must be defined by their ID as in the Coreactants file. The compound to match must be identified by the string 'Any'. The format must be: coreactant1_id;Any;coreactant2_id. The order in which the reactants and the compound to match are defined is relevant and specific to the Reaction Rule. If the Reaction Rule is mono-component (i.e. it does not contain any additional coreactant), the format must be: Any.
SMARTS - The SMARTS representation of the Reaction Rule.
EC_Numbers - The EC Numbers associated with the Reaction Rule.
Organisms - The Organisms associated with the Reaction Rule.

By default our set of reaction rules is used.
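When writing custom rules, it can help to check the Reactants field before running the tool. The sketch below is illustrative only and is not BioCatalyzer's own parser: the function name is hypothetical, and it assumes that exactly one 'Any' placeholder appears per rule, which the description above suggests but does not state outright.

```python
def parse_reactants(field):
    """Split a Reactants field and locate the 'Any' placeholder.

    Returns (reactant_ids, index_of_any). The position matters because the
    order of the reactants is specific to each Reaction Rule.
    """
    reactants = [part.strip() for part in field.split(";")]
    if reactants.count("Any") != 1:
        # Assumption: one compound to match per rule.
        raise ValueError(f"expected exactly one 'Any' in {field!r}")
    return reactants, reactants.index("Any")


# Multi-component rule: two coreactants around the compound to match.
print(parse_reactants("coreactant1_id;Any;coreactant2_id"))
# Mono-component rule: only the compound to match.
print(parse_reactants("Any"))
```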

Organisms

All organism identifiers listed at https://www.genome.jp/kegg/catalog/org_list.html are allowed.

Example:

hsa is for Homo sapiens (human).

eco is for Escherichia coli K-12 MG1655.

sce is for Saccharomyces cerevisiae (budding yeast).

If you want to use your own organisms, see organisms.csv³ for an example.

The file must be tab-separated and contain a column named org_id with the organisms' identifiers (KEGG identifiers).

Alternatively, the organisms can be passed as a ;-separated string of organism identifiers.
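The organism selection can be pictured as a simple filter over the reaction rules. The sketch below is a rough illustration under stated assumptions, not the tool's implementation: the function names and example rules are hypothetical, and it assumes that the Organisms column of a rule holds ;-separated KEGG identifiers, which this README does not specify.

```python
import csv
import io


def parse_organisms(value):
    """Split a ;-separated string of KEGG organism identifiers."""
    return [org.strip() for org in value.split(";") if org.strip()]


def read_organisms_tsv(handle):
    """Read the org_id column from a tab-separated organisms file."""
    return [row["org_id"] for row in csv.DictReader(handle, delimiter="\t")]


def select_rules(rules, organisms):
    """Keep rules linked to at least one requested organism.

    Assumption: each rule's Organisms value is a ;-separated string.
    """
    wanted = set(organisms)
    return [rule for rule in rules
            if wanted & set(parse_organisms(rule["Organisms"]))]


# Hypothetical rules, reduced to the two columns used here.
rules = [
    {"InternalID": "rule_1", "Organisms": "hsa;eco"},
    {"InternalID": "rule_2", "Organisms": "sce"},
]
from_file = read_organisms_tsv(io.StringIO("org_id\nhsa\neco\n"))
print(from_file)
print([rule["InternalID"] for rule in select_rules(rules, parse_organisms("hsa"))])
```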

Patterns to remove

If you want to use your own patterns to remove, see patterns.tsv⁴ for an example.

The file must be tab-separated and contain a column named smarts with the SMARTS representation of the patterns to remove.

Molecules to remove

If you want to use your own molecules to remove, see byproducts.tsv⁵ for an example.

The file must be tab-separated and contain a column named smiles with the SMILES representation of the molecules to remove.
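Both removal files (patterns and molecules) are single-column tables, so one small helper can produce either. This is a minimal sketch with a hypothetical function name; the example entries (water and carbon dioxide as by-products) are illustrative and are not taken from the bundled byproducts.tsv.

```python
import csv
import io


def write_single_column_tsv(handle, column, values):
    """Write one header line followed by one value per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([column])
    for value in values:
        writer.writerow([value])


# Molecules to remove: column must be named 'smiles'.
molecules = io.StringIO()
write_single_column_tsv(molecules, "smiles", ["O", "O=C=O"])
print(molecules.getvalue())

# Patterns to remove: same layout, but the column must be named 'smarts'.
patterns = io.StringIO()
write_single_column_tsv(patterns, "smarts", ["[OH2]"])
print(patterns.getvalue())
```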

Match MS data

If set to True, the predicted products will be matched to the MS data.

In this case, ms_data_path must be set.

MS data path

See ms_data.tsv⁶ for an example.

The file must be tab-separated and contain the following columns:

ParentCompound - the parent/original compound identifiers.
ParentCompoundSmiles - the SMILES representation of the compounds (optional).
Mass - the mass of the molecule.
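A quick header check can catch a malformed MS data file before a long run. The sketch below is an assumption-laden illustration rather than the tool's own loader: the function name is hypothetical, and it treats ParentCompound and Mass as required because only ParentCompoundSmiles is marked optional above.

```python
import csv
import io

REQUIRED_COLUMNS = {"ParentCompound", "Mass"}
OPTIONAL_COLUMNS = {"ParentCompoundSmiles"}


def read_ms_data(handle):
    """Read a tab-separated MS data table and convert Mass to float."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["Mass"] = float(row["Mass"])
        rows.append(row)
    return rows


# Hypothetical two-line file: header plus one measurement.
sample = "ParentCompound\tParentCompoundSmiles\tMass\ncpd_1\tCCO\t46.0419\n"
print(read_ms_data(io.StringIO(sample)))
```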

Mass Tolerance

The mass tolerance (float) to use when matching masses. Masses between mass - mass_tolerance and mass + mass_tolerance will be considered as a match.
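The matching window described above amounts to an absolute-difference test. The following sketch shows that arithmetic only; the function names and example masses are hypothetical, and treating the window bounds as inclusive is an assumption, since the text says "between" without specifying.

```python
def is_match(predicted_mass, observed_mass, mass_tolerance=0.02):
    """True when predicted_mass lies within observed_mass +/- mass_tolerance."""
    # Assumption: the bounds of the window are inclusive.
    return abs(predicted_mass - observed_mass) <= mass_tolerance


def match_masses(predicted, observed_mass, mass_tolerance=0.02):
    """Return the IDs of predicted products whose mass matches observed_mass."""
    return [product_id for product_id, mass in predicted.items()
            if is_match(mass, observed_mass, mass_tolerance)]


# Hypothetical predicted products and one observed mass.
predicted = {"product_1": 180.063, "product_2": 194.080}
print(match_masses(predicted, observed_mass=180.07))
```

A larger tolerance widens the window and therefore returns more candidate matches per observed mass.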

Number of jobs

The number of jobs to run in parallel. If -1 is passed, all available cores will be used.
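The -1 convention is commonly resolved by asking the operating system for the core count. The sketch below shows that convention in isolation; it is not taken from BioCatalyzer's source, and the function name is hypothetical.

```python
import os


def resolve_n_jobs(n_jobs):
    """Translate the n_jobs setting into a concrete worker count."""
    if n_jobs == -1:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


print(resolve_n_jobs(1))
print(resolve_n_jobs(-1))
```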

Usage example

biocatalyzer_cli file.tsv output_dir/ --neutralize=True --reaction_rules=reaction_rules.tsv --organisms="hsa;eco;sce" --patterns_to_remove=patterns.tsv --molecules_to_remove=byproducts.tsv --match_ms_data=True --ms_data_path=ms_data.tsv --tolerance=0.1 --n_jobs=-1

For predicting compound metabolism only:

biocatalyzer_cli file.tsv output_dir/ --neutralize=True --reaction_rules=reaction_rules.tsv --organisms="hsa;eco;sce" --patterns_to_remove=patterns.tsv --molecules_to_remove=byproducts.tsv --n_jobs=-1
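For scripted runs, the same command line can be assembled from Python. The sketch below is a hypothetical wrapper, not an API shipped with the package: the option names are copied from the synopsis in the Command Line Interface section, and the helper names build_command and run are assumptions for illustration.

```python
import subprocess

# Option names as listed in the CLI synopsis of this README.
ALLOWED_OPTIONS = (
    "neutralize", "reaction_rules", "organisms", "patterns_to_remove",
    "molecules_to_remove", "min_atom_count", "match_ms_data",
    "ms_data_path", "tolerance", "n_jobs",
)


def build_command(compounds, output_directory, **options):
    """Return the argument list for a biocatalyzer_cli call."""
    unknown = set(options) - set(ALLOWED_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if options.get("match_ms_data") and not options.get("ms_data_path"):
        raise ValueError("ms_data_path must be set when match_ms_data is True")
    command = ["biocatalyzer_cli", compounds, output_directory]
    command += [f"--{name}={value}" for name, value in options.items()]
    return command


def run(command):
    """Launch the CLI; requires biocatalyzer to be installed."""
    return subprocess.run(command, check=True)


command = build_command("file.tsv", "output_dir/", neutralize=True,
                        organisms="hsa;eco;sce", n_jobs=-1)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with the ;-separated values.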

Individual CLIs

Both parts of this CLI, the generation of new compounds (bioreactor_cli) and the matching with the MS data (matcher_cli), can be run individually.

For the bioreactor_cli see readme_bioreactor_cli.md.

For the matcher_cli see readme_matcher_cli.md.

Cite

Manuscript under preparation!

Credits and License

Developed at Centre of Biological Engineering, University of Minho and EMBL Heidelberg (Zimmermann-Kogadeeva Group).

This project has received funding from the Portuguese FCT and EMBL CPP Scientific Visitors Fellowships.

Released under an MIT License.

Release history

0.1.2b0 (pre-release, this version): Feb 12, 2025
0.1.1b0 (pre-release): Oct 27, 2022
0.0.4b0 (pre-release): Oct 21, 2022

Download files

Source Distribution

biocatalyzer-0.1.2b0.tar.gz (11.1 MB view details)

Uploaded Feb 12, 2025 Source

Built Distribution

biocatalyzer-0.1.2b0-py3-none-any.whl (11.1 MB view details)

Uploaded Feb 12, 2025 Python 3

File details

Details for the file biocatalyzer-0.1.2b0.tar.gz.

File metadata

Download URL: biocatalyzer-0.1.2b0.tar.gz
Upload date: Feb 12, 2025
Size: 11.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for biocatalyzer-0.1.2b0.tar.gz
Algorithm	Hash digest
SHA256	`8e1aba92270dedcd6b6161aa180c2728dd240758ca731b5f3fddc2a31c8b035c`
MD5	`687653b71fb13eee6f13e3ff60527317`
BLAKE2b-256	`14b44a2240b9d8f607c7ac3202cc7f33f8f8ccc55b50cb3bce7b8223e0b67885`

Provenance

The following attestation bundles were made for biocatalyzer-0.1.2b0.tar.gz:

Publisher: publish.yml on BioSystemsUM/BioCatalyzer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biocatalyzer-0.1.2b0.tar.gz
- Subject digest: 8e1aba92270dedcd6b6161aa180c2728dd240758ca731b5f3fddc2a31c8b035c
- Sigstore transparency entry: 170615751
- Sigstore integration time: Feb 12, 2025
Source repository:
- Permalink: BioSystemsUM/BioCatalyzer@75891f4b51452d1355afe90ee1871fd99fa4b109
- Branch / Tag: refs/heads/main
- Owner: https://github.com/BioSystemsUM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@75891f4b51452d1355afe90ee1871fd99fa4b109
- Trigger Event: workflow_dispatch

File details

Details for the file biocatalyzer-0.1.2b0-py3-none-any.whl.

File metadata

Download URL: biocatalyzer-0.1.2b0-py3-none-any.whl
Upload date: Feb 12, 2025
Size: 11.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for biocatalyzer-0.1.2b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f38bf2e37c4bd7dac7656482ad9dec82071f5947a040b524e13027cd3da6831`
MD5	`41d7b3f43bf284c2b127b6052f0b859e`
BLAKE2b-256	`edc1302883d7e8310ce1846d3fd16d25e81a2959c8dbf8c6a825bcde07e261c5`

Provenance

The following attestation bundles were made for biocatalyzer-0.1.2b0-py3-none-any.whl:

Publisher: publish.yml on BioSystemsUM/BioCatalyzer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biocatalyzer-0.1.2b0-py3-none-any.whl
- Subject digest: 8f38bf2e37c4bd7dac7656482ad9dec82071f5947a040b524e13027cd3da6831
- Sigstore transparency entry: 170615752
- Sigstore integration time: Feb 12, 2025
Source repository:
- Permalink: BioSystemsUM/BioCatalyzer@75891f4b51452d1355afe90ee1871fd99fa4b109
- Branch / Tag: refs/heads/main
- Owner: https://github.com/BioSystemsUM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@75891f4b51452d1355afe90ee1871fd99fa4b109
- Trigger Event: workflow_dispatch
