Build metabolic graphs from online databases.
Synopsis
Metabograph is a Python library for generating metabolic networks from BioPAX data. The generated NetworkX graphs can be used in various applications, such as training graph neural networks (GNNs), or saved to files for use with viewers such as Gephi.
Links
GitLab
Other Repositories
Usage
Configuration File
Metabograph uses a YAML configuration file to set the target species, configure optional BioPAX SPARQL endpoints, and filter graphs by cellular pathways and locations. A default configuration file with comments can be generated with the command metabograph --create-config config.yaml:
# ConfigData

# BioPAX configuration.
biopax:
  # A list of paths to custom OWL files in the BioPAX level-3 format to use, either with or without the default files
  # depending on the value of include_default_owl_files.
  # Type: list[Path] [OPTIONAL]
  custom_owl_files: []

  # An optional URL to a SPARQL endpoint through which to query BioPAX data, such as a local Fuseki server.
  # Type: str [OPTIONAL]
  endpoint: null

  # If True, include complexes and their components.
  # Type: bool [OPTIONAL]
  include_complexes: true

  # If True, include the default BioPAX files for the configured species. These files will be downloaded if necessary.
  # Type: bool [OPTIONAL]
  include_default_owl_files: true

  # If True, include member entities (as defined by BioPAX).
  # Type: bool [OPTIONAL]
  include_member_entities: false

  # If True, items for which a pathway or location is unknown will be kept when filtering by pathway and/or location.
  # Type: bool [OPTIONAL]
  keep_unknown: false

  # Either a list of BioPax entity locations, or a path to a plaintext file with one location per line. See `metabograph
  # --list-locations` for the complete list.
  # Type: Union[list[str], str] [OPTIONAL]
  locations: null

  # Either a list of BioPAX pathways, or a path to a plaintext file with one pathways per line. See `metabograph
  # --list-pathways` for the complete list.
  # Type: Union[list[str], str] [OPTIONAL]
  pathways: null

  # The target species. It must be one supported by BioPAX. See `metabograph --list-species` for the complete list.
  # Type: str [OPTIONAL]
  species: homo sapiens

# Cache configuration.
cache:
  # The path to a cache directory. If unset, the standard XDG user cache directory will be used.
  # Type: Path [OPTIONAL]
  path: null

  # The timeout for the cached data. Data will be cleared from the cache after this timeout. If unset, cached data will
  # not automatically time out.
  # Type: int [OPTIONAL]
  timeout: null
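The locations and pathways options accept either an inline list or a path to a plaintext file with one entry per line, and keep_unknown decides whether items with no known pathway or location survive filtering. The following is a minimal sketch of how such values could be interpreted. The helpers resolve_entries and keep_item are hypothetical names written for illustration; they are not part of the Metabograph API and may differ from its actual behavior.

```python
from pathlib import Path


def resolve_entries(value):
    """Return a list of names from a list or a one-entry-per-line text file.

    Hypothetical helper: it mirrors the documented Union[list[str], str] type
    of the locations and pathways options, not Metabograph's own code.
    """
    if value is None:
        return None  # No filter configured.
    if isinstance(value, str):
        # A string is treated as a path to a plaintext file.
        lines = Path(value).read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [str(item) for item in value]


def keep_item(item_value, wanted, keep_unknown=False):
    """Decide whether an item passes a pathway or location filter."""
    if wanted is None:
        return True  # Filtering is disabled.
    if item_value is None:
        return keep_unknown  # Unknown items are kept only on request.
    return item_value in wanted
```

With keep_unknown left at its default of false, an item whose pathway is unknown is dropped as soon as any pathway filter is set, which is the behavior the configuration comments describe.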
Metabograph Command-Line Tool
The metabograph command-line tool can be used to query the available species, pathways and cellular locations, and to generate graph files in GML format.
usage: metabograph [-h] [--clear-cache] [--create-config PATH]
                   [--list-species] [--list-locations] [--list-pathways] [-v]
                   [config] [graph]

Generate graphs from Reactome BioPAX data.

positional arguments:
  config                Path to the YAML configuration file.
  graph                 Output path for the generated graph in GML format.

options:
  -h, --help            show this help message and exit
  --clear-cache         Clear the cache to force a refresh of query data.
  --create-config PATH  Create a YAML configuration file at the given path. If
                        the path is "-", the generated YAML will be printed to
                        STDOUT.
  --list-species        List available species.
  --list-locations      List recognized cellular locations.
  --list-pathways       List recognized pathways.
  -v, --verbose         Increasing logging level to DEBUG. Pass twice to also
                        show SPARQL queries.
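When the tool is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below uses only the flags shown in the help message above. The functions build_command and generate_graph are hypothetical wrappers, and generate_graph assumes that metabograph is installed and available on PATH.

```python
import subprocess


def build_command(config=None, graph=None, verbosity=0, list_option=None):
    """Assemble an argument list for the metabograph command-line tool."""
    allowed = {"species", "locations", "pathways"}
    command = ["metabograph"]
    # Each -v raises verbosity; passing it twice also shows SPARQL queries.
    command.extend(["-v"] * verbosity)
    if list_option is not None:
        if list_option not in allowed:
            raise ValueError(f"unknown list option: {list_option}")
        command.append(f"--list-{list_option}")
    if config is not None:
        command.append(str(config))
    if graph is not None:
        # The graph path is the second positional argument, so it needs a config.
        if config is None:
            raise ValueError("a graph path requires a configuration file")
        command.append(str(graph))
    return command


def generate_graph(config, graph):
    """Run metabograph to write a GML file (assumes the tool is on PATH)."""
    return subprocess.run(build_command(config, graph), check=True)
```

For example, build_command("config.yaml", "graph.gml") yields the same arguments as running metabograph config.yaml graph.gml in a shell.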
Python API
For the full API, see the API documentation linked above. The following is an example of common basic usage.
# Import the required modules.
from metabograph.config import Config
from metabograph.biopax.query_manager import BiopaxQueryManager
from metabograph.biopax.graph_generator import BiopaxGraphGenerator
# Instantiate a configuration object from a file (or manually).
config = Config(path="config.yaml")
# Instantiate a query manager.
bqm = BiopaxQueryManager(config)
# Get the lists of species, pathways and cellular locations.
species = bqm.list_species()
pathways = bqm.list_pathways()
locations = bqm.list_locations()
# Instantiate a graph generator and get the networkx graph object for the
# current configuration.
bgg = BiopaxGraphGenerator(bqm=bqm)
graph = bgg.get_graph()
# Do stuff with the graph...
print(graph)
# ...
Fuseki Server
Apache Jena Fuseki is a third-party server that provides faster parsing of OWL files than the Python packages owlready2 and rdflib. The script download_fuseki.sh is provided to quickly download the source code and create a configuration file for running the server locally.
By default, the script will download files to the directory tmp/fuseki and write the required environment variables to tmp/fuseki/env.sh. These environment variables can be loaded manually in a Bash shell via the command source tmp/fuseki/env.sh.
The env.sh file mentioned above will run download_fuseki.sh if necessary and set the Fuseki environment variables.
Once the environment variables have been set, the Fuseki server can be run in a terminal via the command metabograph-fuseki.
The easiest way to run the local Fuseki server regardless of the user's current shell is to run the following command:
./scripts/run_in_venv.sh metabograph-fuseki <path_to_config>
Replace <path_to_config> with the path to the configuration file that will be used with Metabograph. See the metabograph-fuseki help message for further options:
usage: metabograph-fuseki [-h] [-v] config

Run the Fuseki server.

positional arguments:
  config         The Metabograph configuration file.

options:
  -h, --help     show this help message and exit
  -v, --verbose  Show debug messages.
Examples
Examples can be found in the examples directory.
NVIDIA cuGraph
NVIDIA cuGraph is a drop-in alternative backend for NetworkX that claims significant speedups without any code changes. To use it, install the appropriate nx-cugraph package in the same environment as Metabograph (e.g. via pip) and then export the environment variable NX_CUGRAPH_AUTOCONFIG=True before running your code:
export NX_CUGRAPH_AUTOCONFIG=True
./your_code.py
# OR
NX_CUGRAPH_AUTOCONFIG=True ./your_code.py
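The same effect can be obtained from a Python launcher by passing the variable to a child process. This is a minimal sketch rather than an official nx-cugraph recipe: cugraph_env and run_with_cugraph are hypothetical helper names, and the script path is whatever file holds your own code.

```python
import os
import subprocess
import sys


def cugraph_env(base=None):
    """Return a copy of the environment with NX_CUGRAPH_AUTOCONFIG enabled."""
    env = dict(os.environ if base is None else base)
    env["NX_CUGRAPH_AUTOCONFIG"] = "True"
    return env


def run_with_cugraph(script, *args):
    """Run a Python script in a child process with the variable set."""
    command = [sys.executable, str(script), *args]
    return subprocess.run(command, env=cugraph_env(), check=True)
```

Setting the variable for a child process keeps the parent environment untouched, which mirrors the one-line NX_CUGRAPH_AUTOCONFIG=True ./your_code.py form shown above.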
Utilities
The project provides the following files and scripts for convenience. See the dependencies below.
- env.sh - A Bash file that can be sourced to do the following:
  - Create and configure a Python virtual environment.
  - Download and unpack the Fuseki server.
  - Activate the Python virtual environment and configure the Fuseki environment variables.
- build_doc.sh - Build the Python documentation with Sphinx.
- create_venv.sh - Create a Python virtual environment with Metabograph installed in editable mode.
- download_fuseki.sh - Download the Fuseki server and unpack it to a temporary directory. This will also generate a file with the required environment variables for using the server.
- pylint.sh - Run Pylint on the Metabograph source code.
- run_in_venv.sh - Run a command in the fully configured virtual environment. This should be used by users of non-Bash shells who do not wish to manually configure their own environments. The script will run any command passed to it:
  run_in_venv.sh command arg1 arg2 ...
Dependencies
- Bash version 5.0 or newer.
- bsdtar from libarchive (available in the libarchive-tools package on Ubuntu).
File details
Details for the file metabograph-2024.4.tar.gz.
File metadata
- Download URL: metabograph-2024.4.tar.gz
- Upload date:
- Size: 290.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | ad22fc5767ac36b038eadc32bfd45688711a6a06fe9374b9b3fdd298cf40c436
MD5 | 99862783d071ec5dce4e4176a2a40a8d
BLAKE2b-256 | 1160708c2bcd4eaee46d4df7c551917094d6835014172145b328df477b0e8aca
File details
Details for the file metabograph-2024.4-py3-none-any.whl.
File metadata
- Download URL: metabograph-2024.4-py3-none-any.whl
- Upload date:
- Size: 30.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32b833aeb3ecaeca8c12d29ea0f2ccf84f9d5f3292a7608aaa21fe95c8c9229a
MD5 | 9689022b670fd16e6da80409dbfdaeb7
BLAKE2b-256 | 3f3677d3d79a2b1e97c640a14af3d687a5c9bd1cf1f45c799711960484480054