Tools for building Escher-compatible metabolic maps from KEGG and model annotations
Project description
BioEMMA
BioEMMA is an early-stage Python library for building Escher-compatible metabolic maps from KEGG pathway layouts and genome-scale metabolic models.
The current main workflow is:
- Parse a KEGG KGML/XML pathway with
KeggMap. - Convert KEGG compounds and reactions to BiGG/SEED identifiers using bundled MetaNetX-derived mapping tables.
- Build an Escher JSON map with
EscherMapper. - Optionally save a reproducible workflow output directory with the Escher map, the reconstructed KEGG map, flux data, summaries, and merged maps.
The project is currently in alpha. The public API may still change while the package structure is being prepared for PyPI.
Installation
For local development:
pip install -e .
Runtime dependencies can also be installed from:
pip install -r requirements.txt
Basic Usage
The workflow API is the recommended user-facing entry point. It accepts a COBRA
model path (or an in-memory cobra.Model) and either a KEGG pathway identifier
or a local KGML file.
from bioemma.workflow import build_outputs
result = build_outputs(
model="path/to/model.xml",
pathway="rn00010",
output_dir="out",
database="BIGG",
run_fba=True,
)
escher_map = result.escher_map
kegg_reconstruction = result.kegg_reconstruction
escher_map is a Python object compatible with the Escher JSON map structure,
and kegg_reconstruction is a normalized analytical representation of the KEGG
layout and mapped identifiers. When save_kegg_map=True, BioEMMA also writes
kegg_escher_map.json: a pure KEGG-layout Escher map before model filtering or
secondary metabolite addition.
With output_dir, BioEMMA writes:
out/rn00010/
escher_map.json
kegg_escher_map.json # when save_kegg_map=True or --save-kegg-map
kegg_source_reconstruction.json
summary.json
fluxes.json # when fluxes are provided or run_fba=True
escher_map.html # when save_html=True
escher_map_with_fluxes.html # when flux data and HTML output are requested
HTML output requires the escher package. BioEMMA does not export PNG files
directly; open the HTML output in Escher and use Escher's built-in PNG export
when a raster image is needed.
Visualization layout settings can be tuned with VisualizationOptions:
from bioemma.workflow import build_outputs
from bioemma.visualization import VisualizationOptions
result = build_outputs(
model="path/to/model.xml",
pathway="rn00010",
output_dir="out",
visualization_options=VisualizationOptions(
scaling_factor=4,
axis_epsilon=2,
markers_dist=10,
metabolite_label_shift=(10, 10),
reaction_label_shift=(10, 10),
canvas_margin_x=160,
canvas_margin_y=160,
axis_offset=20,
),
)
The defaults are conservative starting values for KEGG layouts: coordinates are scaled up for Escher readability, aligned reaction lanes keep a small tolerance, and secondary metabolites get enough spacing after scaling.
Command Line Usage
Build one map from a KEGG pathway identifier:
bioemma build --model path/to/model.xml --pathway rn00010 --output-dir out
Build one map from a local KGML file:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml --output-dir out
Build multiple maps and merge them:
bioemma build --model path/to/model.xml --pathway rn00010 rn00020 --output-dir out
The same works with local KGML files:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml path/to/rn00020.xml --output-dir out
For multiple inputs, BioEMMA writes each individual map into its own subfolder and writes a merged Escher map at:
out/merged_escher_map.json
Use --no-merge to skip the merged map.
The legacy single-file JSON output is still available:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml --output escher_map.json
summary.json includes map_stats, a stage-by-stage count of total elements,
nodes, reactions, and segments added or removed while the map is built. To print
the same reduction statistics in the CLI, add --map-stats:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml --output-dir out --map-stats
To save the unfiltered KEGG Escher map next to the normal model-derived map,
add --save-kegg-map:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml --output-dir out --save-kegg-map
The same visualization settings are available in the CLI, for example:
bioemma build --model path/to/model.xml --kgml path/to/rn00010.xml --output-dir out --scaling-factor 4 --canvas-margin-x 160 --canvas-margin-y 160
If cobrapy cannot access its default cache directory on Windows, set a local cache directory before running tests or CLI commands:
set BIOEMMA_COBRA_CACHE_DIR=%CD%\.cobra-cache
Included Mapping Data
BioEMMA currently bundles two compact runtime mapping files:
metabolite_mapping.tsvreaction_mapping.tsv
These files are derived from MetaNetX cross-reference tables and are used to map KEGG identifiers to BiGG and SEED identifiers. The large raw MetaNetX download cache is not intended to be included in the Python package.
See NOTICE.md for third-party data attribution and usage notes.
License
BioEMMA's source code is distributed under the MIT License. Bundled mapping
data are derived from third-party database resources and may be subject to
their own license terms. See LICENSE and NOTICE.md.
By default, the workflow keeps the KEGG reactions and compounds that can be
matched to the COBRA model. To preserve KEGG-only elements that are not present
in the model, pass include_kegg_only=True in Python or use
--include-kegg-only in the CLI.
Development Notes
The package code lives in:
src/bioemma/
The current core modules are:
bioemma.maps.KeggMapbioemma.mapper_base.EscherMapperbioemma.metanetx_mapper.MetaNetXMapperbioemma.merger.EscherMergerbioemma.workflow.build_outputsbioemma.workflow.build_many_outputs
The script for regenerating mapping tables is kept separately in:
scripts/prepare_db_mapping.py
Run the test suite from a source checkout with:
set PYTHONPATH=%CD%\src
set BIOEMMA_COBRA_CACHE_DIR=%CD%\.pytest-cobra-cache
python -m pytest -q
Publishing
Before publishing, bump the version in pyproject.toml, run tests, and build
fresh distribution artifacts:
python -m pip install --upgrade build twine
rmdir /s /q dist
python -m build
python -m twine check dist/*
Upload to TestPyPI first:
python -m twine upload --repository testpypi dist/*
Install from TestPyPI in a clean environment and smoke-test the CLI. Then upload the same checked artifacts to PyPI:
python -m twine upload dist/*
Status
BioEMMA is not yet a stable release. Before publishing to PyPI, the package still needs a final check of bundled data, license compatibility, and user-facing visualization dependencies.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bioemma-0.2.0.tar.gz.
File metadata
- Download URL: bioemma-0.2.0.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
94293bebdf962e52614f2daee44bd31943824e472f376b131d2fcb40ba939a9f
|
|
| MD5 |
cde41781daf29cd41466f0a9f2a02076
|
|
| BLAKE2b-256 |
5c22abc81d6814746e1793ec00a83f35723021fd13981a4c0efca77adc1972fd
|
File details
Details for the file bioemma-0.2.0-py3-none-any.whl.
File metadata
- Download URL: bioemma-0.2.0-py3-none-any.whl
- Upload date:
- Size: 1.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3cb4c9cf29e8440cf5e723f718d231a5c52ebcf4bb23bd27eeebe7f06fca5558
|
|
| MD5 |
ab0fdc7c601499d0b40f7addb0c32eb6
|
|
| BLAKE2b-256 |
a3b377ba767135f817ed05877ad4452e24cb44ceca116c27478716fb25153678
|