Skip to main content

ORION ingests data from knowledge bases and converts it into interoperable Biolink Model knowledge graphs.

Project description

ORION

Operational Routine for the Ingest and Output of Networks

ORION ingests data from knowledge sources and converts them into Biolink Model knowledge graphs in KGX format.

Each data source goes through the following pipeline:

  1. Fetch - retrieve the original data source
  2. Parse - transform the data into KGX files
  3. Normalize - use normalization services to convert identifiers and ontology terms to preferred synonyms
  4. Supplement - add supplementary knowledge specific to that source

Sources are defined in a Graph Spec yaml file (see examples in the graph_specs/ directory). ORION automatically runs each specified source through the pipeline and merges them into a Knowledge Graph.

Installation

ORION requires uv for dependency management.

git clone https://github.com/RobokopU24/ORION.git
cd ORION
uv sync --extra robokop

The core library is also available on PyPI (pip install robokop-orion), but the full repository is needed to utilize ingest modules from the ROBOKOP project.

CLI Commands

After installation, the following commands are available (prefix with uv run if not using a uv-managed shell):

Command Description
orion-build Build complete knowledge graphs from a Graph Spec
orion-ingest Run the ingest pipeline for individual data sources
orion-merge Merge KGX node/edge files
orion-meta-kg Generate MetaKG and test data files
orion-redundant-kg Generate edge files with redundant biolink predicates
orion-ac Generate AnswerCoalesce files
orion-neo4j-dump Generate Neo4j database dumps
orion-memgraph-dump Generate Memgraph database dumps

Configuring ORION

ORION is configured via environment variables, which can be set directly or through an .env file.

In most cases, you can simply use this provided script to set up a local environment. It will create directories for ORION outputs next to where ORION was installed and set env vars pointing to them.

source ./set_up_dev_env.sh

For more customization and settings, use an .env file. Copy or rename the .env.example file to .env.

Then uncommment and edit .env as desired to set values for your environment.

Variable Purpose Default
ORION_STORAGE Path to a directory for data ingest pipeline storage (required)
ORION_GRAPHS Path to a directory for Knowledge Graph outputs (required)
ORION_LOGS Path to a Log file directory (if unset, logs go to stdout) None
ORION_GRAPH_SPEC Graph Spec filename from graph_specs/ example-graph-spec.yaml
ORION_GRAPH_SPEC_URL URL to a remote Graph Spec file

Configuration is managed by pydantic-settings — environment variables override .env file values, and sensible defaults are provided where possible. See orion/config.py for the full list of settings.

Graph Spec

A Graph Spec yaml file defines which sources to include in a knowledge graph. Set one of the following (not both):

  • ORION_GRAPH_SPEC - name of a file in the graph_specs/ directory
  • ORION_GRAPH_SPEC_URL - URL pointing to a Graph Spec yaml file

Here is a simple Graph Spec example:

graphs:
  - graph_id: Example_Graph
    graph_name: Example Graph
    graph_description: A free text description of what is in the graph.
    output_format: neo4j
    sources:
      - source_id: DrugCentral
      - source_id: HGNC

See the full list of data sources and their identifiers in the data sources file.

Graph Spec Parameters

The following parameters can be set per data source:

  • merge_strategy - alternative merge strategies
  • strict_normalization - whether to discard nodes that fail to normalize (true/false)
  • conflation - whether to conflate genes with proteins and chemicals with drugs (true/false)

The following can be set at the graph level:

  • add_edge_id - whether to add unique identifiers to edges (true/false)
  • edge_id_type - if add_edge_id is true, the type of identifier can be specified (uuid or orion)

See the graph_specs/ directory for more examples.

Running with Docker

Make sure environment variables are set or an .env file is configured with at least ORION_STORAGE, and ORION_GRAPHS pointing to valid host directories. The compose file reads these env vars and mounts those directories as volumes in the container.

Build the image:

docker compose build

Build all graphs in the configured Graph Spec:

docker compose up

Build a specific graph:

docker compose run orion orion-build Example_Graph

Run the ingest pipeline for a single data source:

docker compose run orion orion-ingest DrugCentral

See available data sources and options:

docker compose run orion orion-ingest -h

Development

Install dev dependencies with uv:

uv sync --extra robokop --group dev

Run tests:

uv run pytest tests/

Contributing

Contributions are welcome, see the Contributor README.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

robokop_orion-0.2.0.tar.gz (94.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

robokop_orion-0.2.0-py3-none-any.whl (109.9 kB view details)

Uploaded Python 3

File details

Details for the file robokop_orion-0.2.0.tar.gz.

File metadata

  • Download URL: robokop_orion-0.2.0.tar.gz
  • Upload date:
  • Size: 94.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for robokop_orion-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1e1a5225fa9f30fd107446e27ffb12aa2d0e41b94ea4e3267786cdc2e8ae69f8
MD5 d8f6ffd292233595ab520c250789efe4
BLAKE2b-256 54938c73a982a004b986257dcb42dd37d081319d37e04d2846602698775dcdf8

See more details on using hashes here.

File details

Details for the file robokop_orion-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: robokop_orion-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 109.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for robokop_orion-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 537ff044fa7fd383b2e7d3aee38cf8f0e8141967179b95a0ee1fc2d516ea404b
MD5 3b7b55e66c2e96802941de22b918dac7
BLAKE2b-256 9ae29991091ef7bacf090f62f90d0f2ad5be6a97211adc71083f80327cd72642

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page