Fast path finding in large knowledge graphs

GANDALF

Graph Analysis Navigator for Discovery And Link Finding

Features

  • Compressed Sparse Row (CSR) graph representation for memory efficiency
  • Bidirectional search that expands from both endpoints to keep the search frontier small
  • O(1) property lookups via hash indexing
  • Predicate filtering to reduce path explosion
  • Batch property enrichment for fast results
  • Diagnostic tools to understand path counts
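
To make the CSR bullet concrete, here is a minimal sketch of a Compressed Sparse Row adjacency layout in plain Python. It is an illustration of the general technique, not GANDALF's internal implementation; the names build_csr and neighbors are hypothetical and are not part of the package API.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Return (offsets, targets) for a directed graph with integer node ids."""
    # Count the out-degree of every node, shifted by one slot.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix-sum the counts so offsets[i] is where node i's neighbors start.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Fill one flat target array, tracking the next free slot per node.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Neighbors of a node are one contiguous slice of the flat array."""
    return targets[offsets[node]:offsets[node + 1]]


# Hypothetical toy graph with four nodes.
offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(neighbors(offsets, targets, 0))
```

The memory saving comes from storing two flat arrays instead of one container per node; a production implementation would typically hold them as typed arrays so they can be memory-mapped.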

Installation

Recommended: Use a virtual environment

Some transitive dependencies (e.g., stringcase, pytest-logging) require a recent pip and setuptools to build correctly. A virtual environment lets you upgrade those tools without touching the system Python.

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Upgrade pip and setuptools (important for building dependencies)
pip install --upgrade pip setuptools wheel

# Install the package
pip install -e .

Alternative: Direct install (may fail on some systems)

If you have a recent pip/setuptools already, you can try:

pip install -e .

Quick Start

Unpacking a full Translator KGX archive

  • tar -xvf translator_kg.tar.zst

This extracts a nodes.jsonl and an edges.jsonl file. If your tar does not auto-detect Zstandard compression, add the --zstd flag.

Build a graph from JSONL

from gandalf import build_graph_from_jsonl

# Build with ontology filtering
graph = build_graph_from_jsonl(
    edges_path="data/raw/edges.jsonl",
    nodes_path="data/raw/nodes.jsonl",
    excluded_predicates={'biolink:subclass_of'}
)

# Save for fast loading
graph.save_mmap("data/processed/graph_filtered")

Query paths

from gandalf import CSRGraph, find_paths

# Load the graph saved in the previous step (takes ~1-2 seconds)
graph = CSRGraph.load_mmap("data/processed/graph_filtered")

# Find paths
paths = find_paths(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979"
)

print(f"Found {len(paths)} paths")
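
The Features list mentions bidirectional search. As a rough illustration of that idea, the sketch below runs a breadth-first search from both endpoints and stops when the two frontiers meet. It is not the code behind find_paths: the function shortest_hops, the dictionary-based adjacency, and the toy node names are all assumptions made for this example, and it returns only a hop count rather than full paths.

```python
from typing import Dict, List, Optional

Adjacency = Dict[str, List[str]]


def shortest_hops(fwd: Adjacency, rev: Adjacency, start: str, end: str) -> Optional[int]:
    """Return the fewest edges on a directed path from start to end, or None."""
    if start == end:
        return 0
    dist_fwd, dist_bwd = {start: 0}, {end: 0}
    frontier_fwd, frontier_bwd = [start], [end]
    while frontier_fwd and frontier_bwd:
        # Expand the smaller frontier; this keeps the explored set small.
        expand_forward = len(frontier_fwd) <= len(frontier_bwd)
        if expand_forward:
            frontier, dist, other, adj = frontier_fwd, dist_fwd, dist_bwd, fwd
        else:
            frontier, dist, other, adj = frontier_bwd, dist_bwd, dist_fwd, rev
        best: Optional[int] = None
        next_frontier: List[str] = []
        for node in frontier:
            for neighbor in adj.get(node, []):
                if neighbor in other:
                    # The two searches meet; record the total path length.
                    total = dist[node] + 1 + other[neighbor]
                    best = total if best is None else min(best, total)
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    next_frontier.append(neighbor)
        # Finish the whole level before returning so the minimum is exact.
        if best is not None:
            return best
        if expand_forward:
            frontier_fwd = next_frontier
        else:
            frontier_bwd = next_frontier
    return None


# Hypothetical toy graph: A -> B -> C -> D, plus a shortcut A -> C.
fwd = {"A": ["B", "C"], "B": ["C"], "C": ["D"]}
rev = {"B": ["A"], "C": ["A", "B"], "D": ["C"]}
print(shortest_hops(fwd, rev, "A", "D"))
```

The backward search needs a reverse adjacency (rev here), which is why graph libraries that support this technique usually store incoming as well as outgoing edges.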

Filter by predicates

from gandalf import find_paths_filtered

# Only mechanistic relationships
paths = find_paths_filtered(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979",
    allowed_predicates={
        'biolink:treats',
        'biolink:affects',
        'biolink:has_metabolite'
    }
)

Architecture

The package uses a three-stage pipeline:

  1. Topology Search (fast) - Find all paths using indices only
  2. Filtering (medium) - Apply business logic, reading only the node or edge properties it needs
  3. Enrichment (batch) - Load all properties for final paths only

This separation allows filtering millions of paths before expensive property lookups.
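
The three stages above can be sketched on a toy graph as follows. This is a simplified illustration under stated assumptions, not the package's code: EDGES, PREDICATES, EDGE_PROPS and the functions topology_search, filter_paths and enrich are hypothetical names, and the search is a plain depth-limited traversal.

```python
from typing import Dict, List, Set

# Hypothetical toy data: edges are (source, target) index pairs.
EDGES = [(0, 1), (1, 3), (0, 2), (2, 3)]
# One predicate per edge, kept in a cheap parallel array.
PREDICATES = ["biolink:treats", "biolink:affects", "biolink:subclass_of", "biolink:affects"]
# Stand-in for the expensive property store.
EDGE_PROPS: Dict[int, dict] = {i: {"edge_id": i, "source_db": f"db{i}"} for i in range(len(EDGES))}


def topology_search(start: int, end: int, max_hops: int) -> List[List[int]]:
    """Stage 1: enumerate paths as lists of edge indices; no properties are read."""
    out_edges: Dict[int, List[int]] = {}
    for idx, (src, _) in enumerate(EDGES):
        out_edges.setdefault(src, []).append(idx)
    paths: List[List[int]] = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == end and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for idx in out_edges.get(node, []):
            stack.append((EDGES[idx][1], path + [idx]))
    return paths


def filter_paths(paths: List[List[int]], allowed: Set[str]) -> List[List[int]]:
    """Stage 2: keep paths whose every edge carries an allowed predicate."""
    return [p for p in paths if all(PREDICATES[i] in allowed for i in p)]


def enrich(paths: List[List[int]]) -> List[List[dict]]:
    """Stage 3: load full properties once per distinct edge, for surviving paths only."""
    needed = {i for p in paths for i in p}
    loaded = {i: EDGE_PROPS[i] for i in needed}  # single batched lookup
    return [[loaded[i] for i in p] for p in paths]


candidates = topology_search(0, 3, max_hops=2)
kept = filter_paths(candidates, {"biolink:treats", "biolink:affects"})
results = enrich(kept)
print(len(candidates), len(kept))
```

In this toy run the path through the biolink:subclass_of edge is discarded in stage 2, so stage 3 never loads its properties.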

Configuration

The server is configured via environment variables:

Variable                      Default                      Description
GANDALF_GRAPH_PATH            ../12_17_2025/gandalf_mmap   Path to the mmap graph directory
GANDALF_GRAPH_FORMAT          auto                         Graph format (auto or mmap)
GANDALF_LOG_LEVEL             INFO                         Logging level (DEBUG, INFO, WARNING, ERROR)
GANDALF_LOG_FORMAT            text                         Log format (text for human-readable, json for structured)
GANDALF_CORS_ORIGINS          *                            Comma-separated list of allowed CORS origins
GANDALF_MAX_REQUEST_SIZE_MB   10                           Maximum request body size in MB
GANDALF_RATE_LIMIT            100                          Maximum requests per minute per client IP (0 to disable)
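
As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed object. The ServerConfig class and load_config function are hypothetical helpers written for this example; the server's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class ServerConfig:
    graph_path: str
    graph_format: str
    log_level: str
    log_format: str
    cors_origins: List[str]
    max_request_size_mb: int
    rate_limit: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented variables, falling back to the documented defaults."""
    origins = env.get("GANDALF_CORS_ORIGINS", "*")
    return ServerConfig(
        graph_path=env.get("GANDALF_GRAPH_PATH", "../12_17_2025/gandalf_mmap"),
        graph_format=env.get("GANDALF_GRAPH_FORMAT", "auto"),
        log_level=env.get("GANDALF_LOG_LEVEL", "INFO"),
        log_format=env.get("GANDALF_LOG_FORMAT", "text"),
        # Split the comma-separated origin list and drop empty entries.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        max_request_size_mb=int(env.get("GANDALF_MAX_REQUEST_SIZE_MB", "10")),
        rate_limit=int(env.get("GANDALF_RATE_LIMIT", "100")),
    )


# Example override: disable rate limiting and allow two (made-up) origins.
config = load_config({
    "GANDALF_RATE_LIMIT": "0",
    "GANDALF_CORS_ORIGINS": "https://a.example, https://b.example",
})
print(config.rate_limit, config.cors_origins)
```

Passing the environment as a mapping keeps the function easy to test; calling load_config() with no argument reads the real process environment.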

Docker

# Build the image
docker build -t gandalf .

# Run with a graph volume
docker run -p 6429:6429 \
  -v /path/to/graph:/data/graph \
  -e GANDALF_GRAPH_PATH=/data/graph \
  gandalf

Health Check

curl http://localhost:6429/health
# {"status": "ok", "graph_loaded": true, "node_count": 38456123, "edge_count": 127843456}
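
For scripted deployments it can help to wait until the graph has finished loading before sending queries. The following sketch polls the health endpoint using only the standard library. The helper names graph_is_ready and wait_until_ready, the retry count, and the delay are assumptions for illustration; the response fields are taken from the example output above.

```python
import json
import time
import urllib.error
import urllib.request


def graph_is_ready(body: str) -> bool:
    """Interpret a /health response body like the one shown above."""
    payload = json.loads(body)
    return payload.get("status") == "ok" and payload.get("graph_loaded") is True


def wait_until_ready(url: str = "http://localhost:6429/health",
                     attempts: int = 30,
                     delay_s: float = 2.0) -> bool:
    """Poll the endpoint until the graph is loaded or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if graph_is_ready(resp.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not up yet, or the body was not valid JSON; retry.
            pass
        time.sleep(delay_s)
    return False


sample = '{"status": "ok", "graph_loaded": true, "node_count": 38456123, "edge_count": 127843456}'
print(graph_is_ready(sample))
```

Call wait_until_ready() after starting the container and before issuing path queries.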

Releases

To package a release, archive the mmap folder from its parent directory:

  • tar -czvf gandalf_mmap_<date>.tar.gz gandalf_mmap

Download files

Source Distribution

gandalf_csr-0.3.0.tar.gz (2.2 MB)

Built Distribution

gandalf_csr-0.3.0-py3-none-any.whl (2.3 MB)

File details

Details for the file gandalf_csr-0.3.0.tar.gz.

File metadata

  • Download URL: gandalf_csr-0.3.0.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for gandalf_csr-0.3.0.tar.gz

Algorithm     Hash digest
SHA256        fbf7bc8ef53a4456f9817516c6c438bfa6f69e835fb4822aabbf0b92a11e2756
MD5           dd207238a7ec0bd770bd6ab0d51a0ac3
BLAKE2b-256   6a43e196a06e96f4de1a8932a983ce3417a4ba4f18edc0fbe105dc56dc8c15d4

File details

Details for the file gandalf_csr-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: gandalf_csr-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for gandalf_csr-0.3.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        430920ffcb78cd8ef09408eaa31da101ee87d472a29c903978083e60d5505b3e
MD5           a14354b900c375e096fd27bcbdd066c7
BLAKE2b-256   c2671a7f18199513296e52b7024a76ede21cdaf3d6696cccba82f0babc9a32aa
