Skip to main content

Pathfinder is designed to identify semantic paths between two biological entities.

Project description

catrax-pathfinder

catrax-pathfinder is a Python package for discovering and returning candidate paths between two CURIE nodes using a PloverDB endpoint and precomputed databases (NGD and node degree). It supports SQLite and MySQL backends for both the NGD and degree repositories via a simple URL prefix.


Installation

pip install catrax-pathfinder

Obtain databases

You will need a compatible curie_ngd_v1.0_KG and kg2c_v1.0_KG SQLite database for the KG version you are using.

  • Recommended: Ask a team member for mysql urls to these databases
  • Alternative: Ask a team member for local copies of these databases

Quickstart

from pathfinder.Pathfinder import Pathfinder

plover_url = "https://kg2cploverdb.ci.transltr.io"

ngd_url = "sqlite:curie_ngd_v1.0_KG2.10.2.sqlite"
degree_url = "sqlite:kg2c_v1.0_KG2.10.2.sqlite"

# Optional filters
blocked_curies = set([
    # "CHEBI:1234",
])
blocked_synonyms = set([
    # "aspirin",
])

# Any logger-like object is acceptable (e.g., a Python logging.Logger)
logger = None

pathfinder = Pathfinder(
    repository_name="MLRepo",
    plover_url=plover_url,
    ngd_url=ngd_url,
    degree_url=degree_url,
    blocked_curies=blocked_curies,
    blocked_synonyms=blocked_synonyms,
    logger=logger,
)

result, aux_graphs, knowledge_graph = pathfinder.get_paths(
    src_node_id="MONDO:0005148",
    dst_node_id="CHEBI:15365",
    src_pinned_node="node_1",
    dst_pinned_node="node_2",
    hops_numbers=4,
    max_hops_to_explore=6,
    limit=500,
    prune_top_k=30,
    degree_threshold=30000,
    category_constraints=[],
)

API

Pathfinder(...)

Constructor:

Pathfinder(
    repository_name: str,
    plover_url: str,
    ngd_url: str,
    degree_url: str,
    blocked_curies: Set[str],
    blocked_synonyms: Set[str],
    logger,
)

Parameters

  • repository_name: For now, this should always be "MLRepo".
  • plover_url: URL of the PloverDB endpoint (example: https://kg2cploverdb.ci.transltr.io).
  • ngd_url: Connection string for the CURIE-NGD repository (SQLite or MySQL).
  • degree_url: Connection string for the node degree repository (SQLite or MySQL).
  • blocked_curies: A set of CURIE IDs; any path that passes through these CURIEs is dropped.
  • blocked_synonyms: A set of strings; any path that passes through nodes whose names match these values is dropped.
  • logger: A logger-like object used for logging.

get_paths(...)

get_paths(
    src_node_id: str,
    dst_node_id: str,
    src_pinned_node: str,
    dst_pinned_node: str,
    hops_numbers: int = 4,
    max_hops_to_explore: int = 6,
    limit: int = 500,
    prune_top_k: int = 30,
    degree_threshold: int = 30000,
    category_constraints: Set[str] = None
)

Parameters

  • src_node_id: Source CURIE ID.
  • dst_node_id: Destination CURIE ID.
  • src_pinned_node: Source pinned node ID.
  • dst_pinned_node: Destination pinned node ID.
  • hops_numbers: Maximum number of hops a returned path can have.
  • max_hops_to_explore: Maximum depth to explore during expansion; after exploration, paths longer than hops_numbers are removed.
  • limit: Maximum number of paths to return.
  • prune_top_k: During each expansion step, neighbors are ranked and only the top k are kept for further expansion.
  • degree_threshold: Nodes with degree greater than this threshold are not expanded.
  • category_constraints (optional): If non-empty, keeps only paths that contain at least one of these categories.

Returns

get_paths(...) returns a 3-tuple of TRAPI-compliant objects.

These correspond to standard Translator Reasoner API (TRAPI) result structures: For more details on TRAPI object formats and the overall API specification, see the TRAPI documentation on GitHub: https://github.com/NCATSTranslator/ReasonerAPI

(result, aux_graphs, knowledge_graph)

Repository URL formats (SQLite and MySQL)

Both ngd_url and degree_url accept a backend prefix.

SQLite

Use sqlite: followed by the SQLite filename/path.

  • NGD example:
    • sqlite:curie_ngd_v1.0_KG2.10.2.sqlite
  • Degree example:
    • sqlite:kg2c_v1.0_KG2.10.2.sqlite

MySQL

Use mysql: followed by your MySQL config string.

  • NGD example:
    • mysql:arax-databases-mysql.rtx.ai:public_ro:curie_ngd_v1_0_kg2_10_2
  • Degree example:
    • mysql:arax-databases-mysql.rtx.ai:public_ro:kg2c_v1_0_kg2_10_2

The package automatically detects which backend to use based on the sqlite: / mysql: prefix.


Notes & tips

  • Start with smaller hops_numbers and limit if you are experimenting, then scale up.
  • If exploration grows too quickly on high-degree nodes, consider lowering degree_threshold and/or prune_top_k.
  • Use blocked_curies and blocked_synonyms to remove known “noisy” nodes and keep path results cleaner.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

catrax_pathfinder-1.2.0.tar.gz (23.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

catrax_pathfinder-1.2.0-py3-none-any.whl (23.7 MB view details)

Uploaded Python 3

File details

Details for the file catrax_pathfinder-1.2.0.tar.gz.

File metadata

  • Download URL: catrax_pathfinder-1.2.0.tar.gz
  • Upload date:
  • Size: 23.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for catrax_pathfinder-1.2.0.tar.gz
Algorithm Hash digest
SHA256 d46a8a00e2c0276fdaa4835c614891c6019884981d8b6da5ed5f511a6858c772
MD5 c55cb35a1a87ea6b408f4089bbb31fee
BLAKE2b-256 be52481bac7260b53e0b40eca1d868d1ccdef245482a913bdadef046661da495

See more details on using hashes here.

File details

Details for the file catrax_pathfinder-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for catrax_pathfinder-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3aff96c7cdf0593f15e8898409679142fdf6e0e14bd94969f17bb3684f474446
MD5 9b32a763c7a78ad2a2ae2c09db35cb89
BLAKE2b-256 31216ade4762862fa91e0e6bcb4788ff74b3fe2a02613e28a59fcd743b15b9dc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page