Pathfinder is designed to identify semantic paths between two biological entities.
catrax-pathfinder
catrax-pathfinder is a Python package for discovering and returning candidate paths between two CURIE nodes using a PloverDB endpoint and precomputed databases (NGD and node degree). It supports SQLite and MySQL backends for both the NGD and degree repositories via a simple URL prefix.
Installation
pip install catrax-pathfinder
Obtain databases
You will need curie_ngd_v1.0_KG and kg2c_v1.0_KG databases that match the KG version you are using (for example, curie_ngd_v1.0_KG2.10.2.sqlite and kg2c_v1.0_KG2.10.2.sqlite for KG2.10.2).
- Recommended: Ask a team member for the MySQL URLs of these databases.
- Alternative: Ask a team member for local SQLite copies of these databases.
Quickstart
from pathfinder.Pathfinder import Pathfinder

plover_url = "https://kg2cploverdb.ci.transltr.io"
ngd_url = "sqlite:curie_ngd_v1.0_KG2.10.2.sqlite"
degree_url = "sqlite:kg2c_v1.0_KG2.10.2.sqlite"

# Optional filters
blocked_curies = set([
    # "CHEBI:1234",
])
blocked_synonyms = set([
    # "aspirin",
])

# Any logger-like object is acceptable (e.g., a Python logging.Logger)
logger = None

pathfinder = Pathfinder(
    repository_name="MLRepo",
    plover_url=plover_url,
    ngd_url=ngd_url,
    degree_url=degree_url,
    blocked_curies=blocked_curies,
    blocked_synonyms=blocked_synonyms,
    logger=logger,
)

result, aux_graphs, knowledge_graph = pathfinder.get_paths(
    src_node_id="MONDO:0005148",
    dst_node_id="CHEBI:15365",
    src_pinned_node="node_1",
    dst_pinned_node="node_2",
    hops_numbers=4,
    max_hops_to_explore=6,
    limit=500,
    prune_top_k=30,
    degree_threshold=30000,
    category_constraints=[],
)
API
Pathfinder(...)
Constructor:
Pathfinder(
    repository_name: str,
    plover_url: str,
    ngd_url: str,
    degree_url: str,
    blocked_curies: Set[str],
    blocked_synonyms: Set[str],
    logger,
)
Parameters
- repository_name: For now, this should always be "MLRepo".
- plover_url: URL of the PloverDB endpoint (example: https://kg2cploverdb.ci.transltr.io).
- ngd_url: Connection string for the CURIE-NGD repository (SQLite or MySQL).
- degree_url: Connection string for the node degree repository (SQLite or MySQL).
- blocked_curies: A set of CURIE IDs; any path that passes through these CURIEs is dropped.
- blocked_synonyms: A set of strings; any path that passes through nodes whose names match these values is dropped.
- logger: A logger-like object used for logging.
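As a minimal sketch of preparing the optional constructor arguments, the snippet below builds a standard-library logger and the two filter sets. The logger name "pathfinder_demo" and the helper make_logger are illustrative choices, not part of the package; the CURIE and synonym values are the placeholders from the Quickstart.

```python
import logging


def make_logger(name: str = "pathfinder_demo", level: int = logging.INFO) -> logging.Logger:
    """Return a console logger; the name is a hypothetical choice for this example."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_logger()

# Placeholder values taken from the Quickstart comments.
blocked_curies = {"CHEBI:1234"}
blocked_synonyms = {"aspirin"}
```

These three objects can then be passed as logger, blocked_curies, and blocked_synonyms when constructing Pathfinder.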
get_paths(...)
get_paths(
    src_node_id: str,
    dst_node_id: str,
    src_pinned_node: str,
    dst_pinned_node: str,
    hops_numbers: int = 4,
    max_hops_to_explore: int = 6,
    limit: int = 500,
    prune_top_k: int = 30,
    degree_threshold: int = 30000,
    category_constraints: Set[str] = None
)
Parameters
- src_node_id: Source CURIE ID.
- dst_node_id: Destination CURIE ID.
- src_pinned_node: Source pinned node ID.
- dst_pinned_node: Destination pinned node ID.
- hops_numbers: Maximum number of hops a returned path can have.
- max_hops_to_explore: Maximum depth to explore during expansion; after exploration, paths longer than hops_numbers are removed.
- limit: Maximum number of paths to return.
- prune_top_k: During each expansion step, neighbors are ranked and only the top k are kept for further expansion.
- degree_threshold: Nodes with degree greater than this threshold are not expanded.
- category_constraints (optional): If non-empty, keeps only paths that contain at least one of these categories.
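To make the parameter semantics above concrete, here is a toy sketch in plain Python. It is not the package's implementation: the function names, the "higher score is better" ranking, and the choice to check categories on every node of a path are all assumptions made for illustration, and the node IDs and scores are invented.

```python
from typing import Dict, Iterable, List, Optional, Set


def prune_neighbors(neighbors: Iterable[str], score: Dict[str, float],
                    prune_top_k: int = 30) -> List[str]:
    """Rank neighbors (assumption: higher score is better) and keep the top k."""
    ranked = sorted(neighbors, key=lambda n: score.get(n, 0.0), reverse=True)
    return ranked[:prune_top_k]


def can_expand(node: str, degree: Dict[str, int],
               degree_threshold: int = 30000) -> bool:
    """Nodes whose degree exceeds the threshold are not expanded."""
    return degree.get(node, 0) <= degree_threshold


def filter_paths(paths: List[List[str]], node_categories: Dict[str, Set[str]],
                 hops_numbers: int = 4, limit: int = 500,
                 category_constraints: Optional[Set[str]] = None) -> List[List[str]]:
    """Drop paths with too many hops or no required category, then cap at limit."""
    wanted = set(category_constraints or ())
    kept = []
    for path in paths:
        if len(path) - 1 > hops_numbers:  # hops = number of edges in the path
            continue
        if wanted and not any(node_categories.get(n, set()) & wanted for n in path):
            continue
        kept.append(path)
    return kept[:limit]


# Toy data, invented for this example.
score = {"A": 0.9, "B": 0.2, "C": 0.7}
degree = {"A": 50000, "C": 120}
top = prune_neighbors(["A", "B", "C"], score, prune_top_k=2)
expandable = [n for n in top if can_expand(n, degree)]

paths = [["S", "C", "T"], ["S", "A", "B", "C", "X", "T"]]
node_categories = {"C": {"biolink:Gene"}}
kept = filter_paths(paths, node_categories, hops_numbers=4,
                    category_constraints={"biolink:Gene"})
```

In this toy run, "B" is pruned by prune_top_k, "A" is kept but not expanded because of its degree, and the five-hop path is removed by hops_numbers.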
Returns
get_paths(...) returns a 3-tuple of TRAPI-compliant objects:
(result, aux_graphs, knowledge_graph)
These correspond to standard Translator Reasoner API (TRAPI) result structures. For more details on TRAPI object formats and the overall API specification, see the TRAPI documentation on GitHub: https://github.com/NCATSTranslator/ReasonerAPI
Repository URL formats (SQLite and MySQL)
Both ngd_url and degree_url accept a backend prefix.
SQLite
Use sqlite: followed by the SQLite filename/path.
- NGD example:
sqlite:curie_ngd_v1.0_KG2.10.2.sqlite
- Degree example:
sqlite:kg2c_v1.0_KG2.10.2.sqlite
MySQL
Use mysql: followed by your MySQL config string.
- NGD example:
mysql:arax-databases-mysql.rtx.ai:public_ro:curie_ngd_v1_0_kg2_10_2
- Degree example:
mysql:arax-databases-mysql.rtx.ai:public_ro:kg2c_v1_0_kg2_10_2
The package automatically detects which backend to use based on the sqlite: or mysql: prefix.
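The prefix convention can be sketched as a small helper that separates the backend name from the rest of the string. This only illustrates the URL format; split_backend_url is a hypothetical name, the package's own detection logic may differ, and the remainder of the MySQL string is deliberately left uninterpreted.

```python
from typing import Tuple

SUPPORTED_BACKENDS = ("sqlite", "mysql")


def split_backend_url(url: str) -> Tuple[str, str]:
    """Split 'backend:target' at the first colon and validate the backend name."""
    backend, sep, target = url.partition(":")
    if not sep or backend not in SUPPORTED_BACKENDS or not target:
        raise ValueError(
            f"Expected 'sqlite:<path>' or 'mysql:<config>', got {url!r}"
        )
    return backend, target


sqlite_parts = split_backend_url("sqlite:curie_ngd_v1.0_KG2.10.2.sqlite")
mysql_parts = split_backend_url(
    "mysql:arax-databases-mysql.rtx.ai:public_ro:curie_ngd_v1_0_kg2_10_2"
)
```

A check like this can be useful before constructing Pathfinder, for example to catch a missing prefix in a configuration file early.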
Notes & tips
- Start with smaller hops_numbers and limit if you are experimenting, then scale up.
- If exploration grows too quickly on high-degree nodes, consider lowering degree_threshold and/or prune_top_k.
- Use blocked_curies and blocked_synonyms to remove known “noisy” nodes and keep path results cleaner.
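One way to follow the first two tips is to keep named parameter presets and unpack them into get_paths. The DEFAULTS values are the documented defaults; the EXPLORATORY values are arbitrary smaller numbers chosen for this example, not recommendations from the package.

```python
# Documented default values of get_paths(...).
DEFAULTS = dict(
    hops_numbers=4,
    max_hops_to_explore=6,
    limit=500,
    prune_top_k=30,
    degree_threshold=30000,
)

# Hypothetical smaller settings for a first experiment (values are assumptions).
EXPLORATORY = dict(
    hops_numbers=2,
    max_hops_to_explore=3,
    limit=50,
    prune_top_k=10,
    degree_threshold=10000,
)

# Every exploratory value is at most its default, so the search can only shrink.
is_smaller = all(EXPLORATORY[key] <= DEFAULTS[key] for key in DEFAULTS)
```

A call would then pass the source and destination arguments as in the Quickstart, followed by **EXPLORATORY, switching to **DEFAULTS once the results look reasonable.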