Unified biological concept lookup across 29+ biomedical knowledge sources including BioPortal, OLS, UMLS, ChEMBL, DisGeNET, and more
🧬 Biomedical Knowledge Lookup
A unified Python library for biological concept lookup across 29+ biomedical knowledge sources including BioPortal, OLS, UMLS, ChEMBL, DisGeNET, and more. Built for bioinformatics researchers, knowledge graph developers, and biomedical data scientists.
✨ Features
- 🔍 29+ Knowledge Sources: Comprehensive coverage of biomedical ontologies and databases
- ⚡ Unified API: Single interface for all sources with consistent results
- 🔄 Multi-source Annotation: Cross-reference concepts across multiple databases
- 📊 RDF Export: Convert results to RDF format for knowledge graphs
- 💾 Intelligent Caching: Built-in caching system for performance optimization
- 🔄 Async Support: Asynchronous operations for scalable applications
- 🧪 Comprehensive Testing: Full test suite with unit and integration tests
- 📚 Rich Documentation: Extensive examples and API documentation
🚀 Quick Start
Installation
pip install biomedical-knowledge-lookup
# or
poetry add biomedical-knowledge-lookup
# or from source
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install
Basic Usage
import asyncio
from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource

async def main():
    # Initialize the lookup system
    lookup = CentralKnowledgeLookup()
    # Search for concepts across multiple sources
    results = await lookup.search_concepts(
        "diabetes mellitus",
        sources=[KnowledgeSource.BIOPORTAL, KnowledgeSource.OLS, KnowledgeSource.UMLS],
    )
    # Get detailed information about a specific concept
    concept_details = await lookup.get_concept_details("DOID:9351")
    # Export results to RDF
    rdf_graph = lookup.export_to_rdf(results)

# `await` is only valid inside a coroutine, so run the example through asyncio
asyncio.run(main())
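The format returned by export_to_rdf is not described here. To illustrate the general idea of turning lookup results into triples, here is a minimal sketch that serializes result records to N-Triples using only the standard library. The record fields (`iri`, `label`, `source`) and the `to_ntriples` helper are assumptions made for this example, not part of the library's API.

```python
from typing import Iterable

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_SOURCE = "http://purl.org/dc/terms/source"


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(results: Iterable[dict]) -> str:
    """Emit two triples per record: its label and the source it came from."""
    lines = []
    for record in results:  # hypothetical record shape: iri, label, source
        subject = f"<{record['iri']}>"
        label = escape_literal(record["label"])
        source = escape_literal(record["source"])
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
        lines.append(f'{subject} <{DCT_SOURCE}> "{source}" .')
    return "\n".join(lines)
```

The resulting text can be loaded by any RDF toolkit that reads N-Triples.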
Advanced Usage with Multi-source Annotation
import asyncio
from knowledge_lookup import MultiSourceAnnotator

async def main():
    # Annotate text with concepts from multiple sources
    annotator = MultiSourceAnnotator()
    annotations = await annotator.annotate_text(
        "Type 2 diabetes is associated with insulin resistance",
        confidence_threshold=0.7,
    )
    # Get consensus annotations across sources
    consensus = annotator.get_consensus_annotations(annotations)

# `await` is only valid inside a coroutine, so run the example through asyncio
asyncio.run(main())
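How get_consensus_annotations decides on a consensus is not documented here. One plausible reading is "keep the concepts that several sources agree on". The sketch below implements that reading with a hypothetical `Annotation` record and placeholder concept IDs; the library's real data model and scoring may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record, used only for this illustration."""
    concept_id: str
    source: str
    confidence: float


def consensus(annotations, min_sources=2):
    """Map each concept reported by at least `min_sources` distinct sources
    to the mean of its per-source confidence scores."""
    by_concept = defaultdict(dict)
    for a in annotations:
        # Keep the best score per (concept, source) pair.
        previous = by_concept[a.concept_id].get(a.source, 0.0)
        by_concept[a.concept_id][a.source] = max(previous, a.confidence)
    return {
        concept_id: sum(scores.values()) / len(scores)
        for concept_id, scores in by_concept.items()
        if len(scores) >= min_sources
    }
```

Requiring agreement between sources trades recall for precision: a concept found by a single source is dropped even if its score is high.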
📋 Supported Knowledge Sources
| Source | Description | API Key Required |
|---|---|---|
| BioPortal | NCBO BioPortal ontology repository | Yes |
| OLS | Ontology Lookup Service | No |
| UMLS | Unified Medical Language System | Yes |
| ChEMBL | Chemical database | No |
| DisGeNET | Disease-gene associations | No |
| DrugBank | Drug information database | No |
| Ensembl | Genome annotation database | No |
| Gene Ontology | Molecular function/process/component | No |
| HPO | Human Phenotype Ontology | No |
| Mondo | Mondo Disease Ontology | No |
| OpenTargets | Target-disease associations | No |
| PubChem | Chemical information | No |
| Reactome | Pathway database | No |
| UniProt | Protein sequence database | No |
| WikiData | Structured knowledge base | No |
| ZOOMA | Ontology mapping service | No |
| And 13+ more... | See full list in documentation | Varies |
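Since the table marks BioPortal and UMLS as requiring keys, it can help to check for them before starting a long lookup run. The following is a small sketch of such a check. The environment variable names are the ones shown in the Configuration section; the `missing_api_keys` helper is hypothetical and not part of the library.

```python
import os

# Sources marked "Yes" in the table, mapped to their environment variables.
REQUIRED_KEYS = {
    "BioPortal": "BIOPORTAL_API_KEY",
    "UMLS": "UMLS_API_KEY",
}


def missing_api_keys(env=None):
    """Return the names of key-protected sources whose variable is unset or empty."""
    env = os.environ if env is None else env
    return [name for name, var in REQUIRED_KEYS.items() if not env.get(var)]


if __name__ == "__main__":
    for name in missing_api_keys():
        print(f"No API key configured for {name}")
```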
🏗️ Architecture
knowledge_lookup/
├── adapters/ # Individual source adapters
├── models.py # Data models and enums
├── central_lookup.py # Main lookup coordinator
├── multi_source_annotator.py # Cross-source annotation
├── rdf_converter.py # RDF export utilities
├── cache.py # Caching system
└── base.py # Abstract base classes
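The layout suggests a coordinator that fans a query out to per-source adapters. As a rough sketch of that pattern (not the actual code in central_lookup.py), the example below queries several stand-in adapters concurrently with asyncio.gather and treats a failing source as an empty result. `FakeAdapter` and `search_all` are names invented for this illustration.

```python
import asyncio


class FakeAdapter:
    """Stand-in for a source adapter; a real one would call a remote API."""

    def __init__(self, name, hits, fail=False):
        self.name, self.hits, self.fail = name, hits, fail

    async def search_concepts(self, query):
        await asyncio.sleep(0)  # yield control, as a network call would
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return [f"{self.name}:{hit}" for hit in self.hits if query in hit]


async def search_all(adapters, query):
    """Query every adapter concurrently; a failing source yields an empty list."""
    outcomes = await asyncio.gather(
        *(adapter.search_concepts(query) for adapter in adapters),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    return {
        adapter.name: ([] if isinstance(outcome, Exception) else outcome)
        for adapter, outcome in zip(adapters, outcomes)
    }
```

Collecting exceptions rather than raising keeps one unavailable service from discarding the results of all the others.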
📖 Documentation
Additional Resources
Example Notebooks
Explore interactive examples in the examples/ directory:
- Basic concept lookup
- Multi-source annotation
- RDF export and knowledge graph construction
- Performance benchmarking
🔧 Configuration
API Keys
Some sources require API keys. Set them as environment variables:
export BIOPORTAL_API_KEY="your_key_here"
export UMLS_API_KEY="your_key_here"
# ... etc
Or create a .env file:
BIOPORTAL_API_KEY=your_key_here
UMLS_API_KEY=your_key_here
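Whether the library reads the .env file on its own is not stated here. If it has to be loaded manually, a minimal standard-library sketch could look like this; `parse_env` and `load_env_file` are hypothetical helpers, and a dedicated package such as python-dotenv handles more edge cases (quoting, multi-line values).

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path=".env", environ=None):
    """Copy values from the file into `environ` without overwriting existing ones."""
    environ = os.environ if environ is None else environ
    file = Path(path)
    if file.exists():
        for key, value in parse_env(file.read_text(encoding="utf-8")).items():
            environ.setdefault(key, value)
    return environ
```

Using setdefault means variables exported in the shell take precedence over the file.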
Advanced Configuration
from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource, LookupConfig

config = LookupConfig(
    rate_limits={
        KnowledgeSource.BIOPORTAL: 10,  # requests per second
        KnowledgeSource.OLS: 20,
    },
    cache_enabled=True,
    cache_dir="./cache",
)
lookup = CentralKnowledgeLookup(config)
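The rate_limits values are given in requests per second. How the library enforces them is not described here; the sketch below shows one simple way such a limit can be applied in asyncio code, by spacing calls at fixed intervals. The `RateLimiter` class is an assumption for illustration only.

```python
import asyncio
import time


class RateLimiter:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self._next_slot = 0.0  # earliest monotonic time the next call may start

    async def wait(self):
        # There is no await between reading and updating the slot,
        # so this is safe for tasks sharing one event loop.
        now = time.monotonic()
        delay = max(0.0, self._next_slot - now)
        self._next_slot = max(now, self._next_slot) + self.interval
        if delay > 0:
            await asyncio.sleep(delay)


async def demo():
    limiter = RateLimiter(10)  # e.g. the BioPortal limit from the config above
    for i in range(3):
        await limiter.wait()
        print(f"request {i} at {time.monotonic():.2f}")


if __name__ == "__main__":
    asyncio.run(demo())
```

A fixed-interval scheme does not allow bursts; a token bucket would be the usual alternative if short bursts are acceptable to the upstream service.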
🧪 Testing
# Run all tests
poetry run pytest
# Run specific test categories
poetry run pytest -m "unit" # Unit tests only
poetry run pytest -m "integration" # Integration tests
poetry run pytest -m "not slow" # Skip slow tests
# Run with coverage
poetry run pytest --cov=knowledge_lookup
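The -m selections above rely on markers named unit, integration and slow. How this project registers them is not shown here; one common way is a pytest_configure hook in conftest.py, sketched below. The file location and the marker descriptions are assumptions.

```python
# conftest.py (hypothetical location; descriptions are illustrative)
MARKERS = {
    "unit": "fast tests with no network access",
    "integration": "tests that call live knowledge sources",
    "slow": "long-running tests",
}


def pytest_configure(config):
    """pytest hook: register each marker so `-m` selections run without warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test is then tagged with a decorator such as @pytest.mark.integration.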
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Adding New Adapters
- Extend KnowledgeSourceAdapter in base.py
- Implement required methods: search_concepts(), get_concept_details()
- Add to adapters/__init__.py
- Add tests in tests/unit/test_adapters/
- Update documentation
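To make the first two steps concrete, here is a self-contained sketch of what an adapter could look like. The `KnowledgeSourceAdapter` class defined below is only a stand-in for the one in base.py, whose real constructor and method signatures may differ; `InMemoryAdapter` is a toy example backed by a dict rather than a remote service.

```python
import asyncio
from abc import ABC, abstractmethod


class KnowledgeSourceAdapter(ABC):
    """Stand-in for the base class in base.py (assumed shape)."""

    @abstractmethod
    async def search_concepts(self, query: str) -> list:
        ...

    @abstractmethod
    async def get_concept_details(self, concept_id: str) -> dict:
        ...


class InMemoryAdapter(KnowledgeSourceAdapter):
    """Toy adapter that searches a dict of concept records."""

    def __init__(self, concepts):
        self._concepts = concepts

    async def search_concepts(self, query):
        # Case-insensitive substring match on the label.
        q = query.lower()
        return [cid for cid, c in self._concepts.items() if q in c["label"].lower()]

    async def get_concept_details(self, concept_id):
        return self._concepts[concept_id]


if __name__ == "__main__":
    adapter = InMemoryAdapter({"EX:1": {"label": "Diabetes mellitus"}})
    print(asyncio.run(adapter.search_concepts("diabetes")))
```

An in-memory adapter like this is also a convenient fixture for unit tests that must not touch the network.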
Development Setup
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install
poetry run pre-commit install
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built upon the AID-PAIS Knowledge Graph project
- Thanks to all contributors and the biomedical research community
- Special thanks to the maintainers of the various knowledge sources
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: jonas.heinicke@helmholtz-hzi.de
🔬 Citation
If you use this library in your research, please cite:
@software{heinicke_biomedical_knowledge_lookup_2025,
author = {Heinicke, Jonas},
title = {Biomedical Knowledge Lookup: Unified biological concept lookup across 29+ biomedical knowledge sources},
url = {https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup},
version = {1.0.0},
year = {2025}
}
⭐ Star this repository if you find it useful!
File details
Details for the file biomedical_knowledge_lookup-1.0.0.tar.gz.
File metadata
- Download URL: biomedical_knowledge_lookup-1.0.0.tar.gz
- Upload date:
- Size: 159.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6698480da3511e5a27296e91214afd9c87807604e92171c0c901f1e0296fbaee |
| MD5 | 03d2cf6fc766b615137026ceeb2eb770 |
| BLAKE2b-256 | 0d6373d25ad09e13a94ee0fa3d56e5a92cf3adb5bdb0c862539a9a5a2b81f4be |
Provenance
The following attestation bundles were made for biomedical_knowledge_lookup-1.0.0.tar.gz:
Publisher: publish.yml on JonasHeinickeBio/biomedical-knowledge-lookup
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biomedical_knowledge_lookup-1.0.0.tar.gz
- Subject digest: 6698480da3511e5a27296e91214afd9c87807604e92171c0c901f1e0296fbaee
- Sigstore transparency entry: 1805610080
- Sigstore integration time:
- Permalink: JonasHeinickeBio/biomedical-knowledge-lookup@111263be541a8d3b724b1961fa8472bb5dcd3e9d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JonasHeinickeBio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@111263be541a8d3b724b1961fa8472bb5dcd3e9d
- Trigger Event: workflow_dispatch
File details
Details for the file biomedical_knowledge_lookup-1.0.0-py3-none-any.whl.
File metadata
- Download URL: biomedical_knowledge_lookup-1.0.0-py3-none-any.whl
- Upload date:
- Size: 231.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d49328d9ea1aa29bd390e545c20a45b53923465bdf4a20abd6da89943de8919 |
| MD5 | 9a4b39939f314e75f52cbfb9f7d87a72 |
| BLAKE2b-256 | e80be3fae0ea03032588dcd8c89161f8f146725edf4e66ab58e320530b59a6a4 |
Provenance
The following attestation bundles were made for biomedical_knowledge_lookup-1.0.0-py3-none-any.whl:
Publisher: publish.yml on JonasHeinickeBio/biomedical-knowledge-lookup
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biomedical_knowledge_lookup-1.0.0-py3-none-any.whl
- Subject digest: 6d49328d9ea1aa29bd390e545c20a45b53923465bdf4a20abd6da89943de8919
- Sigstore transparency entry: 1805610110
- Sigstore integration time:
- Permalink: JonasHeinickeBio/biomedical-knowledge-lookup@111263be541a8d3b724b1961fa8472bb5dcd3e9d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JonasHeinickeBio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@111263be541a8d3b724b1961fa8472bb5dcd3e9d
- Trigger Event: workflow_dispatch