
Unified biological concept lookup across 29+ biomedical knowledge sources including BioPortal, OLS, UMLS, ChEMBL, DisGeNET, and more


🧬 Biomedical Knowledge Lookup

Requires Python 3.10+; licensed under MIT.

A unified Python library for biological concept lookup across 29+ biomedical knowledge sources including BioPortal, OLS, UMLS, ChEMBL, DisGeNET, and more. Built for bioinformatics researchers, knowledge graph developers, and biomedical data scientists.

✨ Features

  • 🔍 29+ Knowledge Sources: Comprehensive coverage of biomedical ontologies and databases
  • ⚡ Unified API: Single interface for all sources with consistent results
  • 🔄 Multi-source Annotation: Cross-reference concepts across multiple databases
  • 📊 RDF Export: Convert results to RDF format for knowledge graphs
  • 💾 Intelligent Caching: Built-in caching system for performance optimization
  • 🔄 Async Support: Asynchronous operations for scalable applications
  • 🧪 Comprehensive Testing: Full test suite with unit and integration tests
  • 📚 Rich Documentation: Extensive examples and API documentation

🚀 Quick Start

Installation

pip install biomedical-knowledge-lookup
# or
poetry add biomedical-knowledge-lookup
# or from source
git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install

Basic Usage

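import asyncio

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource


async def main():
    # Initialize the lookup system
    lookup = CentralKnowledgeLookup()

    # Search for concepts across multiple sources
    # (BioPortal and UMLS require API keys; see Configuration)
    results = await lookup.search_concepts(
        "diabetes mellitus",
        sources=[KnowledgeSource.BIOPORTAL, KnowledgeSource.OLS, KnowledgeSource.UMLS],
    )

    # Get detailed information about a specific concept (DOID:9351 = diabetes mellitus)
    concept_details = await lookup.get_concept_details("DOID:9351")

    # Export results to RDF
    rdf_graph = lookup.export_to_rdf(results)
    return results, concept_details, rdf_graph


# The lookup methods are coroutines, so a plain script must run them in an event loop
results, concept_details, rdf_graph = asyncio.run(main())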
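This README does not show how the central lookup fans a single query out to several sources. The sketch below illustrates one common pattern under stated assumptions: every adapter exposes an async `search_concepts(query)`, all sources are queried concurrently with `asyncio.gather`, and a failing source is recorded instead of aborting the whole search. `Concept`, `StubAdapter`, `FailingAdapter` and `search_all` are hypothetical names for illustration, not the library's API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Concept:
    # Hypothetical minimal result record (the library's real models live in models.py)
    source: str
    concept_id: str
    label: str


class StubAdapter:
    """Hypothetical in-memory stand-in for a per-source adapter."""

    def __init__(self, source, data, delay=0.0):
        self.source = source
        self._data = data      # maps lower-cased query -> list of (concept_id, label)
        self._delay = delay    # simulated network latency in seconds

    async def search_concepts(self, query):
        await asyncio.sleep(self._delay)
        hits = self._data.get(query.lower(), [])
        return [Concept(self.source, cid, label) for cid, label in hits]


class FailingAdapter:
    """Hypothetical adapter that simulates an unreachable service."""

    source = "UMLS"

    async def search_concepts(self, query):
        raise ConnectionError("simulated outage")


async def search_all(adapters, query):
    # Query every adapter concurrently; return_exceptions=True keeps one
    # failing source from cancelling the results of the others.
    results = await asyncio.gather(
        *(adapter.search_concepts(query) for adapter in adapters),
        return_exceptions=True,
    )
    merged, failed = [], []
    for adapter, result in zip(adapters, results):
        if isinstance(result, Exception):
            failed.append(adapter.source)  # a real implementation would also log it
        else:
            merged.extend(result)
    return merged, failed
```

For example, `asyncio.run(search_all([StubAdapter("OLS", {"diabetes mellitus": [("DOID:9351", "diabetes mellitus")]}), FailingAdapter()], "Diabetes Mellitus"))` returns the single OLS hit together with `["UMLS"]` as the list of failed sources. Because the calls run concurrently, total latency is roughly that of the slowest source rather than the sum of all of them.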

Advanced Usage with Multi-source Annotation

import asyncio

from knowledge_lookup import MultiSourceAnnotator


async def main():
    # Annotate text with concepts from multiple sources
    annotator = MultiSourceAnnotator()
    annotations = await annotator.annotate_text(
        "Type 2 diabetes is associated with insulin resistance",
        confidence_threshold=0.7,
    )

    # Get consensus annotations across sources
    return annotator.get_consensus_annotations(annotations)


consensus = asyncio.run(main())
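The README does not define what "consensus" means. One plausible rule, sketched here purely as an assumption, is to discard hits below the confidence threshold and keep only concepts that at least `min_sources` distinct sources agree on. The `(source, concept_id, confidence)` tuple shape, the `ex:` concept IDs, and the function `consensus_annotations` are hypothetical and are not the library's `get_consensus_annotations`.

```python
from collections import defaultdict


def consensus_annotations(annotations, min_sources=2, confidence_threshold=0.7):
    """Keep concepts supported by at least `min_sources` distinct sources.

    `annotations` is assumed to be a list of (source, concept_id, confidence)
    tuples; the voting rule itself is an illustrative assumption.
    """
    supporters = defaultdict(set)  # concept_id -> set of sources that found it
    best = {}                      # concept_id -> highest confidence seen
    for source, concept_id, confidence in annotations:
        if confidence < confidence_threshold:
            continue  # drop low-confidence hits before voting
        supporters[concept_id].add(source)
        best[concept_id] = max(best.get(concept_id, 0.0), confidence)
    return {
        concept_id: (sorted(sources), best[concept_id])
        for concept_id, sources in supporters.items()
        if len(sources) >= min_sources
    }


# Made-up annotations for the example sentence (IDs are placeholders)
annotations = [
    ("OLS", "ex:type_2_diabetes", 0.95),
    ("BIOPORTAL", "ex:type_2_diabetes", 0.88),
    ("UMLS", "ex:type_2_diabetes", 0.91),
    ("OLS", "ex:insulin_resistance", 0.80),
    ("UMLS", "ex:insulin_resistance", 0.60),  # below threshold, ignored
]
consensus = consensus_annotations(annotations)
# {'ex:type_2_diabetes': (['BIOPORTAL', 'OLS', 'UMLS'], 0.95)}
```

Requiring agreement from independent sources trades recall for precision: `ex:insulin_resistance` is dropped because only one source reported it above the threshold.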

📋 Supported Knowledge Sources

| Source | Description | API key required |
|---|---|---|
| BioPortal | NCBO BioPortal ontology repository | Yes |
| OLS | Ontology Lookup Service | No |
| UMLS | Unified Medical Language System | Yes |
| ChEMBL | Chemical database | No |
| DisGeNET | Disease-gene associations | No |
| DrugBank | Drug information database | No |
| Ensembl | Genome annotation database | No |
| Gene Ontology | Molecular function, biological process, and cellular component terms | No |
| HPO | Human Phenotype Ontology | No |
| Mondo | Mondo Disease Ontology | No |
| OpenTargets | Target-disease associations | No |
| PubChem | Chemical information | No |
| Reactome | Pathway database | No |
| UniProt | Protein sequence database | No |
| WikiData | Structured knowledge base | No |
| ZOOMA | Ontology mapping service | No |
| And 13+ more | See the full list in the documentation | Varies |

🏗️ Architecture

knowledge_lookup/
├── adapters/                  # Individual source adapters
├── models.py                  # Data models and enums
├── central_lookup.py          # Main lookup coordinator
├── multi_source_annotator.py  # Cross-source annotation
├── rdf_converter.py           # RDF export utilities
├── cache.py                   # Caching system
└── base.py                    # Abstract base classes
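To illustrate what RDF export involves, here is a standard-library-only sketch that serializes concept records as N-Triples. It assumes OBO-style CURIEs such as `DOID:9351`, expanded with the OBO Foundry PURL convention (`http://purl.obolibrary.org/obo/DOID_9351`), and uses the standard `rdfs:label` and `skos:exactMatch` predicates. The record shape and the names `curie_to_iri` and `to_ntriples` are hypothetical; `rdf_converter.py` may use a different vocabulary or an RDF library such as rdflib.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie):
    # OBO Foundry convention: PREFIX:LOCAL -> http://purl.obolibrary.org/obo/PREFIX_LOCAL
    # (assumption: only OBO ontologies; other sources need their own prefix map)
    prefix, local = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local}"


def escape_literal(text):
    # N-Triples string literals must escape backslashes, quotes and line breaks
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(concepts):
    """Serialize dicts with 'id', 'label' and optional 'exact_matches' (hypothetical shape)."""
    lines = []
    for concept in concepts:
        subject = f"<{curie_to_iri(concept['id'])}>"
        label = escape_literal(concept["label"])
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
        for other in concept.get("exact_matches", []):
            lines.append(f"{subject} <{SKOS_EXACT_MATCH}> <{curie_to_iri(other)}> .")
    return "\n".join(lines) + "\n"


print(to_ntriples([{"id": "DOID:9351", "label": "diabetes mellitus"}]), end="")
# <http://purl.obolibrary.org/obo/DOID_9351> <http://www.w3.org/2000/01/rdf-schema#label> "diabetes mellitus" .
```

N-Triples is line-based, so output from several sources can be concatenated and loaded into most triple stores without further processing.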

📖 Documentation

Additional Resources

Example Notebooks

Explore interactive examples in the examples/ directory:

  • Basic concept lookup
  • Multi-source annotation
  • RDF export and knowledge graph construction
  • Performance benchmarking

🔧 Configuration

API Keys

Some sources require API keys. Set them as environment variables:

export BIOPORTAL_API_KEY="your_key_here"
export UMLS_API_KEY="your_key_here"
# ... etc

Or create a .env file:

BIOPORTAL_API_KEY=your_key_here
UMLS_API_KEY=your_key_here
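Before running a search that includes BioPortal or UMLS, it can help to check that the corresponding keys are actually set. The sketch below assumes only the two variable names shown here; `parse_dotenv`, `missing_api_keys`, and the plain-string source names are hypothetical helpers, not part of the library, and it is not documented here whether the library loads `.env` itself. In a real project, `load_dotenv()` from the python-dotenv package is the usual way to read a `.env` file.

```python
import os

# Keys for the sources the table lists as requiring one; other sources may
# also need keys ("Varies"). Source names are plain strings here, not the
# library's KnowledgeSource enum.
REQUIRED_KEYS = {"BIOPORTAL": "BIOPORTAL_API_KEY", "UMLS": "UMLS_API_KEY"}


def parse_dotenv(text):
    """Minimal KEY=value parser: skips blank lines and # comments, strips quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_api_keys(sources, env=None):
    """Return the variable names that are required by `sources` but unset or empty."""
    env = os.environ if env is None else env
    return [
        REQUIRED_KEYS[source]
        for source in sources
        if source in REQUIRED_KEYS and not env.get(REQUIRED_KEYS[source])
    ]


env = parse_dotenv('BIOPORTAL_API_KEY="abc"\n# comment\nUMLS_API_KEY=\n')
print(missing_api_keys(["BIOPORTAL", "UMLS", "OLS"], env))  # ['UMLS_API_KEY']
```

Failing fast on a missing key gives a clearer error than an authentication failure returned midway through a multi-source search.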

Advanced Configuration

from knowledge_lookup import CentralKnowledgeLookup, KnowledgeSource, LookupConfig

config = LookupConfig(
    rate_limits={
        KnowledgeSource.BIOPORTAL: 10,  # requests per second
        KnowledgeSource.OLS: 20,
    },
    cache_enabled=True,
    cache_dir="./cache",
)

lookup = CentralKnowledgeLookup(config)

🧪 Testing

# Run all tests
poetry run pytest

# Run specific test categories
poetry run pytest -m "unit"        # Unit tests only
poetry run pytest -m "integration" # Integration tests
poetry run pytest -m "not slow"    # Skip slow tests

# Run with coverage
poetry run pytest --cov=knowledge_lookup

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Adding New Adapters

  1. Extend KnowledgeSourceAdapter in base.py
  2. Implement required methods: search_concepts(), get_concept_details()
  3. Add to adapters/__init__.py
  4. Add tests in tests/unit/test_adapters/
  5. Update documentation

Development Setup

git clone https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup.git
cd biomedical-knowledge-lookup
poetry install
poetry run pre-commit install

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built upon the AID-PAIS Knowledge Graph project
  • Thanks to all contributors and the biomedical research community
  • Special thanks to the maintainers of the various knowledge sources

📞 Support

🔬 Citation

If you use this library in your research, please cite:

@software{heinicke_biomedical_knowledge_lookup_2025,
  author = {Heinicke, Jonas},
  title = {Biomedical Knowledge Lookup: Unified biological concept lookup across 29+ biomedical knowledge sources},
  url = {https://github.com/JonasHeinickeBio/biomedical-knowledge-lookup},
  version = {1.0.0},
  year = {2025}
}


⭐ Star this repository if you find it useful!

