A retrieval-augmented biomedical literature framework for evidence discovery, citation mapping, and downstream omics analysis

RAG-Powered Biomedical Evidence Framework (ragbio)

A reusable retrieval-augmented generation (RAG) toolkit for biomedical knowledge discovery built on PubMed literature, vector search, and Ollama-based LLMs (DeepSeek / LLaMA3).

ragbio enables study-aware ingestion, embedding, and querying of biomedical literature to support gene–disease–therapy exploration, summarization, and network visualization.

ragbio is published as a pip-installable Python package and is designed for integration into research pipelines and bioinformatics workflows.

Overview

The RAG-powered assistant enables:

Semantic search over PubMed abstracts
Study-scoped literature ingestion for reproducibility
Summarization of complex biomedical evidence using LLMs
Citation-aware responses grounded in PubMed IDs
Modular ingestion → embedding → retrieval pipeline
Optional gene–disease–drug network visualization
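As a rough illustration of the semantic-search step, the sketch below ranks abstracts by cosine similarity between a query vector and stored abstract vectors. It is written in pure Python as a stand-in for FAISS. The toy three-dimensional vectors, the placeholder identifiers, and the function names `cosine` and `top_k` are assumptions made for this example and are not part of ragbio.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings keyed by placeholder identifiers (not real PMIDs).
toy_index = {
    "PMID_A": [0.9, 0.1, 0.0],
    "PMID_B": [0.1, 0.9, 0.0],
    "PMID_C": [0.7, 0.3, 0.1],
}

hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in hits])
```

A real index would hold embeddings with hundreds of dimensions, which is where a dedicated vector library such as FAISS becomes worthwhile.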

Example questions

Which genes are linked to oxidative stress in Alzheimer’s disease?
What therapies target amyloid pathways according to recent literature?
Summarize evidence connecting TP53 variants to cancer therapies.

Architecture

User Question
│
▼
FAISS Vector Retrieval (PubMed Abstracts)
│
▼
Top-K Relevant Abstracts
│
▼
Ollama LLM (DeepSeek / LLaMA3)
│
▼
Grounded Biomedical Summary + PMIDs
│
▼
(Optional) Gene–Disease–Drug Network Visualization
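The middle of this flow (top-K abstracts in, grounded summary out) might look roughly like the sketch below. It assembles a prompt that tags each abstract with its PMID and defines a helper that posts the prompt to a local Ollama server through its `/api/generate` REST endpoint. The prompt wording, the record layout, the function names, and the default model tag are assumptions for illustration; the source does not describe how ragbio builds its prompts or talks to Ollama.

```python
import json
import urllib.request

def build_prompt(question, abstracts):
    """Assemble a prompt that asks the model to cite PMIDs from the context."""
    context = "\n\n".join(
        f"[PMID:{a['pmid']}] {a['text']}" for a in abstracts
    )
    return (
        "Answer the question using only the abstracts below. "
        "Cite supporting PMIDs in square brackets.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_ollama(prompt, model="llama3", host="http://localhost:11434"):
    """Send a non-streaming request to a local Ollama server (assumed to be running)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Placeholder records; the PMIDs and texts are invented for the example.
retrieved = [
    {"pmid": "00000001", "text": "Example abstract about amyloid-targeting therapy."},
    {"pmid": "00000002", "text": "Example abstract about oxidative stress markers."},
]
prompt = build_prompt("What therapies target amyloid pathways?", retrieved)
print(prompt)
```

Calling `ask_ollama(prompt)` requires an Ollama server listening on the given host with the named model already pulled, so it is left out of the runnable part of the example.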

Installation

Install from PyPI (recommended)

pip install ragbio

Development install (from source)

git clone https://github.com/<your-username>/rag-gene-discovery-assistant.git
cd rag-gene-discovery-assistant
pip install -e .

Usage

1. Ingest PubMed Literature (study-aware)

python -m ragbio.utils.rag_data_loader \
  --study Alzheimer_CaseStudy \
  --search "Alzheimer Disease AND therapy" \
  --retmax 500 \
  --retstart 0

This creates the following structure (default: data/PubMed/):

PubMed/
├── Abstracts/Alzheimer_CaseStudy/
├── Metadata/Alzheimer_CaseStudy/
├── PDFs/Alzheimer_CaseStudy/
└── Index/Alzheimer_CaseStudy/
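The `--search`, `--retmax`, and `--retstart` flags in the ingestion command resemble the `term`, `retmax`, and `retstart` parameters of NCBI's E-utilities `esearch` endpoint. The sketch below shows how such a request URL and a sequence of paging offsets could be built. Whether ragbio issues requests in exactly this way is an assumption, and the helper names `esearch_url` and `page_offsets` are hypothetical.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, retmax=500, retstart=0):
    """Build an esearch URL for PubMed; retstart is the zero-based result offset."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

def page_offsets(total, retmax):
    """Offsets to pass as retstart in order to walk through `total` results."""
    return list(range(0, total, retmax))

url = esearch_url("Alzheimer Disease AND therapy", retmax=500, retstart=0)
print(url)
print(page_offsets(1200, 500))
```

Under this reading, running the ingestion command again with `--retstart 500` would fetch the next batch of 500 records for the same search.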

2. Generate Embeddings & Build FAISS Index

python -m ragbio.embeddings.embedding_engine \
  --study Alzheimer_CaseStudy

Reads from Abstracts/<study>/
Writes FAISS index to Index/<study>/
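The study-scoped layout can be expressed as a small path helper, sketched below with `pathlib`. The function name `study_paths` is hypothetical; the directory names and the `data/PubMed` default root are taken from the structure shown in step 1.

```python
from pathlib import Path

def study_paths(study, root="data/PubMed"):
    """Map each data category to its study-specific directory."""
    base = Path(root)
    return {
        name: base / name / study
        for name in ("Abstracts", "Metadata", "PDFs", "Index")
    }

paths = study_paths("Alzheimer_CaseStudy")
print(paths["Abstracts"].as_posix())  # where the embedding step would read from
print(paths["Index"].as_posix())      # where the FAISS index would be written
```

Keeping every artifact under a directory named after the study means two studies never share an index, which is what makes per-study results reproducible.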

3. Run RAG Queries

python -m ragbio.pipeline.rag_pipeline \
  --query "Which therapies target amyloid pathways in Alzheimer’s disease?" \
  --top_k 10 \
  --structured \
  --study Alzheimer_CaseStudy

Outputs are generated per study for clean provenance and reproducibility.
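One way to check provenance after a query is to confirm that every PMID cited in an answer was among the retrieved abstracts. The sketch below does this with a regular expression. The `[PMID: n]` citation format, the sample answer, and the function names are assumptions; the source does not specify the format of ragbio's structured output.

```python
import re

# Assumed citation format: "PMID:" followed by digits, with optional whitespace.
PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")

def cited_pmids(answer):
    """Extract the set of PMIDs mentioned in an answer string."""
    return set(PMID_PATTERN.findall(answer))

def ungrounded_citations(answer, retrieved_pmids):
    """Return cited PMIDs that were not among the retrieved abstracts."""
    return cited_pmids(answer) - set(retrieved_pmids)

# Invented answer text and identifiers, for illustration only.
answer = "Therapy X reduced plaque load [PMID: 111]. Drug Y was ineffective [PMID:999]."
retrieved = ["111", "222"]
print(sorted(ungrounded_citations(answer, retrieved)))
```

An empty result means every citation traces back to a retrieved abstract; any leftover PMID is a candidate hallucinated reference.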

4. Visualize Gene–Disease–Drug Networks (optional)

Launch the Streamlit app:

streamlit run ragbio/pipeline/rag_cytoscape_streamlit.py --study Alzheimer_CaseStudy

This reads structured outputs and visualizes gene–disease–drug relationships as an interactive network.

Figure: RAG network graph. Example gene–disease–drug co-occurrence network derived from PubMed abstracts.
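A co-occurrence network of this kind can be derived by counting how often two entities appear in the same abstract. The sketch below is a minimal version using only the standard library. The entity lists are invented placeholders rather than extractions from real abstracts, and the function name `cooccurrence_edges` is hypothetical; how ragbio extracts entities is not described in the source.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(entity_sets):
    """Count, for each unordered entity pair, the abstracts mentioning both."""
    edges = Counter()
    for entities in entity_sets:
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# One list of entities per abstract (placeholder data).
abstract_entities = [
    ["APP", "Alzheimer disease", "drug_X"],
    ["APP", "Alzheimer disease"],
    ["TP53", "drug_X"],
]
edges = cooccurrence_edges(abstract_entities)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each counted pair becomes an edge whose weight can drive line thickness in the visualization.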

5. Optional: Notebook Exploration

Explore example workflows in:

notebooks/RAG_GeneDiscovery_Assistant.ipynb

Technologies Used

Category	Tools
Embeddings	Ollama embedding models (configurable)
LLMs	DeepSeek, LLaMA3 (via Ollama)
Retrieval	FAISS
Data Sources	PubMed (NCBI Entrez)
Visualization	Streamlit, Cytoscape
Language	Python 3.10+

Design Principles

Study-first organization for reproducibility
Separation of concerns (ingestion ≠ embedding ≠ retrieval)
Grounded answers with PubMed citations
Composable modules usable outside the CLI
Safe defaults with override via CLI or environment variables
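The last principle, defaults that can be overridden from the CLI or the environment, is commonly implemented with a fixed precedence order. The sketch below resolves a data root in the order CLI flag, environment variable, built-in default. The `--data-root` flag and the `RAGBIO_DATA_ROOT` variable are hypothetical names chosen for this example and are not documented ragbio options.

```python
import argparse
import os

def resolve_data_root(argv=None, env=None):
    """CLI flag wins, then the environment variable, then the built-in default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Hypothetical flag; default=None lets us detect "not given on the CLI".
    parser.add_argument("--data-root", default=None)
    args = parser.parse_args(argv)
    if args.data_root is not None:
        return args.data_root
    # Hypothetical environment variable, falling back to the documented default path.
    return env.get("RAGBIO_DATA_ROOT", "data/PubMed")

print(resolve_data_root([], env={}))
print(resolve_data_root([], env={"RAGBIO_DATA_ROOT": "/mnt/lit"}))
print(resolve_data_root(["--data-root", "/tmp/pubmed"], env={"RAGBIO_DATA_ROOT": "/mnt/lit"}))
```

Passing `argv` and `env` explicitly keeps the function easy to test without touching the real process environment.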

Future Enhancements

Neo4j-backed gene–disease–drug knowledge graphs
Comparative evaluation of DeepSeek vs BioGPT outputs
Variant-level evidence integration
API support via FastAPI or Django
Automated citation grounding and confidence scoring
Multi-study dashboards and comparisons

ragbio 0.2.0
