DlightRAG
Multimodal RAG with knowledge graph and contextual intelligence. Understands what your documents say, how concepts connect, and what the pages look like. Production-ready.
Most RAG systems treat documents as flat text and search by similarity alone — visual context is lost, entity relationships are missed, and metadata filtering is limited. DlightRAG combines knowledge-graph understanding with dynamic multimodal retrieval to close these gaps.
From text-heavy reports to chart-filled presentations — it adapts to your documents without information compromise. Inquiry answers come with inline citations grounded in actual document content. Flexibly ship it as a ready-to-run service, integrate into your backend, or expose as a tool for AI agents.
Features
- Dual multimodal RAG modes — Caption mode (parse → caption → embed) for the classic pipeline-based multimodal paradigm; Unified mode (render → multimodal embed) for the modern multimodal paradigm
- Knowledge graph + vector + visual retrieval — Multi-strategy retrieval across the LightRAG knowledge graph and vector similarity, visual content, and dynamic metadata filters
- Multimodal ingestion — PDF, images, and Office documents from the local filesystem, Azure Blob Storage, and more
- Broad LLM support — Native SDKs for OpenAI, Anthropic, Gemini + any OpenAI-compatible endpoint
- Cross-workspace federation — Query across embedding-compatible workspaces with well-managed result merging
- Citation and highlighting — Inline citations with source, page, and highlight attribution
- Observability — Zero-overhead telemetry via Langfuse for tracking pipelines, queries, and generations
- Four interfaces — Web UI, REST API, Python SDK, and MCP server
Architecture
Source: docs/architecture.drawio (runtime data flow) · docs/module-layers.md (code-organisation layers)
Quick Start
Defaults shipped in `config.yaml`: `unified` RAG mode + `google/gemini-2.5-flash-lite` chat (via an OpenAI-compatible gateway) + `voyage-multimodal-3.5` embedding (Voyage). Swap providers or models by editing `config.yaml` — see Configuration.
Web UI
A demo video is available on YouTube.
If you already have the REST API running (via Docker or `dlightrag-api`), the Web UI is available at `http://localhost:8100/web/`.
Without Docker:
```bash
uv add dlightrag       # or: pip install dlightrag
cp .env.example .env   # set API keys in .env
dlightrag-api
```
Docker (Self-Hosted)
```bash
git clone https://github.com/hanlianlu/dlightrag.git && cd dlightrag
cp .env.example .env   # set API keys in .env; edit config.yaml for models/providers
docker compose up
```
Includes PostgreSQL (pgvector + AGE), REST API (:8100), and MCP server (:8101, host-mapped to loopback by default — see Deployment & auth before exposing externally).
Local models (Ollama, Xinference, etc.): use `host.docker.internal` instead of `localhost` in `base_url` settings.
```bash
curl http://localhost:8100/health

curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_type": "local", "path": "/app/dlightrag_storage/sources"}'

curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'

curl -X POST http://localhost:8100/answer \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?", "stream": true}'
```
Python SDK
```bash
uv add dlightrag       # or: pip install dlightrag
cp .env.example .env   # set API keys in .env
```
```python
import asyncio

from dotenv import load_dotenv

from dlightrag import RAGServiceManager, DlightragConfig

load_dotenv()  # load .env


async def main():
    config = DlightragConfig()
    # Async factory: parallel-warms every workspace and initializes Langfuse tracing.
    # Bare `RAGServiceManager(config)` also works but defers warmup until first call.
    manager = await RAGServiceManager.create(config)
    try:
        await manager.aingest(workspace="default", source_type="local", path="./docs")

        result = await manager.aretrieve("What are the key findings?")
        print(result.contexts)

        result = await manager.aanswer("What are the key findings?")
        print(result.answer)
    finally:
        await manager.close()


asyncio.run(main())
```
Requires PostgreSQL with pgvector + AGE, or JSON fallback for development (see Configuration).
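A development setup without PostgreSQL might select the JSON-backed fallbacks listed under Storage Backends below (a sketch, not a production configuration):

```yaml
# config.yaml — hypothetical dev fallback with no PostgreSQL dependency
kv_storage: JsonKVStorage
doc_status_storage: JsonDocStatusStorage
vector_storage: NanoVectorDBStorage
graph_storage: NetworkXStorage
```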
MCP Server (for AI Agents)
Two transports — pick by how the agent runs:
stdio — agent spawns dlightrag-mcp as a subprocess (Claude Desktop, Cursor):
```bash
uv tool install dlightrag
cp .env.example .env   # set API keys in .env
```

Then register the server in the agent's MCP configuration:

```json
{
  "mcpServers": {
    "dlightrag": {
      "command": "uvx",
      "args": ["dlightrag-mcp", "--env-file", "/absolute/path/to/.env"]
    }
  }
}
```
streamable-http — agent connects over HTTP (remote / multi-client):
```bash
DLIGHTRAG_MCP_TRANSPORT=streamable-http \
DLIGHTRAG_MCP_HOST=127.0.0.1 \
dlightrag-mcp
# agent posts to http://127.0.0.1:8101/mcp
```
Tools: `retrieve`, `answer`, `ingest`, `list_files`, `delete_files`, `list_workspaces` — all workspace-isolated.
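As an illustration, an agent-side call over streamable-http using the MCP Python SDK (a sketch: the `query` argument name is an assumption; consult the tool schemas the server advertises):

```python
# Sketch: calling the `retrieve` tool over streamable-http with the MCP Python SDK.
# The {"query": ...} argument shape is an assumption; check the advertised tool schema.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main():
    async with streamablehttp_client("http://127.0.0.1:8101/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("retrieve", {"query": "What are the key findings?"})
            print(result.content)


asyncio.run(main())
```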
Deployment & auth
Pick the row matching your use case:
| Scenario | Transport | Bind | Bearer token |
|---|---|---|---|
| Local agent (Claude Desktop / Cursor) | `stdio` | n/a | not needed |
| Self-hosted, single-machine | `streamable-http` | `127.0.0.1` (default) | not needed |
| `docker compose up` (default) | `streamable-http` | container `0.0.0.0`, host port `127.0.0.1:8101` | not needed |
| LAN / team access | `streamable-http` | `0.0.0.0` | required |
| Production / public network | `streamable-http` behind reverse proxy + TLS | proxy → `127.0.0.1` | required |
Rule of thumb: if anyone other than you can reach port 8100 (REST) or 8101 (MCP), set a token.
```bash
openssl rand -base64 32                               # generate
echo "DLIGHTRAG_API_AUTH_TOKEN=<generated>" >> .env   # set
# clients send: Authorization: Bearer <generated>
```
The same token guards both REST and MCP. The MCP server logs a multi-line warning at startup if it binds non-loopback without a token configured.
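For example, an authenticated call then looks like:

```bash
curl -X POST http://localhost:8100/answer \
  -H "Authorization: Bearer $DLIGHTRAG_API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'
```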
API & Internals
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest` | Ingest from local, Azure Blob, or AWS S3 |
| POST | `/retrieve` | Contexts + sources, no LLM call (response still ships `answer: null` for shape parity with `/answer`) |
| POST | `/answer` | LLM answer + contexts + sources (`stream: true` for SSE) |
| GET | `/files` | List ingested documents |
| DELETE | `/files` | Delete documents |
| GET | `/files/failed` | List documents stuck in `DocStatus.FAILED` |
| POST | `/files/retry` | Re-ingest all FAILED documents (`replace=True`, source-aware) |
| GET | `/api/files/{path}` | Serve/download a file (local: stream, Azure: 302 SAS redirect) |
| GET | `/metadata/{doc_id}` | Read a document's metadata JSONB |
| POST | `/metadata/{doc_id}` | Merge custom keys into a document's metadata JSONB |
| POST | `/metadata/search` | Find document IDs matching a key/value filter dict |
| POST | `/reset` | Reset workspace(s) — drop storage, clear indexes |
| GET | `/workspaces` | List available workspaces |
| GET | `/health` | Health check with storage status |
All write endpoints accept an optional `workspace`; read endpoints accept a `workspaces` list for cross-workspace federated search. See Deployment & auth for token setup.
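For instance, a federated retrieval across two workspaces might look like this (workspace names are hypothetical; the exact request schema is in `docs/response-schema.md`):

```bash
curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?", "workspaces": ["default", "finance"]}'
```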
- Request/response schema — `docs/response-schema.md` for ingestion parameters, retrieval contexts, sources, media, SSE streaming, citations, and multimodal queries.
- Retrieval & answer pipeline — `docs/retrieval_answer_mechanism.md` for unified vs caption mode, visual resolution, reranking, Step 1+2 merge.
Configuration
Configuration uses a hybrid system — structured app settings in `config.yaml`, secrets and deployment in `.env`.
Priority: constructor args > env vars > `.env` > `config.yaml` > defaults
See `config.yaml` for all application settings and `.env.example` for the secrets/deployment reference.
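Since constructor arguments win, a quick experiment can override a single setting in code; a sketch, assuming `DlightragConfig` exposes its fields as keyword arguments:

```python
from dlightrag import DlightragConfig

# Hypothetical: override one field at the top of the priority chain.
config = DlightragConfig(rag_mode="caption")
```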
Env var naming: all variables use the `DLIGHTRAG_` prefix. A single underscore (`_`) is part of the field name (e.g. `DLIGHTRAG_POSTGRES_HOST` → `postgres_host`); a double underscore (`__`) denotes a nested object (e.g. `DLIGHTRAG_CHAT__MODEL` → `chat.model`). See `.env.example` for details.
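Concretely, a `.env` fragment mapping both forms (values are placeholders):

```bash
# .env — single underscore: part of the field name; double underscore: nesting
DLIGHTRAG_POSTGRES_HOST=localhost        # → postgres_host
DLIGHTRAG_CHAT__MODEL=gemini-2.5-pro     # → chat.model
```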
RAG Mode
The first decision — determines your ingestion pipeline, model requirements, and retrieval behavior.
| Mode | Pipeline | Best for |
|---|---|---|
| `caption` | Document parsing → VLM captioning → text embedding → KG | Text-heavy documents, structured elements |
| `unified` (default) | Page rendering → multimodal embedding → VLM entity extraction → KG | Visually rich documents (charts, diagrams, complex layouts) |
Caption mode parsers (`parser` in `config.yaml`):

| Parser | Description |
|---|---|
| `mineru` (default) | MinerU PDF parser — fast, good for text-heavy documents |
| `docling` | Docling parser — alternative structure-aware parser |
| `vlm` | VLM-based OCR — renders pages and uses the chat model (must be a VLM) to extract structured content; no external parser dependency |
All caption mode parsers use Docling's HybridChunker for structure-aware chunking.
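Selecting caption mode with a specific parser is a two-key change in `config.yaml` (a sketch using the keys documented above):

```yaml
# config.yaml — caption mode with the default MinerU parser
rag_mode: caption
parser: mineru
```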
Model usage by stage:
Each stage resolves its model via the per-role overrides below; if a role is unset, it falls back to chat.
| Stage | Caption | Unified | Role override |
|---|---|---|---|
| Image captioning | chat (VLM) | chat (VLM) | `vlm` |
| Table / equation captioning | chat | — | `vlm` |
| Entity extraction | chat | chat (VLM) | `extract` |
| Embedding | embedding model | embedding model (multimodal) | (separate `embedding` block) |
| Rerank (`chat_llm_reranker`) | ingest/chat | vlm/chat (pointwise) | `vlm` |
| Rerank (API strategy) | `jina_reranker` / `aliyun_reranker` / `azure_cohere` / `local_reranker` | `jina_reranker` / `aliyun_reranker` / `azure_cohere` / `local_reranker` | (separate `rerank` block) |
| Keyword extraction (per-query) | chat | chat | `keywords` |
| Answer generation | chat | chat (VLM, sees text excerpts + page images) | `query` |
Important: The chat model must support vision (multimodal/VLM). It doubles as the vision model for image captioning, VLM parser, unified mode, and multimodal queries. A text-only chat model will fail on these tasks.
For unified mode, set rag_mode: unified in config.yaml and use multimodal models:
```yaml
# config.yaml
rag_mode: unified
chat:
  model: qwen3-vl-32b          # must support vision
embedding:
  model: Qwen3-VL-Embedding    # must be multimodal
  dim: 4096
```
Limitations: A workspace is locked to one mode after its first ingestion. Page images are ~3–7 MB per page at 250 DPI.
Providers
Three native SDKs — choose per model block in config.yaml:
| Provider | SDK | Use for |
|---|---|---|
openai (default) |
AsyncOpenAI | OpenAI, Azure OpenAI, Qwen/DashScope, MiniMax, Ollama, Xinference, any OpenAI-compatible endpoint |
anthropic |
Anthropic SDK | Anthropic Claude models |
gemini |
Google GenAI SDK | Google Gemini models |
All three SDKs ship in the base install; no extras to install.
```yaml
# config.yaml — OpenAI-compatible (Ollama example)
chat:
  provider: openai
  model: qwen3:8b
  base_url: http://localhost:11434/v1

# config.yaml — Anthropic (native SDK)
chat:
  provider: anthropic
  model: claude-sonnet-4-20250514

# config.yaml — Google Gemini (native SDK)
chat:
  provider: gemini
  model: gemini-2.5-pro
```
API keys go in `.env`:

```bash
DLIGHTRAG_CHAT__API_KEY=sk-...
DLIGHTRAG_EMBEDDING__API_KEY=sk-...
```
Per-role LLM overrides (optional)
Built on LightRAG 1.5.0's role registry. Each role falls back to `chat` when not specified — start with `chat` only, then split out a role later when cost or quality needs it.
| Role | What it drives | Recommended model class |
|---|---|---|
| `extract` | KG entity & relation extraction during ingest | Heavy reasoning (Claude Sonnet / GPT-5) |
| `keywords` | Per-query keyword extraction | Cheap & fast (Haiku / Gemini Flash Lite) |
| `query` | Answer generation + retrieval planning | Balanced–heavy (Claude Opus / GPT-5) |
| `vlm` | DlightRAG vision paths: VLM OCR, multimodal query, unified extractor | Vision-strong (GPT-5-vision / Gemini 2.5 Flash) |
```yaml
# config.yaml
extract:
  provider: anthropic
  model: claude-sonnet-4-20250514

# Cheap local fallback for high-volume keyword extraction:
keywords:
  provider: openai
  model: gemma4:9b-it-q4_K_M
  base_url: http://host.docker.internal:11434/v1
  api_key: ollama
```
Storage Backends
Set in config.yaml:
| Setting | Default | Options |
|---|---|---|
| `vector_storage` | `PGVectorStorage` | `PGVectorStorage`, `MilvusVectorDBStorage`, `NanoVectorDBStorage`, ... |
| `graph_storage` | `PGGraphStorage` | `PGGraphStorage`, `Neo4JStorage`, `NetworkXStorage`, ... |
| `kv_storage` | `PGKVStorage` | `PGKVStorage`, `JsonKVStorage`, `RedisKVStorage`, ... |
| `doc_status_storage` | `PGDocStatusStorage` | `PGDocStatusStorage`, `JsonDocStatusStorage`, ... |
Note: When using PostgreSQL backends, LightRAG maps its internal namespace names to different table names (e.g. `text_chunks` → `LIGHTRAG_DOC_CHUNKS`, `full_docs` → `LIGHTRAG_DOC_FULL`). DlightRAG's unified mode adds a `visual_chunks` table via its own KV storage.
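For example, swapping the graph backend to Neo4j is a one-line change (a sketch; connection settings for the Neo4j instance are configured separately and not shown):

```yaml
# config.yaml — hypothetical: swap graph storage to Neo4j
graph_storage: Neo4JStorage
```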
Workspaces
Each workspace has its own knowledge graph, vector store, and document index. `workspace` in `config.yaml` (default: `default`) is automatically bridged to backend-specific env vars — no manual setup needed.
| Backend type | Isolation mechanism |
|---|---|
| PostgreSQL (PG*) | workspace column / graph name in same database |
| Neo4j / Memgraph | Label prefix |
| Milvus / Qdrant | Collection prefix |
| MongoDB / Redis | Collection scope |
| JSON / Nano / NetworkX / Faiss | Subdirectory under working_dir/<workspace>/ |
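For instance, a write can target a specific workspace via the optional `workspace` field on the REST API (the `finance` name below is hypothetical):

```bash
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_type": "local", "path": "/app/dlightrag_storage/sources", "workspace": "finance"}'
```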
Reranking
Set in `config.yaml` under the `rerank:` block:

| Setting | Default | Description |
|---|---|---|
| `rerank.strategy` | `chat_llm_reranker` | `chat_llm_reranker`, `jina_reranker`, `aliyun_reranker`, `azure_cohere`, `local_reranker` |
| `rerank.model` | (strategy default) | Model name sent to the endpoint |
| `rerank.base_url` | (provider default) | Custom endpoint URL for any compatible service |
| `rerank.api_key` | — | Set in `.env` as `DLIGHTRAG_RERANK__API_KEY` |
| Strategy | Default model | API key |
|---|---|---|
| `chat_llm_reranker` | falls through `vlm` → `ingest` → `chat` role | (reuses the chosen role's key) |
| `jina_reranker` | `jina-reranker-m0` | `DLIGHTRAG_RERANK__API_KEY` |
| `aliyun_reranker` | `gte-rerank` | `DLIGHTRAG_RERANK__API_KEY` |
| `azure_cohere` | `cohere-rerank-v3.5` | `DLIGHTRAG_RERANK__API_KEY` |
| `local_reranker` | (set `rerank.model` + `rerank.base_url`) | (none — local endpoint) |
For self-hosted rerankers (Xinference, vLLM, TEI, etc.), use `local_reranker` with `rerank.base_url` + `rerank.model`. For any other OpenAI-compatible `/rerank` endpoint, point `rerank.base_url` at it.
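A self-hosted setup might look like this (a sketch; the model name and endpoint URL are placeholders for whatever your reranker serves):

```yaml
# config.yaml — hypothetical self-hosted reranker
rerank:
  strategy: local_reranker
  model: bge-reranker-v2-m3                               # placeholder model name
  base_url: http://host.docker.internal:9997/v1/rerank    # placeholder endpoint
```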
Observability (Langfuse)
DlightRAG includes native, zero-overhead tracing using Langfuse. When configured, you get detailed waterfall traces of every RAG pipeline stage, LLM generation, and embedding call. If keys are omitted, the tracing module operates as a pure no-op with zero performance penalty.
To enable observability, set the following in your .env:
```bash
DLIGHTRAG_LANGFUSE_PUBLIC_KEY=pk-...
DLIGHTRAG_LANGFUSE_SECRET_KEY=sk-...
# DLIGHTRAG_LANGFUSE_HOST=https://cloud.langfuse.com   # Optional: defaults to cloud
```
This automatically tracks `retrieve`, `answer`, and `ingest` operations at the service level.
Development
```bash
git clone https://github.com/hanlianlu/dlightrag.git && cd dlightrag
cp .env.example .env && uv sync

docker compose up -d               # PostgreSQL + API + MCP
docker compose up postgres -d      # PostgreSQL only

uv run pytest tests/unit           # unit tests (no external services)
uv run pytest tests/integration    # integration tests (requires PostgreSQL)

uv run ruff check src/ tests/ scripts/ --fix && uv run ruff format src/ tests/ scripts/
```
License
Apache License 2.0 — see LICENSE.
Built by HanlianLyu. Contributions welcome!