
DlightRAG


Multimodal RAG with knowledge graph and contextual intelligence. Understands what your documents say, how concepts connect, and what the pages look like. Production-ready.

Most RAG systems treat documents as hierarchical text and retrieve purely by similarity — visual context is lost, entity relationships are missed, and context filtering is limited. DlightRAG combines knowledge graph understanding with dynamic multimodal retrieval to close these gaps.

From text-heavy reports to chart-filled presentations, it adapts to your documents without losing information. Answers come with inline citations grounded in actual document content. Ship it as a ready-to-run service, integrate it into your backend, or expose it as a tool for AI agents.

Features

  • Dual multimodal RAG modes — Caption mode (parse → caption → embed) for the pipeline-based multimodal paradigm; Unified mode (render → multimodal embed) for the modern multimodal paradigm
  • Knowledge graph + vector + visual retrieval — Multi-strategy retrieval across the knowledge graph, vector similarity (LightRAG), visual content, and dynamic metadata filters
  • Multimodal ingestion — PDFs, images, and Office documents from the local filesystem, Azure Blob Storage, and more
  • Broad LLM support — Native SDKs for OpenAI, Anthropic, and Gemini, plus any OpenAI-compatible endpoint
  • Cross-workspace federation — Query across embedding-compatible workspaces with well-managed merging
  • Citation and highlighting — Inline citations with source, page, and highlight attribution
  • Observability — Zero-overhead telemetry via Langfuse for tracking pipelines, queries, and generations
  • Four interfaces — Web UI, REST API, Python SDK, and MCP server

Architecture

DlightRAG Architecture

Source: docs/architecture.drawio (runtime data flow) · docs/module-layers.md (code-organisation layers)

Quick Start

Defaults shipped in config.yaml: unified RAG mode + google/gemini-2.5-flash-lite chat (via an OpenAI-compatible gateway) + voyage-multimodal-3.5 embedding (Voyage). Swap providers or models by editing config.yaml — see Configuration.

Web UI

A demo video is available on YouTube.

If you already have the REST API running (via Docker or dlightrag-api), the Web UI is available at:

http://localhost:8100/web/

Without Docker:

uv add dlightrag        # or: pip install dlightrag
cp .env.example .env    # set API keys in .env
dlightrag-api

Docker (Self-Hosted)

git clone https://github.com/hanlianlu/dlightrag.git && cd dlightrag
cp .env.example .env    # set API keys in .env; edit config.yaml for models/providers
docker compose up

Includes PostgreSQL (pgvector + AGE), REST API (:8100), and MCP server (:8101, host-mapped to loopback by default — see Deployment & auth before exposing externally).

Local models (Ollama, Xinference, etc.): use host.docker.internal instead of localhost in base_url settings.

curl http://localhost:8100/health

curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_type": "local", "path": "/app/dlightrag_storage/sources"}'

curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'

curl -X POST http://localhost:8100/answer \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?", "stream": true}'

Python SDK

uv add dlightrag        # or: pip install dlightrag
cp .env.example .env    # set API keys in .env
import asyncio
from dotenv import load_dotenv
from dlightrag import RAGServiceManager, DlightragConfig

load_dotenv()  # load .env

async def main():
    config = DlightragConfig()
    # Async factory: parallel-warms every workspace and initializes Langfuse tracing.
    # Bare `RAGServiceManager(config)` also works but defers warmup until first call.
    manager = await RAGServiceManager.create(config)
    try:
        await manager.aingest(workspace="default", source_type="local", path="./docs")

        result = await manager.aretrieve("What are the key findings?")
        print(result.contexts)

        result = await manager.aanswer("What are the key findings?")
        print(result.answer)
    finally:
        await manager.close()

asyncio.run(main())

Requires PostgreSQL with pgvector + AGE, or JSON fallback for development (see Configuration).
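
For quick local experiments without PostgreSQL, the JSON and in-memory backends listed under Storage Backends can be selected before the config is built. A minimal sketch, assuming the environment-variable names follow the documented DLIGHTRAG_ naming convention (backend class names are taken from the Storage Backends table; verify both against .env.example):

import os

from dlightrag import DlightragConfig

# Development-only fallback: file/JSON-backed storage instead of PostgreSQL.
# Env-var names assume the documented DLIGHTRAG_<field_name> convention.
os.environ["DLIGHTRAG_KV_STORAGE"] = "JsonKVStorage"
os.environ["DLIGHTRAG_VECTOR_STORAGE"] = "NanoVectorDBStorage"
os.environ["DLIGHTRAG_GRAPH_STORAGE"] = "NetworkXStorage"
os.environ["DLIGHTRAG_DOC_STATUS_STORAGE"] = "JsonDocStatusStorage"

config = DlightragConfig()  # env vars take precedence over config.yaml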

MCP Server (for AI Agents)

Two transports — pick by how the agent runs:

stdio — agent spawns dlightrag-mcp as a subprocess (Claude Desktop, Cursor):

uv tool install dlightrag
cp .env.example .env        # set API keys in .env
{
  "mcpServers": {
    "dlightrag": {
      "command": "uvx",
      "args": ["dlightrag-mcp", "--env-file", "/absolute/path/to/.env"]
    }
  }
}

streamable-http — agent connects over HTTP (remote / multi-client):

DLIGHTRAG_MCP_TRANSPORT=streamable-http \
DLIGHTRAG_MCP_HOST=127.0.0.1 \
dlightrag-mcp
# agent posts to http://127.0.0.1:8101/mcp

Tools: retrieve, answer, ingest, list_files, delete_files, list_workspaces — all workspace-isolated.
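
To sanity-check the streamable-http transport from a script rather than an agent, here is a minimal sketch using the official MCP Python SDK (the mcp package, separate from DlightRAG). The retrieve tool's argument name is an assumption based on the REST API parameters:

import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    # Connects to dlightrag-mcp started with DLIGHTRAG_MCP_TRANSPORT=streamable-http.
    async with streamablehttp_client("http://127.0.0.1:8101/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # retrieve, answer, ingest, ...
            result = await session.call_tool("retrieve", {"query": "What are the key findings?"})
            print(result.content)

asyncio.run(main())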

Deployment & auth

Pick the row matching your use case:

Scenario Transport Bind Bearer token
Local agent (Claude Desktop / Cursor) stdio n/a not needed
Self-hosted, single-machine streamable-http 127.0.0.1 (default) not needed
docker compose up (default) streamable-http container 0.0.0.0, host port 127.0.0.1:8101 not needed
LAN / team access streamable-http 0.0.0.0 required
Production / public network streamable-http behind reverse proxy + TLS proxy → 127.0.0.1 required

Rule of thumb: if anyone other than you can reach port 8100 (REST) or 8101 (MCP), set a token.

openssl rand -base64 32                                     # generate
echo "DLIGHTRAG_API_AUTH_TOKEN=<generated>" >> .env         # set
# clients send: Authorization: Bearer <generated>

The same token guards both REST and MCP. The MCP server logs a multi-line warning at startup if it binds non-loopback without a token configured.
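
With a token configured, clients attach it to every REST and MCP request. A minimal sketch against the REST API, assuming the token is exported in the environment:

import os

import httpx

# The same bearer token guards both REST (:8100) and MCP (:8101).
headers = {"Authorization": f"Bearer {os.environ['DLIGHTRAG_API_AUTH_TOKEN']}"}

print(httpx.get("http://localhost:8100/health", headers=headers).json())
print(httpx.post(
    "http://localhost:8100/retrieve",
    headers=headers,
    json={"query": "What are the key findings?"},
    timeout=60,
).json())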

API & Internals

Method Endpoint Description
POST /ingest Ingest from local, Azure Blob, or AWS S3
POST /retrieve Contexts + sources, no LLM call (response still ships answer: null for shape parity with /answer)
POST /answer LLM answer + contexts + sources (stream: true for SSE)
GET /files List ingested documents
DELETE /files Delete documents
GET /files/failed List documents stuck in DocStatus.FAILED
POST /files/retry Re-ingest all FAILED documents (replace=True, source-aware)
GET /api/files/{path} Serve/download a file (local: stream, Azure: 302 SAS redirect)
GET /metadata/{doc_id} Read a document's metadata JSONB
POST /metadata/{doc_id} Merge custom keys into a document's metadata JSONB
POST /metadata/search Find document IDs matching a key/value filter dict
POST /reset Reset workspace(s) — drop storage, clear indexes
GET /workspaces List available workspaces
GET /health Health check with storage status

All write endpoints accept optional workspace; read endpoints accept workspaces list for cross-workspace federated search. See Deployment & auth for token setup.
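
For federated reads, the workspace names go in a workspaces list on the request body. A minimal sketch against /retrieve (the workspace names below are placeholders; see docs/response-schema.md for the full request and response shape):

import httpx

# Federated retrieval across two embedding-compatible workspaces.
response = httpx.post(
    "http://localhost:8100/retrieve",
    json={
        "query": "What are the key findings?",
        "workspaces": ["reports", "contracts"],  # placeholder workspace names
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())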

  • Request/response schema — see docs/response-schema.md for ingestion parameters, retrieval contexts, sources, media, SSE streaming, citations, and multimodal queries.
  • Retrieval & answer pipeline — see docs/retrieval_answer_mechanism.md for unified vs caption mode, visual resolution, reranking, and the Step 1+2 merge.

Configuration

Configuration uses a hybrid system — structured app settings in config.yaml, secrets and deployment in .env.

Priority: constructor args > env vars > .env > config.yaml > defaults

See config.yaml for all application settings and .env.example for secrets/deployment reference.

Env var naming: all variables use the DLIGHTRAG_ prefix. Single underscore (_) is part of the field name (e.g. DLIGHTRAG_POSTGRES_HOST → postgres_host). Double underscore (__) means nested object (e.g. DLIGHTRAG_CHAT__MODEL → chat.model). See .env.example for details.

RAG Mode

The first decision — determines your ingestion pipeline, model requirements, and retrieval behavior.

Mode Pipeline Best for
caption Document parsing → VLM captioning → text embedding → KG Text-heavy documents, structured elements
unified (default) Page rendering → multimodal embedding → VLM entity extraction → KG Visually rich documents (charts, diagrams, complex layouts)

Caption mode parsers (parser in config.yaml):

Parser Description
mineru (default) MinerU PDF parser — fast, good for text-heavy documents
docling Docling parser — alternative structure-aware parser
vlm VLM-based OCR — renders pages and uses chat model (must be VLM) to extract structured content; no external parser dependency

All caption mode parsers use Docling's HybridChunker for structure-aware chunking.

Model usage by stage:

Each stage resolves its model via the per-role overrides below; if a role is unset, it falls back to chat.

Stage Caption Unified Role override
Image captioning chat (VLM) chat (VLM) vlm
Table / equation captioning chat vlm
Entity extraction chat chat (VLM) extract
Embedding embedding model embedding model (multimodal) (separate embedding block)
Rerank (chat_llm_reranker) ingest/chat vlm/chat (pointwise) vlm
Rerank (API strategy) jina_reranker / aliyun_reranker / azure_cohere / local_reranker jina_reranker / aliyun_reranker / azure_cohere / local_reranker (separate rerank block)
Keyword extraction (per-query) chat chat keywords
Answer generation chat chat (VLM, sees text excerpts + page images) query

Important: The chat model must support vision (multimodal/VLM). It doubles as the vision model for image captioning, VLM parser, unified mode, and multimodal queries. A text-only chat model will fail on these tasks.

For unified mode, set rag_mode: unified in config.yaml and use multimodal models:

# config.yaml
rag_mode: unified

chat:
  model: qwen3-vl-32b          # must support vision

embedding:
  model: Qwen3-VL-Embedding    # must be multimodal
  dim: 4096

Limitations: A workspace is locked to one mode after first ingestion. Page images ~3-7 MB/page at 250 DPI.

Providers

Three native SDKs — choose per model block in config.yaml:

Provider SDK Use for
openai (default) AsyncOpenAI OpenAI, Azure OpenAI, Qwen/DashScope, MiniMax, Ollama, Xinference, any OpenAI-compatible endpoint
anthropic Anthropic SDK Anthropic Claude models
gemini Google GenAI SDK Google Gemini models

All three SDKs ship in the base install; no extras to install.

# config.yaml — OpenAI-compatible (Ollama example)
chat:
  provider: openai
  model: qwen3:8b
  base_url: http://localhost:11434/v1

# config.yaml — Anthropic (native SDK)
chat:
  provider: anthropic
  model: claude-sonnet-4-20250514

# config.yaml — Google Gemini (native SDK)
chat:
  provider: gemini
  model: gemini-2.5-pro

API keys go in .env:

DLIGHTRAG_CHAT__API_KEY=sk-...
DLIGHTRAG_EMBEDDING__API_KEY=sk-...

Per-role LLM overrides (optional)

Built on LightRAG 1.5.0's role registry. Each role falls back to chat when not specified — start with chat only, and split out a role later when cost or quality calls for it.

Role What it drives Recommended model class
extract KG entity & relation extraction during ingest Heavy reasoning (Claude Sonnet / GPT-5)
keywords Per-query keyword extraction Cheap & fast (Haiku / Gemini Flash Lite)
query Answer generation + retrieval planning Balanced–heavy (Claude Opus / GPT-5)
vlm DlightRAG vision paths: VLM OCR, multimodal query, unified extractor Vision-strong (GPT-5-vision / Gemini 2.5 Flash)
# config.yaml
extract:
  provider: anthropic
  model: claude-sonnet-4-20250514

# Cheap local fallback for high-volume keyword extraction:
keywords:
  provider: openai
  model: gemma4:9b-it-q4_K_M
  base_url: http://host.docker.internal:11434/v1
  api_key: ollama

Storage Backends

Set in config.yaml:

Setting Default Options
vector_storage PGVectorStorage PGVectorStorage, MilvusVectorDBStorage, NanoVectorDBStorage, ...
graph_storage PGGraphStorage PGGraphStorage, Neo4JStorage, NetworkXStorage, ...
kv_storage PGKVStorage PGKVStorage, JsonKVStorage, RedisKVStorage, ...
doc_status_storage PGDocStatusStorage PGDocStatusStorage, JsonDocStatusStorage, ...

Note: When using PostgreSQL backends, LightRAG maps its internal namespace names to different table names (e.g. text_chunks → LIGHTRAG_DOC_CHUNKS, full_docs → LIGHTRAG_DOC_FULL). DlightRAG's unified mode adds a visual_chunks table via its own KV storage.

Workspaces

Each workspace has its own knowledge graph, vector store, and document index. workspace in config.yaml (default: default) is automatically bridged to backend-specific env vars — no manual setup needed.

Backend type Isolation mechanism
PostgreSQL (PG*) workspace column / graph name in same database
Neo4j / Memgraph Label prefix
Milvus / Qdrant Collection prefix
MongoDB / Redis Collection scope
JSON / Nano / NetworkX / Faiss Subdirectory under working_dir/<workspace>/

Reranking

Set in config.yaml under the rerank: block:

Setting Default Description
rerank.strategy chat_llm_reranker chat_llm_reranker, jina_reranker, aliyun_reranker, azure_cohere, local_reranker
rerank.model (strategy default) Model name sent to the endpoint
rerank.base_url (provider default) Custom endpoint URL for any compatible service
rerank.api_key Set in .env as DLIGHTRAG_RERANK__API_KEY

Strategy Default model API key
chat_llm_reranker falls through the vlm → ingest → chat role (reuses the chosen role's key)
jina_reranker jina-reranker-m0 DLIGHTRAG_RERANK__API_KEY
aliyun_reranker gte-rerank DLIGHTRAG_RERANK__API_KEY
azure_cohere cohere-rerank-v3.5 DLIGHTRAG_RERANK__API_KEY
local_reranker (set rerank.model + rerank.base_url) (none — local endpoint)

For self-hosted rerankers (Xinference, vLLM, TEI etc.), use local_reranker with rerank.base_url + rerank.model. For any other OpenAI-compatible /rerank endpoint, point rerank.base_url at it.
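
A minimal sketch of pointing the reranker at a self-hosted endpoint via environment overrides before the config is built; the variable names are derived from the documented DLIGHTRAG_RERANK__ convention, and the model name and URL are placeholders for whatever your deployment serves. The same keys can be set in the rerank: block of config.yaml instead.

import os

from dlightrag import DlightragConfig

# Placeholder model and endpoint; adjust to your Xinference / vLLM / TEI reranker.
os.environ["DLIGHTRAG_RERANK__STRATEGY"] = "local_reranker"
os.environ["DLIGHTRAG_RERANK__MODEL"] = "bge-reranker-v2-m3"
os.environ["DLIGHTRAG_RERANK__BASE_URL"] = "http://localhost:9997/v1/rerank"

config = DlightragConfig()  # picks up the rerank overrides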

Observability (Langfuse)

DlightRAG includes native, zero-overhead tracing using Langfuse. When configured, you get detailed waterfall traces of every RAG pipeline stage, LLM generation, and embedding call. If keys are omitted, the tracing module operates as a pure no-op with zero performance penalty.

To enable observability, set the following in your .env:

DLIGHTRAG_LANGFUSE_PUBLIC_KEY=pk-...
DLIGHTRAG_LANGFUSE_SECRET_KEY=sk-...
# DLIGHTRAG_LANGFUSE_HOST=https://cloud.langfuse.com  # Optional: defaults to cloud

This will automatically track retrieve, answer, and ingest operations at the service level.

Development

git clone https://github.com/hanlianlu/dlightrag.git && cd dlightrag
cp .env.example .env && uv sync
docker compose up -d                # PostgreSQL + API + MCP
docker compose up postgres -d       # PostgreSQL only
uv run pytest tests/unit            # unit tests (no external services)
uv run pytest tests/integration     # integration tests (requires PostgreSQL)
uv run ruff check src/ tests/ scripts/ --fix && uv run ruff format src/ tests/ scripts/

License

Apache License 2.0 — see LICENSE.


Built by HanlianLyu. Contributions welcome!

