DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)

DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any Large Language Model (LLM). Its dual-layer graph architecture (a Visual-Spatial layer and an Entity-Relationship layer) supports context-aware, visually grounded retrieval.

Features

  • Dual-Layer Graph: Combines visual layout awareness with semantic entity relationships.
  • Visually Grounded Retrieval: Returns visual regions and their spatial context, not just text.
  • Robust OCR: Uses DeepSeek-OCR, with EasyOCR as a fallback, for reliable text extraction.
  • Advanced NER: Uses GLiNER for zero-shot entity recognition.
  • Flexible LLM Support: Compatible with OpenAI, Google Gemini, Anthropic, and local LLMs via MLX/Ollama.
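
To make the dual-layer idea concrete, here is a minimal sketch of how a visual-spatial layer and an entity-relationship layer could be linked. It is not DeepLightRAG's internal data model: `Region`, `DualLayerGraph`, and every field and method name below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A visual region on a page (hypothetical schema)."""
    region_id: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    text: str


@dataclass
class DualLayerGraph:
    # Visual-spatial layer: regions plus adjacency between neighbouring regions.
    regions: dict[str, Region] = field(default_factory=dict)
    spatial_edges: dict[str, set[str]] = field(default_factory=dict)
    # Entity-relationship layer: (subject, relation, object) triples.
    relations: list[tuple[str, str, str]] = field(default_factory=list)
    # Cross-layer links: entity name -> ids of regions that mention it.
    mentions: dict[str, set[str]] = field(default_factory=dict)

    def add_region(self, region: Region) -> None:
        self.regions[region.region_id] = region
        self.spatial_edges.setdefault(region.region_id, set())

    def link_regions(self, a: str, b: str) -> None:
        # Spatial adjacency is symmetric; both regions must already be added.
        self.spatial_edges[a].add(b)
        self.spatial_edges[b].add(a)

    def add_mention(self, entity: str, region_id: str) -> None:
        self.mentions.setdefault(entity, set()).add(region_id)

    def regions_for(self, entity: str, with_neighbors: bool = True) -> list[Region]:
        """Regions mentioning an entity, optionally with adjacent regions."""
        ids = set(self.mentions.get(entity, set()))
        if with_neighbors:
            for rid in list(ids):
                ids |= self.spatial_edges.get(rid, set())
        return [self.regions[i] for i in sorted(ids)]


# Illustrative data only: a caption region next to a body region.
graph = DualLayerGraph()
graph.add_region(Region("r1", 1, (0, 0, 100, 20), "caption"))
graph.add_region(Region("r2", 1, (0, 20, 100, 80), "body mentioning EntityA"))
graph.link_regions("r1", "r2")
graph.add_mention("EntityA", "r2")
hits = graph.regions_for("EntityA")
```

Looking up an entity returns the region that mentions it along with its spatial neighbours, which is one plausible reading of retrieving "visual regions and their spatial context".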

Installation

Standard Installation

pip install deeplightrag

With GPU Support (NVIDIA CUDA)

For better performance through 4-bit/8-bit quantization:

pip install "deeplightrag[gpu]"

For macOS (Apple Silicon)

For optimized performance on Apple M1/M2/M3 chips:

pip install "deeplightrag[macos]"

Usage

Command Line Interface

Index a document:

# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml

Retrieve information:

deeplightrag retrieve "What is the main topic?" --config config.yaml
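
If you want to drive the CLI from a script, the commands above can be assembled as argument lists and passed to `subprocess`. This is a sketch that uses only the subcommands and the `--config` flag shown above. The helper names are hypothetical, and it assumes the `deeplightrag` executable is on `PATH` and that `retrieve` writes its result to standard output, which this page does not confirm.

```python
import subprocess


def index_command(document, config=None):
    """Build the argv list for `deeplightrag index`."""
    cmd = ["deeplightrag", "index", str(document)]
    if config is not None:
        cmd += ["--config", str(config)]
    return cmd


def retrieve_command(query, config=None):
    """Build the argv list for `deeplightrag retrieve`."""
    cmd = ["deeplightrag", "retrieve", query]
    if config is not None:
        cmd += ["--config", str(config)]
    return cmd


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Example (not executed here, since it needs the package installed):
# run(index_command("document.pdf", config="config.yaml"))
# print(run(retrieve_command("What is the main topic?", config="config.yaml")))
```

Passing a list rather than a shell string avoids quoting problems when the query contains spaces or punctuation.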

Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Set explicitly to override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
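
Since the Python API accepts a nested `config` dict, the same settings can be written in Python. The sketch below mirrors the sample file and overlays a few overrides on it. `SAMPLE_CONFIG` and `overlay` are hypothetical names; the values are those of the sample above, not necessarily the library's defaults, and whether DeepLightRAG merges partial configs itself is not stated here, so the full dict is built explicitly.

```python
import copy

# Same keys and values as the sample config.yaml above.
SAMPLE_CONFIG = {
    "ocr": {
        "model_name": "deepseek-ai/deepseek-ocr",
        "use_mlx": False,
        "resolution": "base",
    },
    "retrieval": {"top_k": 5, "rerank": True},
}


def overlay(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Change only top_k; every other setting is carried over unchanged.
config = overlay(SAMPLE_CONFIG, {"retrieval": {"top_k": 10}})
```

The resulting `config` could then be passed as `DeepLightRAG(config=config)`, following the constructor call shown in the Python API section.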

Python API

from deeplightrag.core import DeepLightRAG

# Initialize, explicitly enabling MLX for OCR (this overrides automatic selection)
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
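
To index several files in one pass, a small loop around `index_document` is enough. The sketch below relies only on the `index_document(path, document_id=...)` call shown above. `index_all` is a hypothetical helper, deriving `document_id` from the file stem is an assumption made for illustration, and `_FakeRAG` is a stand-in so the example runs without the package installed.

```python
from pathlib import Path


def index_all(rag, paths):
    """Index each path, using the file stem as document_id.

    Returns (indexed_ids, failures), where failures maps id -> error message.
    """
    indexed, failed = [], {}
    for path in map(Path, paths):
        doc_id = path.stem  # assumption: stems are unique across the batch
        try:
            rag.index_document(str(path), document_id=doc_id)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failed[doc_id] = str(exc)
        else:
            indexed.append(doc_id)
    return indexed, failed


class _FakeRAG:
    """Stand-in for DeepLightRAG, used only to demonstrate the helper."""

    def index_document(self, path, document_id):
        if not path.endswith(".pdf"):
            raise ValueError("unsupported file type")


indexed, failed = index_all(_FakeRAG(), ["papers/paper_one.pdf", "notes.txt"])
```

With a real instance, the call would be `index_all(rag, [...])`. Files that share a stem would collide under this naming scheme, so a different id scheme is needed if that can happen.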

License

MIT License

Download files

Source Distribution

deeplightrag-1.0.16.tar.gz (136.1 kB)

Built Distribution

deeplightrag-1.0.16-py3-none-any.whl (147.5 kB, Python 3)

