DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)

DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any Large Language Model (LLM). It uses a dual-layer graph architecture (a Visual-Spatial layer and an Entity-Relationship layer) to provide context-aware, visually grounded retrieval.

Features

  • Dual-Layer Graph: Combines visual layout awareness with semantic entity relationships.
  • Visually Grounded Retrieval: Retrieves not just text, but also visual regions and their spatial context.
  • Robust OCR: Uses DeepSeek-OCR, with EasyOCR as a fallback, for reliable text extraction.
  • Advanced NER: Uses GLiNER for zero-shot entity recognition.
  • Flexible LLM Support: Compatible with OpenAI, Google Gemini, Anthropic, and local LLMs via MLX/Ollama.
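
To make the dual-layer idea concrete, here is a minimal sketch of how a visual-spatial layer and an entity-relationship layer might be linked. It is not DeepLightRAG's implementation: the class names (`Region`, `DualLayerGraph`), their fields, and the `regions_for_entity` lookup are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A visual region on a page (hypothetical structure)."""
    region_id: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    text: str


@dataclass
class DualLayerGraph:
    """Toy dual-layer graph: spatial adjacency plus entity relations."""
    regions: dict = field(default_factory=dict)          # region_id -> Region
    spatial_edges: dict = field(default_factory=dict)    # region_id -> neighbour ids
    entity_mentions: dict = field(default_factory=dict)  # entity -> region ids
    entity_edges: dict = field(default_factory=dict)     # entity -> (relation, entity)

    def add_region(self, region):
        self.regions[region.region_id] = region
        self.spatial_edges.setdefault(region.region_id, set())

    def link_regions(self, a, b):
        # Spatial adjacency is treated as undirected in this sketch.
        self.spatial_edges.setdefault(a, set()).add(b)
        self.spatial_edges.setdefault(b, set()).add(a)

    def add_mention(self, entity, region_id):
        self.entity_mentions.setdefault(entity, set()).add(region_id)

    def relate(self, source, relation, target):
        self.entity_edges.setdefault(source, set()).add((relation, target))

    def regions_for_entity(self, entity, with_neighbours=True):
        """Return ids of regions mentioning `entity`, plus adjacent regions."""
        hits = set(self.entity_mentions.get(entity, set()))
        if with_neighbours:
            for region_id in list(hits):
                hits |= self.spatial_edges.get(region_id, set())
        return sorted(hits)


graph = DualLayerGraph()
graph.add_region(Region("r1", 1, (0, 0, 100, 20), "Figure caption"))
graph.add_region(Region("r2", 1, (0, 20, 100, 80), "Figure body mentioning Model A"))
graph.link_regions("r1", "r2")
graph.add_mention("Model A", "r2")
graph.relate("Model A", "evaluated_on", "Dataset B")
print(graph.regions_for_entity("Model A"))  # ['r1', 'r2']
```

The lookup starts in the entity layer and then widens through the spatial layer, which is one way a query about an entity could also surface the caption or table next to the matching text.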

Installation

Standard Installation

pip install deeplightrag

With GPU Support (NVIDIA CUDA)

For optimized performance with 4-bit/8-bit quantization:

pip install "deeplightrag[gpu]"

For macOS (Apple Silicon)

For optimized performance on Apple Silicon (M1/M2/M3) chips:

pip install "deeplightrag[macos]"

Usage

Command Line Interface

Index a document:

# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml

Retrieve information:

deeplightrag retrieve "What is the main topic?" --config config.yaml

Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Set explicitly to override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
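
A configuration file like this usually sets only a few keys, so a common pattern is to deep-merge user values over a set of defaults. The sketch below shows that pattern with plain dictionaries. The `DEFAULTS` values and the `merge_config` helper are hypothetical, and whether DeepLightRAG merges configuration this way internally is an assumption.

```python
import copy

# Illustrative defaults only: they mirror the example file above and are
# not taken from DeepLightRAG's source.
DEFAULTS = {
    "ocr": {
        "model_name": "deepseek-ai/deepseek-ocr",
        "use_mlx": False,
        "resolution": "base",
    },
    "retrieval": {"top_k": 5, "rerank": True},
}


def merge_config(defaults, overrides):
    """Return a new dict with `overrides` layered over `defaults`, section by section."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that overriding one key keeps its siblings.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Override a single retrieval setting; everything else keeps its default.
config = merge_config(DEFAULTS, {"retrieval": {"top_k": 10}})
print(config["retrieval"])  # {'top_k': 10, 'rerank': True}
```

The merged result is a nested dictionary with the same `ocr` and `retrieval` sections as the YAML file.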

Python API

from deeplightrag.core import DeepLightRAG

# Initialize; use_mlx=True overrides automatic MLX selection for OCR
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
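
When indexing many files, a small wrapper can derive a `document_id` for each one. This sketch relies only on the `index_document(path, document_id=...)` call shown above. The `index_directory` helper and its sequential ID scheme are hypothetical, and it accepts any object exposing that method, so it can be tried without the library installed.

```python
from pathlib import Path


def index_directory(rag, folder, pattern="*.pdf"):
    """Index every file in `folder` matching `pattern`.

    Returns a mapping of the generated document IDs to file paths.
    Hypothetical helper, not part of the deeplightrag package.
    """
    indexed = {}
    # Sort so that IDs are assigned in a stable, reproducible order.
    for number, path in enumerate(sorted(Path(folder).glob(pattern)), start=1):
        document_id = f"doc_{number:03d}"
        rag.index_document(str(path), document_id=document_id)
        indexed[document_id] = str(path)
    return indexed
```

Returning the ID-to-path mapping keeps a record of which file each identifier refers to.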

License

MIT License
