DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)

DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any Large Language Model (LLM). It features a dual-layer graph architecture (Visual-Spatial and Entity-Relationship) to provide context-aware and visually-grounded retrieval.

Features

  • Dual-Layer Graph: Combines visual layout awareness with semantic entity relationships.
  • Visual-Grounded Retrieval: Retrieves not just text, but visual regions and their spatial context.
  • Robust OCR: Uses DeepSeek-OCR for text extraction, with EasyOCR as a fallback.
  • Advanced NER: Uses GLiNER for zero-shot entity recognition.
  • Flexible LLM Support: Compatible with OpenAI, Google Gemini, Anthropic, and local LLMs via MLX/Ollama.
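To make the dual-layer idea concrete, here is a toy sketch of how a visual-spatial layer and an entity-relationship layer could be linked. It is not DeepLightRAG's internal representation: the class names, fields, and methods (`Region`, `DualLayerGraph`, `regions_for`, and so on) are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Region:
    """A visual region on a page (hypothetical structure, for illustration)."""
    region_id: str
    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    text: str


@dataclass
class DualLayerGraph:
    """Toy two-layer graph; not the library's actual data model."""
    regions: Dict[str, Region] = field(default_factory=dict)
    # Visual-spatial layer: region_id -> ids of adjacent regions
    spatial_edges: Dict[str, Set[str]] = field(default_factory=dict)
    # Entity-relationship layer: (head, relation, tail) triples
    relations: List[Tuple[str, str, str]] = field(default_factory=list)
    # Cross-layer links: entity name -> ids of regions that mention it
    mentions: Dict[str, Set[str]] = field(default_factory=dict)

    def add_region(self, region: Region) -> None:
        self.regions[region.region_id] = region
        self.spatial_edges.setdefault(region.region_id, set())

    def link_regions(self, a: str, b: str) -> None:
        # Spatial adjacency is stored symmetrically.
        self.spatial_edges[a].add(b)
        self.spatial_edges[b].add(a)

    def add_relation(self, head: str, relation: str, tail: str, region_id: str) -> None:
        # Record the triple and remember which region it was found in.
        self.relations.append((head, relation, tail))
        for entity in (head, tail):
            self.mentions.setdefault(entity, set()).add(region_id)

    def regions_for(self, entity: str, with_neighbours: bool = True) -> List[Region]:
        # Start from regions that mention the entity, then optionally
        # widen to their spatial neighbours for visual context.
        ids = set(self.mentions.get(entity, ()))
        if with_neighbours:
            for rid in list(ids):
                ids |= self.spatial_edges[rid]
        return [self.regions[rid] for rid in sorted(ids)]


graph = DualLayerGraph()
graph.add_region(Region("r1", 1, (0, 0, 100, 20), "Table 1: GLiNER accuracy"))
graph.add_region(Region("r2", 1, (0, 20, 100, 80), "(table body)"))
graph.add_region(Region("r3", 2, (0, 0, 100, 40), "Unrelated paragraph"))
graph.link_regions("r1", "r2")
graph.add_relation("GLiNER", "evaluated_in", "Table 1", region_id="r1")

hits = [r.region_id for r in graph.regions_for("GLiNER")]
print(hits)  # ['r1', 'r2']
```

In this sketch, a query for an entity returns the caption that mentions it together with the adjacent table body, which is the kind of result "visually-grounded retrieval" suggests.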

Installation

Standard Installation

pip install deeplightrag

With GPU Support (NVIDIA CUDA)

Adds support for 4-bit/8-bit quantization for optimized performance:

pip install "deeplightrag[gpu]"

For macOS (Apple Silicon)

Optimized for Apple Silicon (M1/M2/M3) chips:

pip install "deeplightrag[macos]"

Usage

Command Line Interface

Index a document:

# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml

Retrieve information:

deeplightrag retrieve "What is the main topic?" --config config.yaml
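If you want to drive the CLI from a script, a thin wrapper around `subprocess` is one option. The sketch below uses only the subcommands and the `--config` flag shown above; the helper names `build_command` and `run` are hypothetical, and the format of the CLI's output is not assumed.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(action: str, argument: str, config: Optional[str] = None) -> List[str]:
    """Build an argv list for the `deeplightrag` CLI (hypothetical helper).

    `action` is "index" (argument = document path) or
    "retrieve" (argument = question), as in the examples above.
    """
    if action not in ("index", "retrieve"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["deeplightrag", action, argument]
    if config:
        cmd += ["--config", config]
    return cmd


def run(cmd: List[str]) -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout


index_cmd = build_command("index", "document.pdf", config="config.yaml")
print(shlex.join(index_cmd))  # deeplightrag index document.pdf --config config.yaml

# To actually execute it (requires deeplightrag to be installed):
# print(run(index_cmd))
```

Passing the arguments as a list avoids shell quoting problems when a question contains spaces or punctuation.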

Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Set explicitly to override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
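The same settings can be written as a nested Python dict, which is the shape the Python API's `config` argument takes. The sketch below layers a few overrides on top of the values from the example file. `EXAMPLE_CONFIG` and `merge_config` are hypothetical names: the dict only mirrors the YAML above and is not necessarily the library's built-in defaults, and whether DeepLightRAG merges partial configs itself is not stated here.

```python
import copy
from typing import Any, Dict

# Mirrors the example config.yaml above (not necessarily the library defaults).
EXAMPLE_CONFIG: Dict[str, Any] = {
    "ocr": {
        "model_name": "deepseek-ai/deepseek-ocr",
        "use_mlx": False,
        "resolution": "base",
    },
    "retrieval": {"top_k": 5, "rerank": True},
}


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `base` with `overrides` applied section by section."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that overriding one key keeps its siblings.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = merge_config(EXAMPLE_CONFIG, {"ocr": {"use_mlx": True}, "retrieval": {"top_k": 10}})
print(config["ocr"])        # {'model_name': 'deepseek-ai/deepseek-ocr', 'use_mlx': True, 'resolution': 'base'}
print(config["retrieval"])  # {'top_k': 10, 'rerank': True}
```

The merged dict could then be passed as `config=` when constructing the pipeline, and `EXAMPLE_CONFIG` itself is left unmodified.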

Python API

from deeplightrag.core import DeepLightRAG

# Initialize, explicitly enabling MLX for OCR (overrides automatic selection)
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
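To index several documents, you can loop over a folder and call `index_document` once per file. The sketch below relies only on the `index_document(path, document_id=...)` call shown above; `index_folder`, the `doc_NNN` id scheme, and the `DryRunRAG` stand-in are hypothetical and used here so the example runs without loading any models.

```python
import tempfile
from pathlib import Path
from typing import Dict


def index_folder(rag, folder: str, pattern: str = "*.pdf") -> Dict[str, str]:
    """Index every matching file in `folder`; return {document_id: file name}.

    `rag` is any object exposing index_document(path, document_id=...),
    such as a DeepLightRAG instance.
    """
    indexed = {}
    # Sort for a stable, reproducible id assignment.
    for n, path in enumerate(sorted(Path(folder).glob(pattern)), start=1):
        document_id = f"doc_{n:03d}"  # hypothetical id scheme
        rag.index_document(str(path), document_id=document_id)
        indexed[document_id] = path.name
    return indexed


class DryRunRAG:
    """Stand-in that records calls instead of running OCR (demonstration only)."""

    def __init__(self):
        self.calls = []

    def index_document(self, path, document_id):
        self.calls.append((Path(path).name, document_id))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    stub = DryRunRAG()
    indexed = index_folder(stub, tmp)

print(indexed)  # {'doc_001': 'a.pdf', 'doc_002': 'b.pdf'}
```

With the real library, you would pass a `DeepLightRAG` instance in place of `DryRunRAG()` and keep the returned mapping to trace retrieval results back to their source files.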

License

MIT License
