DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)

DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any Large Language Model (LLM). Its dual-layer graph architecture (a Visual-Spatial layer and an Entity-Relationship layer) supports context-aware, visually grounded retrieval.

Features

  • Dual-Layer Graph: Combines visual layout awareness with semantic entity relationships.
  • Visually Grounded Retrieval: Retrieves not only text but also visual regions and their spatial context.
  • Robust OCR: Uses DeepSeek-OCR with an EasyOCR fallback for reliable text extraction.
  • Advanced NER: Uses GLiNER for zero-shot entity recognition.
  • Flexible LLM Support: Compatible with OpenAI, Google Gemini, Anthropic, and local LLMs via MLX/Ollama.

Installation

Standard Installation

pip install deeplightrag

With GPU Support (NVIDIA CUDA)

For optimized performance using quantization (4-bit/8-bit):

pip install "deeplightrag[gpu]"

For macOS (Apple Silicon)

For optimized performance on M1/M2/M3 chips:

pip install "deeplightrag[macos]"

Usage

Command Line Interface

Index a document:

# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml

Retrieve information:

deeplightrag retrieve "What is the main topic?" --config config.yaml
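
When indexing or querying from a script, the same commands can be assembled in Python and handed to `subprocess`. The following is a minimal sketch, assuming the `deeplightrag` executable is on `PATH`. The helper names (`build_index_command`, `build_retrieve_command`, `run`) are hypothetical and not part of the package; only the `index` and `retrieve` subcommands and the `--config` flag shown above are used.

```python
import subprocess
from typing import List, Optional


def build_index_command(document: str, config: Optional[str] = None) -> List[str]:
    """Assemble the `deeplightrag index` command as an argument list."""
    cmd = ["deeplightrag", "index", document]
    if config is not None:
        cmd += ["--config", config]
    return cmd


def build_retrieve_command(query: str, config: Optional[str] = None) -> List[str]:
    """Assemble the `deeplightrag retrieve` command as an argument list."""
    cmd = ["deeplightrag", "retrieve", query]
    if config is not None:
        cmd += ["--config", config]
    return cmd


def run(cmd: List[str]) -> int:
    """Run the command and return its exit code (needs deeplightrag installed)."""
    return subprocess.run(cmd, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or question marks, for example `run(build_retrieve_command("What is the main topic?", config="config.yaml"))`.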

Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Set explicitly to override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
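
The Python API accepts a `config` dictionary, so the same settings can also be written as a nested dict. The sketch below assumes the dict keys mirror the YAML keys one-to-one, which this README only demonstrates for `ocr.use_mlx`. `EXAMPLE_CONFIG` and `merge_config` are hypothetical names used for illustration, not part of the package, and the values are the ones from the example above rather than confirmed library defaults.

```python
import copy
from typing import Any, Dict

# Mirrors the YAML example; assumed to map one-to-one onto the config dict.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "ocr": {
        "model_name": "deepseek-ai/deepseek-ocr",
        "use_mlx": False,
        "resolution": "base",
    },
    "retrieval": {"top_k": 5, "rerank": True},
}


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overrides` merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Flip a single setting while keeping the rest of the example intact.
config = merge_config(EXAMPLE_CONFIG, {"ocr": {"use_mlx": True}})
```

Merging recursively keeps sibling keys such as `model_name` and `resolution` when only one value in a section is overridden, and the base dict is left unmodified.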

Python API

from deeplightrag.core import DeepLightRAG

# Initialize; setting use_mlx explicitly overrides automatic MLX selection
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
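
To index several files, the `index_document` call above can be wrapped in a small loop. This is a sketch under stated assumptions: `index_documents` is a hypothetical helper, not part of the package, and deriving `document_id` from the file stem is an illustrative choice. It relies only on the `index_document(path, document_id=...)` call shown above.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def index_documents(rag: Any, paths: Iterable[str]) -> Dict[str, str]:
    """Index each path and return a mapping of document_id -> path.

    `rag` is any object exposing index_document(path, document_id=...).
    """
    indexed: Dict[str, str] = {}
    for path in paths:
        # Illustrative ID scheme: file name without its extension.
        doc_id = Path(path).stem
        rag.index_document(str(path), document_id=doc_id)
        indexed[doc_id] = str(path)
    return indexed
```

With a `DeepLightRAG` instance this would be called as `index_documents(rag, ["research_paper.pdf", "appendix.pdf"])`. Files that share a stem would collide under this ID scheme, so a different scheme is needed if names repeat across folders.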

License

MIT License
