DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)

DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any Large Language Model (LLM). Its dual-layer graph architecture (a Visual-Spatial layer and an Entity-Relationship layer) supports context-aware, visually grounded retrieval.

Features

  • Dual-Layer Graph: Combines visual layout awareness with semantic entity relationships.
  • Visually Grounded Retrieval: Retrieves not only text but also visual regions and their spatial context.
  • Robust OCR: Uses DeepSeek-OCR, with EasyOCR as a fallback, for reliable text extraction.
  • Advanced NER: Uses GLiNER for zero-shot entity recognition.
  • Flexible LLM Support: Compatible with OpenAI, Google Gemini, Anthropic, and local LLMs via MLX/Ollama.

Installation

Standard Installation

pip install deeplightrag

With GPU Support (NVIDIA CUDA)

For optimized performance using quantization (4-bit/8-bit):

pip install "deeplightrag[gpu]"

For macOS (Apple Silicon)

For optimized performance on M1/M2/M3 chips:

pip install "deeplightrag[macos]"

Usage

Command Line Interface

Index a document:

# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml

Retrieve information:

deeplightrag retrieve "What is the main topic?" --config config.yaml
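
When several documents need indexing, the commands above can be driven from a script. The following is a minimal sketch that only assembles the `index`, `retrieve`, and `--config` invocations shown in this section; the helper names (`build_index_command`, `build_retrieve_command`, `index_all`) and the `dry_run` switch are hypothetical and not part of DeepLightRAG.

```python
import subprocess
from typing import Iterable, List, Optional


def build_index_command(document: str, config: Optional[str] = None) -> List[str]:
    """Return the argument list for `deeplightrag index <document>`."""
    command = ["deeplightrag", "index", document]
    if config is not None:
        # --config is the only option shown in the usage section above.
        command += ["--config", config]
    return command


def build_retrieve_command(question: str, config: Optional[str] = None) -> List[str]:
    """Return the argument list for `deeplightrag retrieve <question>`."""
    command = ["deeplightrag", "retrieve", question]
    if config is not None:
        command += ["--config", config]
    return command


def index_all(
    documents: Iterable[str],
    config: Optional[str] = None,
    dry_run: bool = True,
) -> List[List[str]]:
    """Build one index command per document; run them unless dry_run is set."""
    commands = [build_index_command(doc, config) for doc in documents]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when a question or file name contains spaces. With `dry_run=True`, the default here, nothing is executed and the commands are only returned for inspection.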

Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Set explicitly to override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
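
The Python API example in this README passes its configuration as a nested dict with the same keys as this YAML file (for example `{"ocr": {"use_mlx": True}}`). The sketch below shows one way to apply a small override on top of a full set of values without mutating the original. `EXAMPLE_CONFIG` simply mirrors the YAML above and is not a statement of the library's actual defaults, and `merge_config` is a hypothetical helper; whether DeepLightRAG merges partial configs internally is not stated here.

```python
import copy
from typing import Any, Dict

# Mirrors the YAML example above; these are illustrative values only.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "ocr": {
        "model_name": "deepseek-ai/deepseek-ocr",
        "use_mlx": False,
        "resolution": "base",
    },
    "retrieval": {"top_k": 5, "rerank": True},
}


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overrides` applied recursively on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Flip a single setting while keeping every other value from the example.
config = merge_config(EXAMPLE_CONFIG, {"ocr": {"use_mlx": True}})
```

The resulting `config` could then be passed as the `config=` argument shown in the Python API section.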

Python API

from deeplightrag.core import DeepLightRAG

# Initialize; this config explicitly enables MLX for OCR, overriding automatic selection
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
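
To index several files with the API above, a small loop around `index_document` is enough. The sketch below is duck-typed: it works with any object exposing `index_document(path, document_id=...)` as shown in the example, so it can be tried with a stub before a real `DeepLightRAG` instance is available. The helper name `index_documents` and the choice to derive each `document_id` from the file name are assumptions for illustration, not library behavior.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def index_documents(rag: Any, paths: Iterable[str]) -> Dict[str, str]:
    """Index each path via rag.index_document and return {document_id: path}.

    The document_id is the file name without its extension (an assumption
    made for this sketch); duplicate IDs raise ValueError before any
    indexing call is made.
    """
    planned: Dict[str, str] = {}
    for path in paths:
        document_id = Path(path).stem
        if document_id in planned:
            raise ValueError(f"duplicate document_id: {document_id!r}")
        planned[document_id] = str(path)

    for document_id, path in planned.items():
        # Same call shape as the single-document example above.
        rag.index_document(path, document_id=document_id)
    return planned
```

Usage would look like `index_documents(rag, ["research_paper.pdf", "appendix.pdf"])`, after which `rag.retrieve(...)` can be called as in the example above.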

License

MIT License

Download files

Source Distribution

deeplightrag-1.0.20.tar.gz (98.0 kB)

Uploaded Source

Built Distribution

deeplightrag-1.0.20-py3-none-any.whl (105.6 kB)

Uploaded Python 3

File details

Details for the file deeplightrag-1.0.20.tar.gz.

File metadata

  • Download URL: deeplightrag-1.0.20.tar.gz
  • Upload date:
  • Size: 98.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for deeplightrag-1.0.20.tar.gz:

  • SHA256: 0fe947a99e047982826999a9493b7ea399ec3d52cd909884f6cf8f7f57a07bf5
  • MD5: 7e6a98a760053521c02d8e276049009c
  • BLAKE2b-256: 3e51fb0949b736a923d9e79177739fb8fd7295c3d9165c42afe318f365ef4df4

File details

Details for the file deeplightrag-1.0.20-py3-none-any.whl.

File metadata

  • Download URL: deeplightrag-1.0.20-py3-none-any.whl
  • Upload date:
  • Size: 105.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for deeplightrag-1.0.20-py3-none-any.whl:

  • SHA256: 45f9b5be9481c826ec351fdd22cf7b79a5e42502896434fce631b44ab77c54c0
  • MD5: 6043683489e79dd76eb53407b26f52cd
  • BLAKE2b-256: 0c87bef5e485071822e38c9bc2d043bec2571edb560f12cdc136cffa0df931f0
