Docling → Chroma → Ollama: Simple RAG pipeline

📄 DocRAG LLM

Docling → Chroma → Ollama: Simple Local RAG Pipeline


🔎 What is DocRAG LLM?

DocRAG LLM is a local-first Retrieval-Augmented Generation (RAG) pipeline.
It connects Docling for parsing → ChromaDB for vector storage → Ollama for local LLM inference.

No cloud lock-in. No API costs. Just local docs → local vectors → local LLMs.
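To make the parse → store → answer flow concrete, here is a minimal, dependency-free sketch of the retrieve-then-generate idea. It is not DocRAG's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for ChromaDB, and `build_prompt` only shows what might be handed to a local LLM. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the local LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# In-memory stand-in for a vector store holding parsed document chunks.
chunks = [
    "Docling parses PDF and DOCX files into text.",
    "ChromaDB stores embeddings on local disk.",
    "Ollama runs language models on your own machine.",
]
question = "Where are embeddings stored?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

In the real pipeline each stand-in is replaced by the corresponding local component, but the order of steps (embed, rank, assemble context, generate) is the general RAG pattern.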


✨ Features

  • 🔍 Parse documents with Docling (PDF, DOCX, PPTX, HTML, etc.)
  • 📑 Chunking for retrieval, tunable via chunk_chars / chunk_overlap
  • 🧠 Store embeddings in ChromaDB
  • 🤖 Answer questions using Ollama (default: llama3.2:1b)
  • 🛡️ Privacy-first → all local execution
  • 🖥️ Use as a CLI tool or Python library

📦 Installation

pip install docrag-llm

Requirements:

  • Python 3.10+
  • Ollama installed & running
  • Local models:
    ollama pull llama3.2:1b
    ollama pull nomic-embed-text
    
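Before ingesting, it can help to confirm that Ollama is reachable and that both models are pulled. The sketch below is not part of DocRAG; it assumes Ollama is listening on its default address (http://localhost:11434) and uses Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names are hypothetical.

```python
import json
import urllib.request

# Model tags named in the requirements above.
REQUIRED = ["llama3.2:1b", "nomic-embed-text"]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """List model tags known to a running Ollama server via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(installed: list[str], required: list[str]) -> list[str]:
    """Return required tags that are absent; a bare name matches its ':latest' tag."""
    have = set(installed)
    have |= {name.removesuffix(":latest") for name in installed}
    return [r for r in required if r not in have]
```

Calling `missing_models(installed_models(), REQUIRED)` returns an empty list when everything is in place. If the server is not running, `installed_models()` raises `urllib.error.URLError`.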

🚀 Quickstart

CLI – Ingest and Ask

# Ingest a document (default collection: demo)
python -m docrag.cli ingest https://arxiv.org/pdf/2508.20755

# Ask a question (default LLM: llama3.2:1b)
python -m docrag.cli ask "Summarize in 1 paragraph with 5 bullet points"

Python API

from docrag import DocragSettings, RAGPipeline

cfg = DocragSettings(
    persist_path="./.chroma",
    collection="demo",
    embed_model="nomic-embed-text",
    llm_model="llama3.2:1b",
)

pipeline = RAGPipeline(cfg)

# Ingest
n_chunks = pipeline.ingest("https://arxiv.org/pdf/2508.20755")
print(f"Ingested {n_chunks} chunks")

# Ask
answer = pipeline.ask("Give a concise bullet summary of the paper's contributions.")
print(answer)
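For more than one document, a small wrapper around `ingest` may be convenient. This is a sketch, not part of the package: `ingest_many` and `SupportsIngest` are hypothetical names, and the only assumption about the pipeline is the one visible in the example above, namely that `ingest(source)` returns a chunk count. Which exceptions `ingest` raises is not documented here, so the sketch catches broadly and records failures as -1.

```python
from typing import Iterable, Protocol


class SupportsIngest(Protocol):
    """Anything with an ingest(source) -> int method, e.g. the pipeline above."""

    def ingest(self, source: str) -> int: ...


def ingest_many(pipeline: SupportsIngest, sources: Iterable[str]) -> dict[str, int]:
    """Ingest each distinct source once and return its chunk count (-1 on failure)."""
    counts: dict[str, int] = {}
    for src in dict.fromkeys(sources):  # drop duplicates, keep first-seen order
        try:
            counts[src] = pipeline.ingest(src)
        except Exception as exc:  # assumption: error types are unknown, so catch broadly
            print(f"Failed to ingest {src}: {exc}")
            counts[src] = -1
    return counts
```

With the pipeline from the example, usage would look like `ingest_many(pipeline, ["https://arxiv.org/pdf/2508.20755", "./notes.docx"])`, where the second path is only a placeholder.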

⚙️ Configuration

Both the CLI and the Python API let you customize:

  • persist_path → where ChromaDB stores vectors
  • collection → logical collection name
  • embed_model → embedding model (Ollama tag)
  • llm_model → generation model (Ollama tag; default: llama3.2:1b)
  • chunk_chars / chunk_overlap → chunking granularity
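To illustrate what chunk_chars and chunk_overlap typically control, here is a minimal sketch of fixed-size character chunking with overlap. It is an assumption about the general technique, not DocRAG's actual chunker; the function name and the default values (1000 and 200) are hypothetical.

```python
def chunk_text(text: str, chunk_chars: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_chars that share chunk_overlap characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be in [0, chunk_chars)")

    step = chunk_chars - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_chars=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap keeps more shared context between neighbouring chunks at the cost of storing more vectors; smaller chunks give finer-grained retrieval but less context per hit.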

📊 Roadmap

  • model-check CLI → list installed Ollama models
  • Support multiple backends (Weaviate, Milvus)
  • Streaming output for long answers
  • Expanded test suite (large document regression cases)
  • Example notebooks & Hugging Face demo

🤝 Contributing

PRs and issues welcome!

pip install "docrag-llm[dev]"
ruff check .
pytest

📜 License

MIT License © 2025 Armando Medina


