Docling → Chroma → Ollama: Simple RAG pipeline

📄 DocRAG LLM

Docling → Chroma → Ollama: Simple Local RAG Pipeline


🔎 What is DocRAG LLM?

DocRAG LLM is a local-first Retrieval-Augmented Generation (RAG) pipeline.
It connects Docling for parsing → ChromaDB for vector storage → Ollama for local LLM inference.

No cloud lock-in. No API costs. Just local docs → local vectors → local LLMs.
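
To make the retrieve-then-generate flow concrete, here is a toy sketch in pure Python. It does not use Docling, ChromaDB, or Ollama: `embed` is a hypothetical word-count stand-in for a real embedding model, and `retrieve` and `build_prompt` are illustrative names, not part of the docrag API. It only shows the general shape of a RAG step, assuming similarity is measured with cosine similarity.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "Docling converts PDF and DOCX files into structured text.",
    "ChromaDB stores one embedding vector per chunk.",
    "Ollama runs the language model on the local machine.",
]
question = "Where are embedding vectors stored?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

In the real pipeline the word counts would be replaced by vectors from an embedding model, the sorted list by a vector store query, and the prompt would be sent to a local LLM.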


✨ Features

  • 🔍 Parse documents with Docling (PDF, DOCX, PPTX, HTML, etc.)
  • 📑 Intelligent chunking for retrieval
  • 🧠 Store embeddings in ChromaDB
  • 🤖 Answer questions using Ollama (default: llama3.2:1b)
  • 🛡️ Privacy-first → all local execution
  • 🖥️ Use as a CLI tool or Python library

📦 Installation

pip install docrag-llm

Requirements:

  • Python 3.10+
  • Ollama installed & running
  • Local models:
    ollama pull llama3.2:1b
    ollama pull nomic-embed-text
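
A quick way to confirm the requirements above is to ask the Ollama server which models it has. The sketch below assumes Ollama is listening on its default address `http://localhost:11434` and uses its `/api/tags` endpoint, which lists locally available models. The helper names `missing_models` and `fetch_tags` are hypothetical and not part of docrag.

```python
import json
import urllib.request

# Model tags taken from the requirements list above.
REQUIRED = ("llama3.2:1b", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED) -> list[str]:
    """Return the required model names absent from an /api/tags payload."""
    installed = {m["name"] for m in tags_payload.get("models", [])}

    def present(name: str) -> bool:
        # A model pulled without an explicit tag is listed as "<name>:latest".
        return name in installed or f"{name}:latest" in installed

    return [name for name in required if not present(name)]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags())` returns an empty list when both models are present. If the server is not running, `fetch_tags` raises an `OSError` subclass, which is itself a useful signal that Ollama needs to be started.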
    

🚀 Quickstart

CLI – Ingest and Ask

# Ingest a document (default collection: demo)
python -m docrag.cli ingest https://arxiv.org/pdf/2508.20755

# Ask a question (default LLM: llama3.2:1b)
python -m docrag.cli ask "Summarize in 1 paragraph with 5 bullet points"
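
For scripting several ingests followed by one question, the two CLI commands above can be wrapped with `subprocess`. This is a minimal sketch that assumes repeated `ingest` calls add to the same default collection and that `ask` prints its answer to standard output; neither behavior is confirmed here. The function names are hypothetical.

```python
import subprocess
import sys


def ingest_command(source: str) -> list[str]:
    """Build the argv for `python -m docrag.cli ingest <source>`."""
    return [sys.executable, "-m", "docrag.cli", "ingest", source]


def ask_command(question: str) -> list[str]:
    """Build the argv for `python -m docrag.cli ask <question>`."""
    return [sys.executable, "-m", "docrag.cli", "ask", question]


def ingest_then_ask(sources: list[str], question: str) -> str:
    """Ingest each source in order, then ask one question.

    Raises subprocess.CalledProcessError if any command exits non-zero.
    """
    for source in sources:
        subprocess.run(ingest_command(source), check=True)
    result = subprocess.run(
        ask_command(question), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Passing arguments as a list avoids shell quoting problems when a question contains quotes or spaces.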

Python API

from docrag import DocragSettings, RAGPipeline

cfg = DocragSettings(
    persist_path="./.chroma",
    collection="demo",
    embed_model="nomic-embed-text",
    llm_model="llama3.2:1b",
)

pipeline = RAGPipeline(cfg)

# Ingest
n_chunks = pipeline.ingest("https://arxiv.org/pdf/2508.20755")
print(f"Ingested {n_chunks} chunks")

# Ask
answer = pipeline.ask("Give a concise bullet summary of the paper's contributions.")
print(answer)

⚙️ Configuration

Both the CLI and the Python API let you customize:

  • persist_path → where ChromaDB stores vectors
  • collection → logical collection name
  • embed_model → embedding model (Ollama tag)
  • llm_model → generation model (Ollama tag; default: llama3.2:1b)
  • chunk_chars / chunk_overlap → chunking granularity
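
To illustrate how `chunk_chars` and `chunk_overlap` interact, here is a minimal sketch of fixed-size character chunking with overlap. It is an assumption about what these two settings mean, not the package's actual chunker, which may split on document structure instead. The function `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_chars: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_chars that share chunk_overlap characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_chars")

    step = chunk_chars - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_chars=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Larger chunks give the LLM more context per retrieved hit but make matches less precise. Overlap reduces the chance that a sentence split across a boundary is lost to retrieval, at the cost of storing more vectors.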

📊 Roadmap

  • model-check CLI → list installed Ollama models
  • Support multiple backends (Weaviate, Milvus)
  • Streaming output for long answers
  • Expanded test suite (large document regression cases)
  • Example notebooks & Hugging Face demo

🤝 Contributing

PRs and issues welcome!

pip install "docrag-llm[dev]"
ruff check .
pytest

📜 License

MIT License © 2025 Armando Medina

