Docling → Chroma → Ollama: a simple RAG pipeline

These details have not been verified by PyPI

📄 DocRAG LLM

DocRAG LLM is a simple, enterprise-ready Retrieval-Augmented Generation (RAG) pipeline.
It connects Docling (document parsing) → ChromaDB (vector store) → Ollama (local LLMs) into a single workflow, with both a CLI and a Python API.

✨ Features

🔍 Parse documents with Docling (PDF, DOCX, PPTX, HTML, etc.).
📑 Chunk text for retrieval, with configurable chunk size and overlap.
🧠 Store embeddings in ChromaDB.
🤖 Answer questions using Ollama (default: llama3.2).
🛡️ Designed for local execution (no cloud lock-in).
🖥️ Works both as a CLI tool and a Python library.

📦 Installation

pip install docrag-llm

Requirements

Python 3.10+
Ollama installed and running

Models pulled locally:

ollama pull llama3.2
ollama pull nomic-embed-text
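Before ingesting, it can help to confirm that both models are actually available. The following is a minimal sketch, not part of docrag: the helper names (`normalize`, `installed_models`, `missing_models`) are hypothetical, and it assumes Ollama is listening on its default local address and that its `GET /api/tags` endpoint returns a JSON object with a `models` list.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama's default local address; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434"


def normalize(tag: str) -> str:
    """Ollama treats a model name without an explicit tag as ':latest'."""
    return tag if ":" in tag else f"{tag}:latest"


def installed_models(base_url: str = OLLAMA_URL) -> set[str]:
    """Return the model names reported by Ollama's GET /api/tags endpoint."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return {model["name"] for model in payload.get("models", [])}


def missing_models(installed: set[str], required: list[str]) -> list[str]:
    """List the required models that are absent from the installed set."""
    have = {normalize(name) for name in installed}
    return [name for name in required if normalize(name) not in have]
```

With Ollama running, `missing_models(installed_models(), ["llama3.2", "nomic-embed-text"])` should return an empty list once both `ollama pull` commands have completed.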

🚀 Quickstart (CLI)

Ingest a document into Chroma

python -m docrag.cli ingest https://arxiv.org/pdf/2408.09869 \
  --persist ./.chroma \
  --collection demo

--persist → directory for Chroma DB (default: ./.chroma)
--collection → logical collection name (default: demo)
--embed → embedding model (default: nomic-embed-text)

Ask a question (default LLM = `llama3.2`)

python -m docrag.cli ask "Give a concise bullet summary of the paper's main contributions." \
  --persist ./.chroma \
  --collection demo

--llm → LLM model to use (default: llama3.2)
--top-k → number of chunks retrieved (default: 5)
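To make --top-k concrete, here is a minimal brute-force sketch of top-k retrieval over embedded chunks. It is an illustration only: `cosine` and `top_k` are hypothetical helpers, the two-dimensional vectors are toy data, and cosine similarity is just one common choice of metric. In the actual pipeline ChromaDB performs the nearest-neighbor search, and the distance metric docrag configures is not stated here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs standing in for stored chunks.
chunks = [
    ("about parsing", [1.0, 0.0]),
    ("about tables", [0.7, 0.7]),
    ("about licensing", [0.0, 1.0]),
]

print(top_k([1.0, 0.1], chunks, k=2))  # ['about parsing', 'about tables']
```

A larger k gives the LLM more context at the cost of a longer prompt; a smaller k keeps answers focused but may miss relevant passages.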

Export parsed text

python -m docrag.cli export https://arxiv.org/pdf/2408.09869 \
  --out-dir ./exports

Saves the parsed text (Markdown/JSON) to the directory given by --out-dir.

CLI Help

python -m docrag.cli --help
python -m docrag.cli ingest --help
python -m docrag.cli ask --help
python -m docrag.cli export --help

🐍 Usage as a Python Library

from docrag import DocragSettings, RAGPipeline

# Configure pipeline
cfg = DocragSettings(
    persist_path="./.chroma",
    collection="demo",
    embed_model="nomic-embed-text",
    llm_model="llama3.2",
)

pipeline = RAGPipeline(cfg)

# Ingest a document
n_chunks = pipeline.ingest("https://arxiv.org/pdf/2408.09869")
print(f"Ingested {n_chunks} chunks")

# Ask a question
answer = pipeline.ask("Give a concise bullet summary of the paper's main contributions.")
print(answer)

⚙️ Configuration

Both the Python API and the CLI expose the following settings:

persist_path → path to Chroma DB
collection → collection name
embed_model → embedding model (Ollama tag)
llm_model → LLM model (default: llama3.2)
chunk_chars / chunk_overlap → chunking granularity
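To illustrate how chunk_chars and chunk_overlap interact, here is a minimal sliding-window sketch. It is an assumption about what these settings mean, not docrag's actual chunker: `chunk_text` is a hypothetical helper, and the default values shown are placeholders rather than the package's defaults.

```python
def chunk_text(text: str, chunk_chars: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_chars that share chunk_overlap characters.

    Hypothetical helper: the defaults are illustrative, not docrag's.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be in [0, chunk_chars)")

    step = chunk_chars - chunk_overlap  # how far each new window advances
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_chars=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap reduces the chance that a sentence relevant to a question is split across two chunks, but it increases the number of chunks stored and embedded.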

📊 Roadmap

Add a model-check CLI command to list installed Ollama models.
Support additional vector-store backends (Weaviate, Milvus).
Add streaming output for long answers.
Expand the test suite with large-document regression cases.

🤝 Contributing

PRs and issues welcome! Please run lint and tests before submitting:

ruff check .
pytest

📜 License

MIT License (as declared in the package classifiers).

Project details

Development Status: 3 - Alpha
Intended Audience: Developers
License: OSI Approved :: MIT License
Operating System: OS Independent
Typing: Typed

Release history

- 0.1.27 (Sep 2, 2025)
- 0.1.26 (Sep 2, 2025)
- 0.1.25 (Aug 31, 2025)
- 0.1.24 (Aug 31, 2025)
- 0.1.23 (Aug 31, 2025)
- 0.1.22 (Aug 31, 2025)
- 0.1.21 (Aug 31, 2025)
- 0.1.20 (Aug 31, 2025)
- 0.1.19 (Aug 31, 2025)
- 0.1.18 (Aug 31, 2025)
- 0.1.17 (Aug 30, 2025)
- 0.1.16 (Aug 30, 2025)
- 0.1.15 (Aug 30, 2025)
- 0.1.14 (Aug 30, 2025)
- 0.1.13 (Aug 30, 2025)
- 0.1.12 (Aug 30, 2025)
- 0.1.11 (Aug 30, 2025), this version
- 0.1.10 (Aug 30, 2025)
- 0.1.9 (Aug 30, 2025)
- 0.1.5 (Aug 30, 2025)
- 0.1.3 (Aug 30, 2025)
- 0.1.2 (Aug 30, 2025)
- 0.1.1 (Aug 30, 2025)

Download files

Source Distribution

docrag_llm-0.1.11.tar.gz (314.7 kB view details)

Uploaded Aug 30, 2025 Source

Built Distribution

docrag_llm-0.1.11-py3-none-any.whl (7.6 kB view details)

Uploaded Aug 30, 2025 Python 3

File details

Details for the file docrag_llm-0.1.11.tar.gz.

File metadata

Download URL: docrag_llm-0.1.11.tar.gz
Upload date: Aug 30, 2025
Size: 314.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for docrag_llm-0.1.11.tar.gz
Algorithm	Hash digest
SHA256	`d064aeb0c31f484c17607676422ee42ba29491ea2393464ed6bb258091e46113`
MD5	`02edf958844d50ce4d3456664f7bf0c0`
BLAKE2b-256	`5f244a528aee0932b26e4482621aca510c4f7c35323aba0f68530da16db1adad`

File details

Details for the file docrag_llm-0.1.11-py3-none-any.whl.

File metadata

Download URL: docrag_llm-0.1.11-py3-none-any.whl
Upload date: Aug 30, 2025
Size: 7.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for docrag_llm-0.1.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e7023692257626e17f11005917ae28840e85f8c7ac5209ea35a82af3081391b3`
MD5	`a5e9d9d060e73976e4ba73b213be881b`
BLAKE2b-256	`fc9ff8835691d7b5f2547389700b4fbc0040f34fa2cccad2755a156b070f5184`
