Local PDF Q&A with RAG using Ollama & LangChain

zenpdf

A peaceful CLI tool for chatting with your documents using local AI models. All processing happens on your machine: no cloud APIs are called, and no data leaves your device.

Features

  • 🔒 Local-First - No cloud APIs, all processing on your machine
  • 📄 Multi-Format - PDF, DOCX, and TXT support
  • ⚡ Streaming - Real-time AI responses as they're generated
  • 💾 Persistent History - Chat history saved between sessions
  • 📚 Source Attribution - Know which documents informed each answer
  • 🎨 Beautiful CLI - Rich terminal interface with colors and tables
  • ⚙️ Fully Configurable - Customize models, chunk sizes, and more

Installation

pip install zenpdf

Quick Start

# 1. Make sure Ollama is running (start the server if it is not already up)
ollama serve

# 2. Pull required models (if not already done)
ollama pull llama3.2:1b
ollama pull nomic-embed-text

# 3. Index a document
zenpdf index ./my-document.pdf

# 4. Ask questions!
zenpdf ask "What is this document about?"

# 5. Interactive mode
zenpdf interactive
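
Step 1 assumes the Ollama server is already listening before you index or ask anything. A minimal sketch of a pre-flight check, assuming Ollama's default local address http://localhost:11434. The helper name ollama_is_running is hypothetical and is not part of zenpdf:

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url.

    Assumption: Ollama is listening on its default address; adjust base_url
    if your installation uses a different host or port.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

If the check reports False, start the server and try again before running zenpdf index.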

Commands

Document Operations

Command               Description
zenpdf index <path>   Index PDF/DOCX/TXT file or directory
zenpdf list           List indexed documents
zenpdf remove <id>    Remove document by ID
zenpdf clear          Clear all documents

Query Operations

Command                       Description
zenpdf ask "question?"        Ask a question
zenpdf ask "question?" -k 6   Ask a question, retrieving a custom number of chunks (k)
zenpdf interactive            Interactive Q&A mode
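
The -k option controls how many chunks are retrieved as context for an answer. The idea can be sketched as ranking stored chunk vectors by similarity to the query vector and keeping the best k. This is an illustration only: cosine similarity and the toy two-dimensional vectors are assumptions, and the metric used by zenpdf's vector store may differ. The names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy data: (chunk text, embedding). Real embeddings have hundreds of dimensions.
toy_chunks = [
    ("intro", [1.0, 0.0]),
    ("methods", [0.7, 0.7]),
    ("appendix", [0.0, 1.0]),
]
best = top_k([1.0, 0.1], toy_chunks, k=2)
```

A larger k gives the model more context at the cost of a longer prompt; a smaller k keeps the prompt short but may miss relevant passages.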

Reference & History

Command                Description
zenpdf refs            Show sources for last answer
zenpdf history         Show chat history
zenpdf export <file>   Export history (MD/JSON)

Configuration

Command                          Description
zenpdf config show               Show all config
zenpdf config model <name>       Set LLM model
zenpdf config embed <name>       Set embedding model
zenpdf config chunk-size <n>     Set chunk size
zenpdf config overlap <n>        Set chunk overlap
zenpdf config k <n>              Set default retrieved chunks
zenpdf config db-path <path>     Set database path
zenpdf config history-size <n>   Set max history size

Utilities

Command            Description
zenpdf status      Show database status
zenpdf reset       Reset vector store
zenpdf --version   Show version
zenpdf --help      Show help

Configuration

Default settings (view with zenpdf config show):

Setting         Default            Description
model           llama3.2:1b        Ollama LLM model
embed_model     nomic-embed-text   Embedding model
chunk_size      1000               Text chunk size
chunk_overlap   100                Chunk overlap
k               4                  Retrieved chunks
db_path         ./zenpdf_db        Vector database path
history_size    50                 Max chat history
temperature     0.7                LLM temperature

Configuration is saved to .zenpdf_config.json in your working directory.
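
If you want to read these settings from your own scripts, a minimal sketch follows. It assumes .zenpdf_config.json is a flat JSON object whose keys match the Setting column above; the actual file layout is not documented here, and load_config is a hypothetical helper, not part of zenpdf.

```python
import json
from pathlib import Path

# Defaults copied from the settings table above.
DEFAULTS = {
    "model": "llama3.2:1b",
    "embed_model": "nomic-embed-text",
    "chunk_size": 1000,
    "chunk_overlap": 100,
    "k": 4,
    "db_path": "./zenpdf_db",
    "history_size": 50,
    "temperature": 0.7,
}


def load_config(path=".zenpdf_config.json") -> dict:
    """Return the defaults overlaid with any values found in the config file.

    Assumption: the file holds a flat JSON object keyed by setting name.
    """
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config


if __name__ == "__main__":
    for name, value in load_config().items():
        print(f"{name}: {value}")
```

Overlaying the file on top of the defaults means a config that sets only one value, such as k, still yields a complete set of settings.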

Requirements

  • Python 3.11+
  • Ollama installed and running
  • Ollama models:
    • llama3.2:1b (or your preferred model)
    • nomic-embed-text (for embeddings)

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Document   │────▶│   Splitter   │────▶│   ChromaDB  │
│   Loader    │     │   (Chunks)   │     │   (Vectors) │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                      ┌──────────────┐          │
                      │   Ollama     │◀─────────┘
                      │  (Embeddings)│
                      └──────┬───────┘
                             │
                      ┌──────▼───────┐
                      │  RAG Chain   │
                      └──────┬───────┘
                             │
                      ┌──────▼───────┐
                      │      LLM     │
                      │   (Ollama)   │
                      └──────────────┘
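
To make the Splitter stage concrete, here is a minimal sketch of how chunk_size and chunk_overlap interact, using a fixed-width character window. This is an assumption for illustration: the splitter zenpdf actually uses may measure size differently or prefer paragraph and sentence boundaries. The function name split_text is hypothetical.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reached the end of the text.
    return chunks


# With the defaults, a 2500-character text yields windows starting at 0, 900, 1800.
example_chunks = split_text("a" * 2500)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which helps retrieval at the cost of storing some text twice.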

License

MIT License - see LICENSE

Contributing

Contributions welcome! Please open an issue or submit a PR.


Made with ❤️ for local AI

