A local-first RAG pipeline CLI tool

🚀 DocPilot CLI

DocPilot is a fast, local-first CLI for document ingestion and interactive question answering. Built on Ollama and Chroma, it ingests websites, PDFs, and CSVs directly from your terminal and lets you chat with your documents while keeping all of your data on your own machine.

It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.


✨ Features

  • 🔒 100% Local: No data ever leaves your machine. Powered by Ollama.
  • ⚡ Interactive Setup Wizard: Get up and running instantly with smart model auto-detection.
  • 🌐 Universal Ingestion: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
  • 🚀 Concurrent Processing: Lightning-fast crawling and multi-threaded document chunking.
  • 🎛️ Performance Profiles: Switch between the fast, balanced, and quality profiles on the fly to trade answer depth against latency.
  • 🎨 Beautiful Terminal UI: Rich markdown rendering, ASCII art, and intuitive progress bars.

📦 Installation

DocPilot is available on PyPI! You can install it globally using pip, uv, or pipx.

# Recommended: install as an isolated tool with uv...
uv tool install docpilot-cli

# ...or with pipx
pipx install docpilot-cli

# Or via standard pip
pip install docpilot-cli

# Optional: Add PDF parsing support
pip install "docpilot-cli[pdf]"

Prerequisites

  1. Python 3.12+
  2. Ollama: Installed and running in the background.
  3. Pull your preferred models:
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m
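
Before ingesting anything, it can help to confirm that Ollama is reachable and that both models are present. The sketch below is not part of DocPilot. It assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, and the helper names (missing_models, fetch_tags, main) are illustrative only.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODELS = ["qwen2.5:latest", "mxbai-embed-large:335m"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {model.get("name") for model in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def main() -> None:
    """Print a short status report; call this manually once Ollama is started."""
    try:
        missing = missing_models(fetch_tags(), REQUIRED_MODELS)
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        print(f"Ollama does not appear to be running: {exc}")
        return
    print("All models present." if not missing else f"Missing models: {missing}")
```

Calling main() prints either a confirmation or the names to pass to ollama pull. Keeping the comparison in missing_models, separate from the network call, makes it easy to check without a running server.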

🛠️ Quick Start

The very first time you run a DocPilot command, it will launch the Interactive Setup Wizard to help you configure your chat and embedding models.

1. Ingest Knowledge

Point DocPilot to any documentation site, sitemap, PDF, or CSV:

# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16

# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"

# Ingest a CSV
docpilot ingest "./data/faq.csv"
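
To give a feel for what a page cap and a worker count mean during a crawl, here is a minimal sketch of a bounded, concurrent breadth-first crawl. It is not DocPilot's crawler. The mapping of --max-pages and --workers onto the max_pages and workers parameters is an assumption, and fetch_links is a hypothetical callable that downloads one page and returns the URLs it links to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
    workers: int = 16,
) -> list[str]:
    """Breadth-first crawl that fetches each frontier level in parallel.

    fetch_links is injected (hypothetical helper) so the sketch does not
    depend on any particular HTTP or HTML library.
    """
    visited: list[str] = []
    seen = {start_url}
    frontier = [start_url]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            # Never fetch more pages than the remaining budget allows.
            batch = frontier[: max_pages - len(visited)]
            results = list(pool.map(fetch_links, batch))  # keeps batch order
            visited.extend(batch)

            # Build the next level from links that have not been seen yet.
            frontier = []
            for links in results:
                for url in links:
                    if url not in seen:
                        seen.add(url)
                        frontier.append(url)
    return visited
```

Threads suit this workload because page fetches spend most of their time waiting on the network. The seen set keeps a page from being queued twice even when several pages link to it.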

2. Ask Questions

Query your newly created local knowledge base:

docpilot ask "How do I create a virtual environment?"

🧰 CLI Command Reference

Manage your configuration, models, and local database with ease.

docpilot setup

Re-run the interactive setup wizard at any time to change your default models.

docpilot clear

Wipe your local Chroma vector database to start fresh. Prompts for confirmation before deleting anything.

docpilot speed [profile]

Adjust the retrieval and generation settings for your desired use case.

  • fast: Lower latency, shorter context limits.
  • balanced: Default trade-off.
  • quality: Larger context, more comprehensive answers, slower inference.
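
One way to picture the profiles is as named bundles of retrieval and generation settings. The sketch below is purely illustrative: the field names (top_k, context_tokens) and every numeric value are assumptions, not DocPilot's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    top_k: int           # chunks retrieved per question (hypothetical field)
    context_tokens: int  # context window requested from the model (hypothetical field)


# Illustrative values only; the real settings are not documented here.
PROFILES = {
    "fast": SpeedProfile(top_k=2, context_tokens=2048),
    "balanced": SpeedProfile(top_k=4, context_tokens=4096),
    "quality": SpeedProfile(top_k=8, context_tokens=8192),
}


def get_profile(name: str = "balanced") -> SpeedProfile:
    """Look up a profile by name, rejecting unknown names with a clear error."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(
            f"unknown profile {name!r}; choose from {sorted(PROFILES)}"
        ) from None
```

Retrieving fewer chunks into a smaller context generally shortens the prompt and so lowers latency, while a larger context gives the model more material at the cost of slower inference.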

docpilot model

Manually manage your Ollama models without the interactive setup wizard.

docpilot model list
docpilot model set <chat-model>
docpilot model setembed <embedding-model>

docpilot render

Parse and beautifully render any Markdown file or text string directly in your terminal.

docpilot show

Display your current project version and configuration in beautiful ASCII art.


🏗️ Architecture Under the Hood

DocPilot employs an optimized RAG (Retrieval-Augmented Generation) pipeline:

  1. Ingestion: Native Python extractors (BeautifulSoup4, csv, pypdf) parse the raw data.
  2. Chunking: Multi-threaded chunkers slice the documents into semantically coherent pieces.
  3. Embedding: langchain-ollama creates local vector embeddings via Ollama.
  4. Storage: chromadb persistently stores vectors on disk at ~/.docpilot/chroma_langchain_db.
  5. Retrieval: User queries are embedded, matched via similarity search, and fed into a system prompt.
  6. Generation: The configured Ollama chat model generates the final response, which is streamed to the terminal and rendered with rich.
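
The steps above can be sketched end to end in plain Python. This is a toy under stated assumptions, not DocPilot's code: a bag-of-words counter stands in for the Ollama embedding model, an in-memory list stands in for Chroma, word-window chunking stands in for the real chunker, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (stand-in for step 2)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i : i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], workers: int = 4) -> list[tuple[str, Counter]]:
    """Chunk documents on a thread pool, then embed every chunk (steps 2 to 4)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_lists = list(pool.map(chunk_text, documents))
    return [(chunk, embed(chunk)) for chunks in chunk_lists for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (step 5)."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be handed to the chat model (input to step 6)."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

A typical flow is index = build_index(docs), then build_prompt(question, retrieve(index, question)). In a real pipeline the embedding and generation calls go to Ollama and the index lives in a persistent Chroma collection; the overlap between chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.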

🤝 Contributing

Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.

To set up a local development environment:

git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest

📄 License

This project is licensed under the MIT License.
