
A local-first RAG pipeline CLI tool


🚀 DocPilot CLI


DocPilot is a fast, local-first CLI for document ingestion and interactive question-answering. Powered by Ollama and Chroma, it lets you ingest websites, PDFs, and CSVs directly from your terminal and chat with your documents, while all indexed data and model inference stay on your own machine.

It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.


✨ Features

  • 🔒 100% Local: No data ever leaves your machine. Powered by Ollama.
  • ⚡ Interactive Setup Wizard: Get up and running instantly with smart model auto-detection.
  • 🌐 Universal Ingestion: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
  • 🚀 Concurrent Processing: Lightning-fast crawling and multi-threaded document chunking.
  • 🎛️ Performance Profiles: Switch between the fast, balanced, and quality profiles on the fly to trade answer depth for inference speed.
  • 🎨 Beautiful Terminal UI: Rich markdown rendering, ASCII art, and intuitive progress bars.

📦 Installation

DocPilot is available on PyPI! You can install it globally using pip, uv, or pipx.

# Recommended: install as an isolated tool with uv...
uv tool install docpilot-cli

# ...or with pipx
pipx install docpilot-cli

# Or via standard pip
pip install docpilot-cli

# Optional: Add PDF parsing support
pip install "docpilot-cli[pdf]"

Prerequisites

  1. Python 3.12+
  2. Ollama: Installed and running in the background.
  3. Pull your preferred models:
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m

🛠️ Quick Start

The very first time you run a DocPilot command, it will launch the Interactive Setup Wizard to help you configure your chat and embedding models.

1. Ingest Knowledge

Point DocPilot to any documentation site, sitemap, PDF, or CSV:

# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16

# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"

# Ingest a CSV
docpilot ingest "./data/faq.csv"
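Because a single ingest command accepts URLs, sitemaps, PDFs, and CSVs, some step has to decide which extractor handles a given argument. The sketch below shows one plausible way to do that. The function name `classify_source`, its return labels, and the rule that a URL ending in `.xml` is a sitemap are all assumptions made for illustration, not DocPilot's actual dispatch logic.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Guess which ingestion path a source string should take.

    Hypothetical helper: labels and rules are illustrative assumptions.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Assumption: a URL whose path ends in .xml is a sitemap;
        # any other http(s) URL is crawled as a regular website.
        return "sitemap" if parsed.path.lower().endswith(".xml") else "website"

    # Anything else is treated as a local file and routed by extension.
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".csv":
        return "csv"
    raise ValueError(f"Unsupported source: {source!r}")


if __name__ == "__main__":
    for src in ("https://docs.python.org/3/", "./docs/engineering_handbook.pdf", "./data/faq.csv"):
        print(src, "->", classify_source(src))
```

Routing on the URL scheme first keeps remote and local inputs separate, so a local file path is never mistaken for a site to crawl.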

Tip: If you installed with plain pip and get a "command not found" error because pip's scripts directory is not on your PATH, you can run DocPilot as a module instead by prefixing commands with python -m docpilot (e.g., python -m docpilot ingest ...).

2. Ask Questions

Query your newly created local knowledge base:

docpilot ask "How do I create a virtual environment?"

🧰 CLI Command Reference

Manage your configuration, models, and local database with ease.

docpilot setup

Re-run the interactive setup wizard at any time to change your default models.

docpilot project

Switch to a different project or check the active project. Each project has its own isolated vector database.

# Show the active project
docpilot project

# Switch to a different project
docpilot project <project-name>

docpilot clear

Wipe your local Chroma vector database to start fresh. DocPilot asks for confirmation before deleting anything.

docpilot speed [profile]

Adjust the retrieval and generation settings for your desired use case.

  • fast: Lower latency, shorter context limits.
  • balanced: Default trade-off.
  • quality: Larger context, more comprehensive answers, slower inference.
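One way to picture the profiles is as named bundles of retrieval and generation settings. The sketch below is purely illustrative: the `SpeedProfile` fields (`top_k`, `num_ctx`) and every numeric value are assumptions chosen to mirror the descriptions above, not DocPilot's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    top_k: int    # how many chunks to retrieve per question (assumed knob)
    num_ctx: int  # context window size requested from the model (assumed knob)


# Illustrative values only; the real profiles may tune different settings.
PROFILES = {
    "fast": SpeedProfile(top_k=3, num_ctx=2048),
    "balanced": SpeedProfile(top_k=5, num_ctx=4096),
    "quality": SpeedProfile(top_k=8, num_ctx=8192),
}


def resolve_profile(name: str = "balanced") -> SpeedProfile:
    """Look up a profile by name, defaulting to the balanced trade-off."""
    try:
        return PROFILES[name]
    except KeyError:
        choices = ", ".join(sorted(PROFILES))
        raise ValueError(f"Unknown profile {name!r}; choose from: {choices}") from None


if __name__ == "__main__":
    print(resolve_profile("quality"))
```

Retrieving more chunks and requesting a larger context window gives the model more material to draw on, but it also means longer prompts and slower inference.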

docpilot model

Manually manage your Ollama models without the interactive setup wizard.

docpilot model list
docpilot model set <chat-model>
docpilot model setembed <embedding-model>

docpilot render

Parse and beautifully render any markdown file or text string directly in your terminal.

docpilot show

Display your current project version and configuration in beautiful ASCII art.


🏗️ Architecture Under the Hood

DocPilot runs a six-stage RAG (Retrieval-Augmented Generation) pipeline:

  1. Ingestion: Native Python extractors (BeautifulSoup4, csv, pypdf) parse the raw data.
  2. Chunking: Multi-threaded chunkers slice the documents into semantically coherent pieces.
  3. Embedding: langchain-ollama creates local vector embeddings via Ollama.
  4. Storage: chromadb persistently stores vectors on disk at ~/.docpilot/chroma_langchain_db.
  5. Retrieval: User queries are embedded, matched via similarity search, and fed into a system prompt.
  6. Generation: The configured Ollama chat model generates the final response, which is streamed to the terminal and rendered with rich.
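The chunk, embed, retrieve, and prompt steps above can be sketched with the standard library alone. This is a toy illustration, not DocPilot's code: the bag-of-words `embed` function stands in for a real embedding model served by Ollama, the in-memory list stands in for Chroma, and every function name and parameter is hypothetical.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a simple stand-in chunker)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap  # requires size > overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], workers: int = 4) -> list[tuple[str, Counter]]:
    """Chunk documents in a thread pool, then embed every chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_lists = list(pool.map(chunk_text, documents))
    return [(chunk, embed(chunk)) for chunks in chunk_lists for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context: list[str], question: str) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Use python -m venv to create a virtual environment.",
        "CSV files store tabular data in rows.",
    ]
    index = build_index(docs)
    question = "How do I create a virtual environment?"
    print(build_prompt(retrieve(index, question, k=1), question))
```

In a real pipeline the prompt would then be sent to the chat model. Overlapping windows are a common way to avoid cutting a relevant sentence in half at a chunk boundary, though the window sizes here are arbitrary.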

🤝 Contributing

Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.

To set up a local development environment:

git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest

📄 License

This project is licensed under the MIT License.

