A local-first RAG pipeline CLI tool

🚀 DocPilot CLI

DocPilot is a local-first CLI for document ingestion and interactive question answering. Powered by Ollama and Chroma, it lets you ingest websites, PDFs, and CSVs directly from your terminal and chat with your documents, while all of your data stays on your own machine.

It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.


✨ Features

  • 🔒 100% Local: No data ever leaves your machine. Powered by Ollama.
  • ⚡ Interactive Setup Wizard: Get up and running instantly with smart model auto-detection.
  • 🌐 Universal Ingestion: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
  • 🚀 Concurrent Processing: Lightning-fast crawling and multi-threaded document chunking.
  • 🎛️ Performance Profiles: Switch between the fast, balanced, and quality profiles on the fly to trade answer depth against latency.
  • 🎨 Beautiful Terminal UI: Rich markdown rendering, ASCII art, and intuitive progress bars.

📦 Installation

DocPilot is available on PyPI! You can install it globally using pip, uv, or pipx.

# Recommended: install as an isolated tool using uv or pipx
uv tool install docpilot-cli
# or: pipx install docpilot-cli

# Or via standard pip
pip install docpilot-cli

# Optional: Add PDF parsing support
pip install "docpilot-cli[pdf]"

Prerequisites

  1. Python 3.12+
  2. Ollama: Installed and running in the background.
  3. Pull your preferred models:
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m
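
Before ingesting, it can help to confirm that Ollama is reachable and that both models are present. The following is a minimal sketch and not part of DocPilot: it assumes Ollama's default local address `http://localhost:11434` and its `GET /api/tags` endpoint, which lists locally available models. The helper names `missing_models` and `fetch_local_models` are hypothetical.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)
REQUIRED = ["qwen2.5:latest", "mxbai-embed-large:335m"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names absent from an /api/tags response."""
    available = {m["name"] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in available]


def fetch_local_models(base_url: str = OLLAMA_URL) -> dict:
    """Query the running Ollama server for its locally pulled models."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline example using a hand-written payload shaped like the API response.
sample = {"models": [{"name": "qwen2.5:latest"}]}
print(missing_models(sample, REQUIRED))  # ['mxbai-embed-large:335m']
```

Against a live server, `missing_models(fetch_local_models(), REQUIRED)` should return an empty list once both `ollama pull` commands have finished, assuming the server reports the same tags you pulled.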

🛠️ Quick Start

The very first time you run a DocPilot command, it will launch the Interactive Setup Wizard to help you configure your chat and embedding models.

1. Ingest Knowledge

Point DocPilot to any documentation site, sitemap, PDF, or CSV:

# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16

# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"

# Ingest a CSV
docpilot ingest "./data/faq.csv"

Tip: If you installed via standard pip and get a "command not found" error because the install location is not on your PATH, you can run DocPilot by prefixing commands with python -m docpilot (e.g., python -m docpilot ingest ...).
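
To give a feel for what the `--max-pages` and `--workers` options in the crawl command control, here is a minimal sketch of a bounded, concurrent fetch using Python's standard `concurrent.futures`. It is not DocPilot's actual crawler: `crawl_concurrently`, `fake_fetch`, and the parameter names are hypothetical, and the fetch function is injected so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl_concurrently(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_pages: int = 100,
    workers: int = 16,
) -> dict[str, str]:
    """Fetch up to max_pages unique URLs in parallel and return {url: body}."""
    # Deduplicate while preserving order, then cap at max_pages.
    unique = list(dict.fromkeys(urls))[:max_pages]
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        for url, body in zip(unique, pool.map(fetch, unique)):
            results[url] = body
    return results


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request (hypothetical).
    return f"<html>{url}</html>"


pages = crawl_concurrently(
    ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
    fetch=fake_fetch,
    max_pages=2,
    workers=4,
)
print(sorted(pages))
```

Threads suit this kind of work because fetching pages is I/O-bound: while one request waits on the network, the others keep going.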

2. Ask Questions

Query your newly created local knowledge base:

docpilot ask "How do I create a virtual environment?"

🧰 CLI Command Reference

Manage your configuration, models, and local database with ease.

docpilot setup

Re-run the interactive setup wizard at any time to change your default models.

docpilot project

Check the active project or switch to a different one. Run the command without arguments to show the active project, or pass a project name to switch to it. Each project has its own isolated vector database.

docpilot project
docpilot project <project-name>
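
One common way to isolate per-project data is to give each project its own directory. The sketch below illustrates that idea only. The `projects/<name>/` layout, the `project_db_path` helper, and the name validation are all assumptions for illustration, not DocPilot's documented on-disk structure.

```python
import re
from pathlib import Path

# Base directory borrowed from the storage path mentioned in this README.
BASE_DIR = Path.home() / ".docpilot"


def project_db_path(project: str, base_dir: Path = BASE_DIR) -> Path:
    """Map a project name to its own database directory (hypothetical layout)."""
    # Allow only simple names so a project cannot escape base_dir (e.g. "../x").
    if not re.fullmatch(r"[A-Za-z0-9_-]+", project):
        raise ValueError(f"invalid project name: {project!r}")
    return base_dir / "projects" / project / "chroma_langchain_db"


print(project_db_path("handbook", Path("/tmp/docpilot")))
```

Because each name resolves to a distinct directory, clearing or re-ingesting one project leaves the others untouched.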

docpilot clear

Wipe your local Chroma vector database to start fresh. Prompts for confirmation before deleting anything.

docpilot speed [profile]

Adjust the retrieval and generation settings for your desired use case.

  • fast: Lower latency, shorter context limits.
  • balanced: Default trade-off.
  • quality: Larger context, more comprehensive answers, slower inference.
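
A profile like this is typically just a named bundle of retrieval and generation settings. The following sketch shows one way to model that. The field names (`top_k`, `num_ctx`, `max_tokens`) and every number are placeholders chosen for illustration, not DocPilot's real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    top_k: int       # number of chunks retrieved per question
    num_ctx: int     # context window requested from the model
    max_tokens: int  # cap on generated tokens


# All numbers below are placeholders, not DocPilot's real settings.
PROFILES = {
    "fast": Profile(top_k=2, num_ctx=2048, max_tokens=256),
    "balanced": Profile(top_k=4, num_ctx=4096, max_tokens=512),
    "quality": Profile(top_k=8, num_ctx=8192, max_tokens=1024),
}


def resolve_profile(name: str = "balanced") -> Profile:
    """Look up a profile by name, rejecting unknown names with a clear error."""
    try:
        return PROFILES[name]
    except KeyError:
        valid = ", ".join(PROFILES)
        raise ValueError(f"unknown profile {name!r}; choose from: {valid}") from None


print(resolve_profile("fast"))
```

Retrieving more chunks and allowing a larger context generally improves coverage at the cost of slower inference, which is the trade-off the three profiles describe.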

docpilot model

Manually manage your Ollama models without the interactive setup wizard.

docpilot model list
docpilot model set <chat-model>
docpilot model setembed <embedding-model>

docpilot render

Parse and render any Markdown file or text string directly in your terminal.

docpilot show

Display your current project version and configuration in beautiful ASCII art.


🏗️ Architecture Under the Hood

DocPilot employs an optimized RAG (Retrieval-Augmented Generation) pipeline:

  1. Ingestion: Native Python extractors (BeautifulSoup4, csv, pypdf) parse the raw data.
  2. Chunking: Multi-threaded chunkers slice the documents into semantically coherent pieces.
  3. Embedding: langchain-ollama creates local vector embeddings via Ollama.
  4. Storage: chromadb persistently stores vectors on disk at ~/.docpilot/chroma_langchain_db.
  5. Retrieval: User queries are embedded, matched via similarity search, and fed into a system prompt.
  6. Generation: The designated Ollama chat model generates the final response, which is streamed to the terminal and rendered with rich.
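
The chunk, embed, retrieve, and prompt steps above can be sketched end to end with the standard library alone. This is a toy illustration under stated assumptions, not DocPilot's code: chunking uses fixed-size word windows rather than semantic splitting, `embed` is a bag-of-words stand-in for a real Ollama embedding model, an in-memory list replaces Chroma, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real pipeline would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[str], workers: int = 4) -> list[tuple[str, Counter]]:
    """Chunk documents in a thread pool, then embed every chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_lists = list(pool.map(chunk_words, docs))
    return [(c, embed(c)) for chunks in chunk_lists for c in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context: list[str], question: str) -> str:
    """Assemble retrieved chunks and the question into a single prompt."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "A virtual environment is created with the venv module. Run python -m venv followed by a directory name.",
    "The csv module reads and writes tabular data in comma separated format.",
]
index = build_index(docs)
top = retrieve(index, "How do I create a virtual environment?", k=1)
print(build_prompt(top, "How do I create a virtual environment?"))
```

In a real pipeline the prompt would then be sent to the chat model. The overlap between neighbouring chunks is there so that a sentence cut at a window boundary still appears whole in at least one chunk.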

🤝 Contributing

Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.

To set up a local development environment:

git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest

📄 License

This project is licensed under the MIT License.


