
A local-first RAG pipeline CLI tool


🚀 DocPilot CLI


DocPilot is a local-first CLI for document ingestion and interactive question-answering. Powered by Ollama and Chroma, it lets you ingest websites, PDFs, and CSVs directly from your terminal and chat with your documents, while all of your data stays on your own machine.

It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.


✨ Features

  • 🔒 100% Local: No data ever leaves your machine. Powered by Ollama.
  • ⚡ Interactive Setup Wizard: Get up and running instantly with smart model auto-detection.
  • 🌐 Universal Ingestion: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
  • 🚀 Concurrent Processing: Lightning-fast crawling and multi-threaded document chunking.
  • 🎛️ Performance Profiles: Switch between the fast, balanced, and quality inference profiles on the fly.
  • 🎨 Beautiful Terminal UI: Rich markdown rendering, ASCII art, and intuitive progress bars.

📦 Installation

DocPilot is available on PyPI! You can install it globally using pip, uv, or pipx.

# Recommended: Install using uv or pipx
uv tool install docpilot-cli
pipx install docpilot-cli

# Or via standard pip
pip install docpilot-cli

# Optional: Add PDF parsing support
pip install "docpilot-cli[pdf]"

Prerequisites

  1. Python 3.12+
  2. Ollama: Installed and running in the background.
  3. Pull your preferred models:
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m
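If you want to confirm the models are actually available before running DocPilot, a small check against Ollama's local HTTP API can help. The sketch below is not part of DocPilot. It assumes Ollama is listening on its default address (http://localhost:11434) and that its /api/tags endpoint returns a JSON object with a "models" list, which is how Ollama reports locally installed models. The helper names are illustrative.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption); adjust if your server listens elsewhere.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model tags that are absent from an /api/tags payload."""
    installed = {model.get("name") for model in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling missing_models(fetch_tags(), ["qwen2.5:latest", "mxbai-embed-large:335m"]) should return an empty list once both pulls have finished. If Ollama is not running, fetch_tags raises a connection error.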

🛠️ Quick Start

The very first time you run a DocPilot command, it will launch the Interactive Setup Wizard to help you configure your chat and embedding models.

1. Ingest Knowledge

Point DocPilot to any documentation site, sitemap, PDF, or CSV:

# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16

# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"

# Ingest a CSV
docpilot ingest "./data/faq.csv"

Tip: If you installed with standard pip and get a "command not found" error because your scripts directory is not on your PATH, you can run DocPilot by prefixing commands with python -m docpilot (e.g., python -m docpilot ingest ...).
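To give a feel for what the --max-pages and --workers flags in the crawl command control, here is a minimal sketch of a bounded, concurrent breadth-first crawl. It is an illustration of the general technique, not DocPilot's actual crawler. The crawl function and the fetch_links callback are hypothetical names, and fetch_links stands in for whatever downloads a page and extracts its links.

```python
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
    workers: int = 16,
) -> list[str]:
    """Breadth-first crawl that visits at most max_pages URLs.

    fetch_links is a caller-supplied (hypothetical) function that downloads
    one page and returns the URLs it links to.
    """
    seen = {start_url}
    visited: list[str] = []
    frontier = [start_url]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            # Never fetch more pages than the remaining budget allows.
            batch = frontier[: max_pages - len(visited)]
            # Fetch the whole batch in parallel; map() preserves input order.
            results = list(pool.map(fetch_links, batch))
            visited.extend(batch)

            # Collect unseen links as the next level of the crawl.
            next_frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier

    return visited
```

Threads suit this kind of work because fetching pages is I/O-bound. The page budget is enforced per batch, so the crawl never visits more than max_pages URLs.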

2. Ask Questions

Query your newly created local knowledge base:

docpilot ask "How do I create a virtual environment?"

🧰 CLI Command Reference

Manage your configuration, models, and local database with ease.

docpilot setup

Re-run the interactive setup wizard at any time to change your default models.

docpilot project

Check the active project (no argument) or switch to a different one (pass a project name). Each project has its own isolated vector database.

docpilot project
docpilot project <project-name>
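One common way to give each project an isolated vector database is to derive a separate on-disk directory from the project name. The sketch below shows that idea only. The directory layout, the project_db_path helper, and the "projects" folder are assumptions for illustration and may not match how DocPilot actually stores project data.

```python
from pathlib import Path

# Hypothetical layout: one sub-directory per project under a shared root.
DOCPILOT_HOME = Path.home() / ".docpilot"


def project_db_path(project: str, root: Path = DOCPILOT_HOME) -> Path:
    """Return a per-project directory for a persistent vector store."""
    # Reject empty names and names that could escape the root directory.
    if not project or project in {".", ".."} or Path(project).name != project:
        raise ValueError(f"invalid project name: {project!r}")
    return root / "projects" / project / "chroma_langchain_db"
```

Because every project resolves to a different directory, clearing or re-ingesting one project cannot touch another project's vectors.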

docpilot clear

Wipe your local Chroma vector database to start fresh. You will be asked to confirm before anything is deleted.

docpilot speed [profile]

Adjust the retrieval and generation settings for your desired use case.

  • fast: Lower latency, shorter context limits.
  • balanced: Default trade-off.
  • quality: Larger context, more comprehensive answers, slower inference.
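A profile like this is usually just a named bundle of retrieval and generation settings. The sketch below shows one way to model that. Every number in it is made up for illustration, and the SpeedProfile fields (top_k, num_ctx) are assumptions about what such a profile might tune, not DocPilot's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    top_k: int    # number of chunks retrieved per question (hypothetical)
    num_ctx: int  # context window size requested from the model (hypothetical)


# Illustrative values only; the real profiles may differ.
PROFILES = {
    "fast": SpeedProfile(top_k=3, num_ctx=2048),
    "balanced": SpeedProfile(top_k=5, num_ctx=4096),
    "quality": SpeedProfile(top_k=8, num_ctx=8192),
}


def get_profile(name: str = "balanced") -> SpeedProfile:
    """Look up a profile by name, defaulting to the balanced trade-off."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(
            f"unknown profile {name!r}; choose from {sorted(PROFILES)}"
        ) from None
```

Retrieving fewer chunks and using a smaller context window generally lowers latency at the cost of less supporting material in the prompt, which is the trade-off the three profiles describe.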

docpilot model

Manually manage your Ollama models without the interactive setup wizard.

docpilot model list
docpilot model set <chat-model>
docpilot model setembed <embedding-model>

docpilot render

Parse and beautifully render any markdown file or text string directly in your terminal.

docpilot show

Display your current project version and configuration in beautiful ASCII art.


🏗️ Architecture Under the Hood

DocPilot employs an optimized RAG (Retrieval-Augmented Generation) pipeline:

  1. Ingestion: Native Python extractors (BeautifulSoup4, csv, pypdf) parse the raw data.
  2. Chunking: Multi-threaded chunkers slice the documents into semantically coherent pieces.
  3. Embedding: langchain-ollama creates local vector embeddings via Ollama.
  4. Storage: chromadb persistently stores vectors on disk at ~/.docpilot/chroma_langchain_db.
  5. Retrieval: User queries are embedded, matched via similarity search, and fed into a system prompt.
  6. Generation: The designated Ollama chat model generates the final response, which is streamed to the terminal and rendered with rich.
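The chunking, embedding, and retrieval steps above can be sketched end to end with the standard library alone. This is a simplified stand-in, not DocPilot's code. It uses fixed-size overlapping character windows instead of semantic chunking, a toy bag-of-words counter instead of an Ollama embedding model, and an in-memory list instead of Chroma. All function names are illustrative.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    stop = max(len(text) - overlap, 1)
    windows = (text[i : i + size] for i in range(0, stop, step))
    return [w for w in windows if w.strip()]


def chunk_documents(docs: list[str], workers: int = 4) -> list[str]:
    """Chunk several documents in parallel, keeping document order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_doc = pool.map(chunk_text, docs)
        return [chunk for chunks in per_doc for chunk in chunks]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks ahead of the question in a single prompt."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

In a real pipeline, embed would be replaced by a call to the embedding model, retrieve by a vector store similarity search, and the string from build_prompt would be sent to the chat model.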

🤝 Contributing

Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.

To set up a local development environment:

git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest

📄 License

This project is licensed under the MIT License.
