A local-first RAG pipeline CLI tool
🚀 DocPilot CLI
DocPilot is a lightning-fast, local-first CLI for document ingestion and interactive question-answering. Powered by Ollama and Chroma, it allows you to ingest websites, PDFs, and CSVs directly from your terminal and chat with your documents—keeping 100% of your data safely on your own machine.
It’s built for practical developer workflows: crawl sites concurrently, prepare chunks with multi-threading, and iterate rapidly without ever paying for a cloud API.
✨ Features
- 🔒 100% Local: No data ever leaves your machine. Powered by Ollama.
- ⚡ Interactive Setup Wizard: Get up and running instantly with smart model auto-detection.
- 🌐 Universal Ingestion: Seamlessly ingest Website URLs, XML Sitemaps, PDFs, and CSVs.
- 🚀 Concurrent Processing: Lightning-fast crawling and multi-threaded document chunking.
- 🎛️ Performance Profiles: Switch between fast, balanced, and quality inference speeds on the fly.
- 🎨 Beautiful Terminal UI: Rich markdown rendering, ASCII art, and intuitive progress bars.
📦 Installation
DocPilot is available on PyPI! You can install it globally using pip, uv, or pipx.
# Recommended: Install using uv or pipx
uv tool install docpilot-cli
# or: pipx install docpilot-cli
# Or via standard pip
pip install docpilot-cli
# Optional: Add PDF parsing support
pip install "docpilot-cli[pdf]"
Prerequisites
- Python 3.12+
- Ollama: Installed and running in the background.
- Pull your preferred models:
ollama pull qwen2.5:latest
ollama pull mxbai-embed-large:335m
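Before the first run, it can help to confirm that Ollama is reachable and that both models are pulled. The sketch below is not part of DocPilot. It assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, which lists locally available models. The helper names (installed_models, missing_models, fetch_tags) are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
# The two models pulled in the prerequisites step.
REQUIRED_MODELS = ("qwen2.5:latest", "mxbai-embed-large:335m")


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names from an /api/tags response body."""
    return {m["name"] for m in tags_payload.get("models", [])}


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that are not installed, in the order given."""
    have = installed_models(tags_payload)
    return [name for name in required if name not in have]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the local Ollama server; raises URLError if it is not running."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires a running Ollama server):
#   for name in missing_models(fetch_tags()):
#       print(f"Missing model, run: ollama pull {name}")
```

Keeping the parsing separate from the network call means the check can be exercised against a saved response without a running server.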
🛠️ Quick Start
The very first time you run a DocPilot command, it will launch the Interactive Setup Wizard to help you configure your chat and embedding models.
1. Ingest Knowledge
Point DocPilot to any documentation site, sitemap, PDF, or CSV:
# Crawl a website
docpilot ingest "https://docs.python.org/3/" --max-pages 100 --workers 16
# Ingest a local PDF
docpilot ingest "./docs/engineering_handbook.pdf"
# Ingest a CSV
docpilot ingest "./data/faq.csv"
[!TIP] If you installed via standard pip and get a "command not found" error because your binary path isn't configured, you can always run docpilot by prefixing commands with python -m docpilot (e.g., python -m docpilot ingest ...).
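The --max-pages and --workers flags suggest a bounded, concurrent crawl. Here is a minimal sketch of that pattern, assuming a breadth-first traversal backed by a thread pool. It is not DocPilot's actual crawler: crawl, fetch_links, and the toy SITE link graph are hypothetical stand-ins, and a real fetcher would download and parse each page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
    workers: int = 16,
) -> list[str]:
    """Breadth-first crawl that fetches each level of links in parallel.

    Returns the visited URLs in discovery order, capped at max_pages.
    """
    visited: list[str] = []
    seen = {start_url}
    frontier = [start_url]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            # Trim the batch so the page budget is never exceeded.
            batch = frontier[: max_pages - len(visited)]
            visited.extend(batch)
            next_frontier: list[str] = []
            # pool.map preserves input order, so the result is deterministic.
            for links in pool.map(fetch_links, batch):
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited


# Toy link graph standing in for real HTTP requests (hypothetical data).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

pages = crawl("/", lambda url: SITE.get(url, []), max_pages=3, workers=4)
print(pages)
```

The seen set prevents revisiting pages that link back to each other, and the batch trimming is what keeps the crawl within the page limit even when one level of the site is wide.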
2. Ask Questions
Query your newly created local knowledge base:
docpilot ask "How do I create a virtual environment?"
🧰 CLI Command Reference
Manage your configuration, models, and local database with ease.
docpilot setup
Re-run the interactive setup wizard at any time to change your default models.
docpilot project
Switch to a different project or check the active project. Each project has its own isolated vector database.
# Check the active project
docpilot project
# Switch to a different project
docpilot project <project-name>
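One common way to get per-project isolation is to give every project its own vector-store directory. The sketch below illustrates that idea only. The projects/ subdirectory, the slug rules, and the project_db_path helper are assumptions for illustration, not DocPilot's documented on-disk layout.

```python
import re
from pathlib import Path

# Assumption: project data lives under the ~/.docpilot directory.
BASE_DIR = Path.home() / ".docpilot"


def project_db_path(project: str, base_dir: Path = BASE_DIR) -> Path:
    """Map a project name to its own vector-store directory (hypothetical layout)."""
    # Normalize the name so it is safe to use as a single directory component.
    slug = re.sub(r"[^a-z0-9_-]+", "-", project.strip().lower()).strip("-")
    if not slug:
        raise ValueError("project name has no usable characters")
    return base_dir / "projects" / slug / "chroma_langchain_db"
```

Because each project resolves to a different directory, clearing or re-ingesting one project cannot touch the vectors of another.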
docpilot clear
Wipe your local Chroma vector database to start fresh. Prompts for safety confirmation.
docpilot speed [profile]
Adjust the retrieval and generation settings for your desired use case.
- fast: Lower latency, shorter context limits.
- balanced: Default trade-off.
- quality: Larger context, more comprehensive answers, slower inference.
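To make the trade-off concrete, a profile can be thought of as a named bundle of retrieval and generation settings. The sketch below is purely illustrative: the SpeedProfile fields and every numeric value are assumptions, since DocPilot's real settings are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    top_k: int       # number of chunks retrieved per question
    num_ctx: int     # context window requested from the chat model
    max_tokens: int  # cap on the length of the generated answer


# Illustrative values only; these are not DocPilot's actual defaults.
PROFILES = {
    "fast": SpeedProfile(top_k=2, num_ctx=2048, max_tokens=256),
    "balanced": SpeedProfile(top_k=4, num_ctx=4096, max_tokens=512),
    "quality": SpeedProfile(top_k=8, num_ctx=8192, max_tokens=1024),
}


def get_profile(name: str = "balanced") -> SpeedProfile:
    """Look up a profile by name, with a readable error for typos."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(
            f"unknown profile {name!r}; choose from {sorted(PROFILES)}"
        ) from None
```

Retrieving fewer chunks and requesting a smaller context window shortens the prompt the model has to process, which is why a fast profile would trade answer depth for latency.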
docpilot model
Manually manage your Ollama models without the interactive setup wizard.
docpilot model list
docpilot model set <chat-model>
docpilot model setembed <embedding-model>
docpilot render
Parse and beautifully render any markdown file or text string directly in your terminal.
docpilot show
Display your current project version and configuration in beautiful ASCII art.
🏗️ Architecture Under the Hood
DocPilot employs an optimized RAG (Retrieval-Augmented Generation) pipeline:
- Ingestion: Native Python extractors (BeautifulSoup4, csv, pypdf) parse the raw data.
- Chunking: Multi-threaded chunkers slice the documents into semantically coherent pieces.
- Embedding: langchain-ollama creates local vector embeddings via Ollama.
- Storage: chromadb persistently stores vectors on disk at ~/.docpilot/chroma_langchain_db.
- Retrieval: User queries are embedded, matched via similarity search, and fed into a system prompt.
- Generation: The designated Ollama chat model generates the final response streamed to the terminal using rich.
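The chunk, embed, retrieve, and prompt steps above can be sketched with the standard library alone. This is a toy model of the flow, not DocPilot's code: it swaps in fixed-size overlapping word windows for semantic chunking, word counts for Ollama embeddings, and an in-memory list for Chroma. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs: list[str], workers: int = 4) -> list[tuple[str, Counter]]:
    """Chunk documents in parallel threads, then embed every chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = [c for doc_chunks in pool.map(chunk, docs) for c in doc_chunks]
    return [(c, embed(c)) for c in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(context: list[str], question: str) -> str:
    """Place the retrieved chunks ahead of the question."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "Create a virtual environment with python -m venv .venv",
    "Install packages with pip install requests",
]
question = "How do I create a virtual environment?"
index = build_index(docs)
top = retrieve(index, question, k=1)
prompt = build_prompt(top, question)
print(prompt)
```

In a real pipeline the final prompt would be sent to the chat model. Here it is only printed, which makes it easy to inspect what context the retrieval step selected.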
🤝 Contributing
Contributions are welcome! If you are using DocPilot for your daily workflows or in a hackathon, feel free to open issues and pull requests.
To set up a local development environment:
git clone https://github.com/yourusername/docpilot.git
cd docpilot
uv pip install -e ".[dev]"
uv run pytest
📄 License
This project is licensed under the MIT License.