AskYourDocs
AskYourDocs is a privacy-first, local-only CLI tool that transforms your document collections into an intelligent Q&A system. Using Retrieval-Augmented Generation (RAG), it lets you ask natural-language questions about your documents and get accurate, contextual answers with source citations.
Key Features
- 100% Privacy: All processing happens locally; your documents never leave your machine
- Intelligent Q&A: Ask natural language questions and get contextual answers
- Multi-Format Support: PDF, Word, PowerPoint, Markdown, code files, and more
- Fast Retrieval: Hybrid search combining semantic and keyword matching
- Source Attribution: Every answer includes citations to source documents
- Incremental Updates: Only processes changed files for efficiency
- Beautiful CLI: Rich terminal output with progress bars and colors
- Highly Configurable: YAML-based configuration for all settings
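The incremental-update feature comes down to change detection: re-index a file only when its content has actually changed. A minimal sketch of how that could work, using content hashes (the `file_digest`/`changed_files` helpers are illustrative, not AskYourDocs's actual internals):

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash used to decide whether a file needs re-indexing."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, index):
    """Return files whose content differs from the stored digest.

    `index` maps str(path) -> digest recorded on the previous ingest
    run; it is updated in place so the next run sees the new digests.
    """
    changed = []
    for p in paths:
        digest = file_digest(p)
        if index.get(str(p)) != digest:
            changed.append(p)
            index[str(p)] = digest
    return changed
```

A renamed-but-unchanged file would hash identically, so a real implementation would likely also track modification times and paths; the hash check alone captures the core idea.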
Quick Start
Installation
Option 1: Install from PyPI (Recommended)
# Basic installation (local models only)
pip install askyourdocs
# With remote LLM support
pip install askyourdocs[remote]
# With GPU acceleration
pip install askyourdocs[gpu]
# Full installation with all features
pip install askyourdocs[all]
Option 2: Install with Poetry (Development)
# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install with all extras for development
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help
Option 3: Install from Source (Advanced)
# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Or install with optional dependencies
pip install -e ".[gpu,remote,dev]"
Setup Prerequisites
For Local Processing (Recommended)
- Install Ollama (for local LLM inference):
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Windows (WSL)
curl -fsSL https://ollama.ai/install.sh | sh
- Start Ollama and download the default model:
# Start Ollama service
ollama serve
# In another terminal, download the default lightweight model
ollama pull tinyllama:1.1b
# Or download a more capable model (larger download)
ollama pull llama3.1:8b
For Remote Processing (Optional)
If you prefer to use remote LLM providers, you'll need API keys:
OpenAI Setup:
# Install with OpenAI support
pip install askyourdocs[openai]
# Set your API key
export OPENAI_API_KEY="your-api-key-here"
# Configure for OpenAI
askyourdocs config setup --provider openai
Anthropic Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Get your API key from https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY="your-api-key-here"
# 3. Configure for Anthropic (recommended)
askyourdocs config setup --provider anthropic
Azure OpenAI Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set your credentials
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
# 3. Configure for Azure (recommended)
askyourdocs config setup --provider azure
Basic Usage
- Index your documents:
# Index documents in current directory
askyourdocs ingest
# Index specific directory
askyourdocs ingest ./my-documents
# Index with progress and verbose output
askyourdocs ingest ./docs --verbose
- Ask questions:
# Ask a question
askyourdocs ask "What are the main conclusions in the research papers?"
# Ask with specific number of sources
askyourdocs ask "How does the API authentication work?" --top-k 5
# Get detailed response with full sources
askyourdocs ask "Summarize the project requirements" --verbose
- Interactive mode:
# Start interactive session
askyourdocs interactive
# In interactive mode:
> What is the project timeline?
> Can you explain the technical architecture?
> exit
- Check system status:
# View system status and configuration
askyourdocs status
# Validate configuration
askyourdocs config validate
- Configuration management:
# Interactive setup
askyourdocs config setup
# View configuration
askyourdocs config show
# Set specific values
askyourdocs config set model.temperature 0.2
askyourdocs config set retrieval.top_k 10
Command Reference
Core Commands
ingest - Index Documents
askyourdocs ingest [PATH] [OPTIONS]
# Examples:
askyourdocs ingest # Current directory
askyourdocs ingest ./documents # Specific path
askyourdocs ingest --include "*.pdf,*.md" # Filter file types
askyourdocs ingest --exclude "temp/*" # Exclude patterns
askyourdocs ingest --force # Rebuild entire index
askyourdocs ingest --watch # Watch for changes
Options:
- --include TEXT: File patterns to include (e.g., "*.pdf,*.docx")
- --exclude TEXT: File patterns to exclude (e.g., "temp/*,*.log")
- --force: Force rebuild of entire index
- --watch: Watch directory for changes and auto-update
- --chunk-size INTEGER: Override chunk size for processing
- --verbose: Show detailed processing information
ask - Query Documents
askyourdocs ask "your question" [OPTIONS]
# Examples:
askyourdocs ask "What is the main thesis?"
askyourdocs ask "How do I configure the database?" --top-k 5
askyourdocs ask "Summarize key findings" --mode compact
askyourdocs ask "What are the requirements?" --stream
Options:
- --top-k INTEGER: Number of relevant chunks to retrieve (default: 5)
- --mode TEXT: Response mode (compact/tree_summarize/accumulate)
- --stream: Stream response as it's generated
- --no-sources: Don't show source citations
- --threshold FLOAT: Similarity threshold for retrieval (0.0-1.0)
search - Fast Keyword Search
askyourdocs search "keyword" [OPTIONS]
# Examples:
askyourdocs search "authentication"
askyourdocs search "machine learning" --limit 10
askyourdocs search "API" --format json
refresh - Rebuild Index
askyourdocs refresh [OPTIONS]
# Examples:
askyourdocs refresh # Rebuild current index
askyourdocs refresh --reset # Delete and rebuild from scratch
askyourdocs refresh --optimize # Optimize vector store
status - System Information
askyourdocs status
# Example output:
AskYourDocs Status
├── Documents: 1,247 files indexed
├── Chunks: 5,834 text chunks
├── Storage: 156.7 MB vector data
├── Model: llama3.1:8b (Ollama)
├── Embeddings: BAAI/bge-small-en-v1.5
└── Config: ~/.config/askyourdocs/config.yaml
Configuration Commands
config - Manage Configuration
askyourdocs config [COMMAND] [OPTIONS]
# View current configuration
askyourdocs config show
askyourdocs config show --format yaml
askyourdocs config show --section model
# Set configuration values
askyourdocs config set model.name llama3.1:8b
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"
# Interactive setup
askyourdocs config setup
askyourdocs config setup --provider openai
# Validate configuration
askyourdocs config validate
# Reset to defaults
askyourdocs config reset
# Show configuration file location
askyourdocs config path
Advanced Commands
interactive - Interactive Mode
askyourdocs interactive [OPTIONS]
# Start interactive session with custom settings
askyourdocs interactive --top-k 3 --stream
export - Backup Data
askyourdocs export --output backup.tar.gz
askyourdocs export --output backup.tar.gz --include-config
import - Restore Data
askyourdocs import --input backup.tar.gz
askyourdocs import --input backup.tar.gz --merge
Configuration
AskYourDocs uses a YAML configuration file located at ~/.config/askyourdocs/config.yaml. You can customize all aspects of the tool:
Local Models (Default - No API Key Required)
model:
provider: "ollama" # Local Ollama server
name: "tinyllama:1.1b" # Lightweight model (fast, good for most tasks)
base_url: "http://localhost:11434"
temperature: 0.1 # Response creativity (0.0-2.0)
max_tokens: 2048 # Maximum response length
embedding:
provider: "huggingface" # Local embeddings
model: "BAAI/bge-small-en-v1.5" # Fast, accurate embeddings
device: "cpu" # cpu/cuda/mps/auto
Setup Command: askyourdocs config setup --provider ollama
Remote Models (API Key Required)
OpenAI Configuration:
model:
provider: "openai"
name: "gpt-4" # or gpt-3.5-turbo
api_key: "sk-your-key-here" # Or set OPENAI_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "openai" # Optional: use OpenAI embeddings
model: "text-embedding-3-small"
api_key: "sk-your-key-here"
Setup Command: askyourdocs config setup --provider openai
Anthropic Configuration:
model:
provider: "anthropic"
name: "claude-3-5-sonnet-20241022" # Latest Claude model
api_key: "sk-ant-your-key-here" # Or set ANTHROPIC_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "huggingface" # Keep local embeddings for privacy
model: "BAAI/bge-small-en-v1.5"
Setup Command: askyourdocs config setup --provider anthropic
Azure OpenAI Configuration:
model:
provider: "azure"
name: "gpt-4"
api_key: "your-azure-key"
azure_endpoint: "https://your-resource.openai.azure.com/"
azure_deployment: "your-deployment-name"
Setup Command: askyourdocs config setup --provider azure
Advanced Configuration
Document Processing:
chunking:
strategy: "sentence" # sentence/recursive/semantic/fixed
chunk_size: 1000 # Characters per chunk (100-8000)
chunk_overlap: 200 # Overlap between chunks
respect_boundaries: true # Respect sentence/paragraph boundaries
min_chunk_size: 100 # Minimum chunk size
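To make chunk_size and chunk_overlap concrete, here is a sketch of the simplest "fixed" strategy: each chunk starts chunk_size - chunk_overlap characters after the previous one, so neighbouring chunks share chunk_overlap characters and text near a boundary appears in both. (This illustrates the settings only; the "sentence" and "semantic" strategies additionally respect linguistic boundaries.)

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap.

    Consecutive chunks overlap by `chunk_overlap` characters so that
    content near a chunk boundary is never split away from its context.
    """
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is fully covered by the previous chunk.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: four chunks, each sharing two characters with its neighbour.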
Retrieval Settings:
retrieval:
top_k: 5 # Number of chunks to retrieve (1-50)
similarity_threshold: 0.7 # Minimum similarity score (0.0-1.0)
rerank: true # Re-rank results for better relevance
retrieval_mode: "hybrid" # vector/keyword/hybrid
max_context_length: 4000 # Maximum context for LLM
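The "hybrid" retrieval mode combines semantic (vector) and keyword scores before applying the similarity threshold and top_k cutoff. One common way to do this is a weighted blend of the two score sets; the sketch below is an illustration of that idea under an assumed equal weighting, not AskYourDocs's exact scoring function:

```python
def hybrid_rank(vector_scores: dict, keyword_scores: dict,
                top_k: int = 5, threshold: float = 0.7,
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend vector and keyword scores for hybrid retrieval.

    Scores are dicts chunk_id -> score in [0, 1]; alpha weights the
    vector side. Chunks whose blended score falls below the threshold
    are dropped, then the best top_k are returned, highest first.
    """
    ids = set(vector_scores) | set(keyword_scores)
    blended = {
        i: alpha * vector_scores.get(i, 0.0) + (1 - alpha) * keyword_scores.get(i, 0.0)
        for i in ids
    }
    kept = [(i, s) for i, s in blended.items() if s >= threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)[:top_k]
```

A chunk that scores well on only one signal (e.g. an exact keyword hit with weak semantic similarity) can still clear the threshold, which is the point of hybrid mode.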
Storage Settings:
storage:
backend: "chromadb" # Vector database backend
path: ".askyourdocs" # Storage directory
compression: true # Enable compression
collection_name: "documents" # Collection name
Examples
Quick Start with Local Models
# 1. Install and setup
pip install askyourdocs
ollama serve # In one terminal
ollama pull tinyllama:1.1b # In another terminal
# 2. Index your documents
askyourdocs ingest ./my-documents
# 3. Ask questions
askyourdocs ask "What are the key findings?"
Using with OpenAI
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set up OpenAI API key
export OPENAI_API_KEY="your-api-key"
# 3. Configure for OpenAI
askyourdocs config setup --provider openai
# 4. Index and query documents
askyourdocs ingest ./documents
askyourdocs ask "What are the key findings in these documents?"
# 5. Verify setup
askyourdocs status
Research Papers Analysis
# Index your research papers
askyourdocs ingest ./research-papers --include "*.pdf"
# Ask analytical questions
askyourdocs ask "What are the common methodologies across these studies?"
askyourdocs ask "Which papers mention transformer architecture?"
askyourdocs ask "Summarize the key findings about neural networks"
Code Documentation
# Index your codebase documentation
askyourdocs ingest ./docs --include "*.md,*.rst"
# Query your docs
askyourdocs ask "How do I set up authentication?"
askyourdocs ask "What are the API rate limits?"
askyourdocs ask "Show me examples of database configuration"
Legal Documents
# Index contracts and legal docs
askyourdocs ingest ./legal --include "*.pdf,*.docx"
# Ask specific questions
askyourdocs ask "What are the termination clauses?"
askyourdocs ask "What payment terms are specified?"
askyourdocs ask "Are there any liability limitations?"
# Query specific contract types
askyourdocs ask "What are the key terms?" --path ./employment-contracts
askyourdocs ask "What are the renewal conditions in ./service-agreements?"
Path-Specific Querying
AskYourDocs supports querying specific paths, with automatic ingestion if needed:
# Method 1: Using --path option
askyourdocs ask "What are the main topics?" --path ./research-papers
# Method 2: Include path in question
askyourdocs ask "What are the key findings in ./data-analysis?"
# Auto-ingestion: If path isn't indexed, it will be ingested automatically
askyourdocs ask "Summarize the content" --path ./new-documents
# Multiple path queries
askyourdocs ask "Compare findings in ./study-a vs ./study-b"
Advanced Usage
Custom Configuration
# Switch to different providers (recommended method)
askyourdocs config setup --provider ollama
askyourdocs config setup --provider openai
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider azure
# Interactive setup (choose provider during setup)
askyourdocs config setup
# Advanced: Direct configuration (for automation/scripts)
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.device "cuda"
askyourdocs config set retrieval.top_k 10
# View current configuration
askyourdocs config show
# Validate configuration
askyourdocs config validate
Monitoring and Maintenance
# Check system status
askyourdocs status
# Refresh index (incremental)
askyourdocs refresh
# Full rebuild (when changing chunk settings)
askyourdocs refresh --reset
# Optimize vector store
askyourdocs refresh --optimize
Backup and Migration
# Create backup
askyourdocs export --output documents-backup.tar.gz --include-config
# Restore from backup
askyourdocs import --input documents-backup.tar.gz
# Merge with existing index
askyourdocs import --input additional-docs.tar.gz --merge
Supported File Formats
| Category | Formats | Extensions |
|---|---|---|
| Documents | PDF, Word, PowerPoint, OpenDocument | .pdf, .docx, .pptx, .odt, .odp |
| Text | Plain text, Markdown, reStructuredText | .txt, .md, .rst, .csv |
| Code | Source code, configuration files | .py, .js, .java, .cpp, .yaml, .json |
| Structured | HTML, XML, LaTeX, Jupyter | .html, .xml, .tex, .ipynb |
Architecture
AskYourDocs uses a modern RAG architecture:
- Document Ingestion: Files are processed and split into semantic chunks
- Embedding Generation: Text chunks are converted to vector embeddings
- Vector Storage: ChromaDB stores embeddings with metadata for fast retrieval
- Query Processing: User questions are embedded and matched against stored vectors
- Context Retrieval: Most relevant chunks are retrieved based on similarity
- Response Generation: Local LLM generates answers using retrieved context
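The retrieval half of this pipeline (steps 4-5) can be sketched end to end with a toy example. The bag-of-words "embedding" and cosine ranking below stand in for the real embedding model and ChromaDB; the function names are illustrative only:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A real system would call an
    embedding model such as BAAI/bge-small-en-v1.5 here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Steps 4-5: embed the query, rank stored chunks by similarity."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

In the real pipeline, the top-ranked chunks would then be passed to the LLM as context for step 6, response generation.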
Privacy & Security
- Local Processing: All operations happen on your machine
- No Data Transmission: Documents never leave your environment
- Secure Storage: Vector data stored locally with optional encryption
- No Telemetry: Zero tracking or analytics
- Open Source: Full transparency with auditable code
Troubleshooting
Common Issues
"Configuration issues found"
# Check what's wrong
askyourdocs status
askyourdocs config validate
# Fix with interactive setup (recommended)
askyourdocs config setup
"Ollama connection failed"
# Check if Ollama is running
ollama list
# Start Ollama if not running
ollama serve
# Test connection
curl http://localhost:11434/api/tags
# Download the default model
ollama pull tinyllama:1.1b
# List available models
ollama list
"No documents found"
# Check current directory
askyourdocs ingest --verbose
# Specify path explicitly
askyourdocs ingest /path/to/documents
# Check supported formats
askyourdocs ingest --include "*.pdf,*.docx,*.txt"
"Embedding model download failed"
# Check internet connection and try again
askyourdocs refresh
# Use different model
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"
"API key not found" (for remote providers)
# Set environment variable first
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export OPENAI_API_KEY="your-openai-key"
export AZURE_OPENAI_API_KEY="your-azure-key"
# Then configure provider (recommended)
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider openai
askyourdocs config setup --provider azure
# Verify configuration
askyourdocs config validate
askyourdocs status
Performance Issues
# Reduce chunk size
askyourdocs config set chunking.chunk_size 800
# Reduce batch size
askyourdocs config set embedding.batch_size 16
# Optimize storage
askyourdocs refresh --optimize
# Switch to lighter model
askyourdocs config set model.name "tinyllama:1.1b"
# Use GPU acceleration (if available)
askyourdocs config set embedding.device "cuda"
Getting Help
# Show general help
askyourdocs --help
# Show command-specific help
askyourdocs ask --help
askyourdocs ingest --help
# Show current configuration
askyourdocs config show
# Check system status
askyourdocs status
Development Setup
Using Poetry (Recommended)
# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help
# Install pre-commit hooks
pre-commit install
Using pip (Alternative)
# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install with development dependencies
pip install -e ".[dev,gpu,remote]"
# Install pre-commit hooks
pre-commit install
Development Commands
# Run tests
poetry run pytest
# or: pytest
# Run with coverage
poetry run pytest --cov=askyourdocs
# or: pytest --cov=askyourdocs
# Format code
poetry run black src/ tests/
poetry run ruff check src/ tests/
# Type checking
poetry run mypy src/
# Run all quality checks
poetry run pre-commit run --all-files
# Build package
poetry build
# Install locally for testing
poetry install
Note: Local models require initial download but then work offline. Remote models require internet and API costs.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- LlamaIndex: For the excellent RAG framework
- ChromaDB: For fast vector storage
- Ollama: For local LLM inference
- Rich: For beautiful terminal output
- Click: For the CLI framework
Support
- Email: lincolncmba@gmail.com