
AskYourDocs ๐Ÿ”๐Ÿ“š

PyPI version · Python 3.9+ · License: MIT

AskYourDocs is a privacy-first, local-only CLI tool that transforms your document collections into an intelligent Q&A system. Using Retrieval-Augmented Generation (RAG), it lets you ask natural language questions about your documents and get accurate, contextual answers with source citations.

✨ Key Features

  • 🔒 100% Privacy: All processing happens locally; your documents never leave your machine
  • 🧠 Intelligent Q&A: Ask natural language questions and get contextual answers
  • 📄 Multi-Format Support: PDF, Word, PowerPoint, Markdown, code files, and more
  • ⚡ Fast Retrieval: Hybrid search combining semantic and keyword matching
  • 🎯 Source Attribution: Every answer includes citations to source documents
  • 🔄 Incremental Updates: Only processes changed files for efficiency
  • 🎨 Beautiful CLI: Rich terminal output with progress bars and colors
  • ⚙️ Highly Configurable: YAML-based configuration for all settings

🚀 Quick Start

Installation

Option 1: Install from PyPI (Recommended)

# Basic installation (local models only)
pip install askyourdocs

# With remote LLM support
pip install askyourdocs[remote]

# With GPU acceleration
pip install askyourdocs[gpu]

# Full installation with all features
pip install askyourdocs[all]

Option 2: Install with Poetry (Development)

# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs

# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Install with all extras for development
poetry install --extras "all"

# Run a basic command
poetry run askyourdocs --help

Option 3: Install from Source (Advanced)

# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

# Or install with optional dependencies
pip install -e ".[gpu,remote,dev]"

Setup Prerequisites

For Local Processing (Recommended)

  1. Install Ollama (for local LLM inference):

    # macOS
    brew install ollama
    
    # Linux
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Windows (WSL)
    curl -fsSL https://ollama.ai/install.sh | sh
    
  2. Start Ollama and download the default model:

    # Start Ollama service
    ollama serve
    
    # In another terminal, download the default lightweight model
    ollama pull tinyllama:1.1b
    
    # Or download a more capable model (larger download)
    ollama pull llama3.1:8b
    

For Remote Processing (Optional)

If you prefer to use remote LLM providers, you'll need API keys:

OpenAI Setup:

# Install with OpenAI support
pip install askyourdocs[openai]

# Set your API key
export OPENAI_API_KEY="your-api-key-here"

# Configure for OpenAI
askyourdocs config setup --provider openai

Anthropic Setup:

# 1. Install with remote provider support
pip install askyourdocs[remote]

# 2. Get your API key from https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY="your-api-key-here"

# 3. Configure for Anthropic (recommended)
askyourdocs config setup --provider anthropic

Azure OpenAI Setup:

# 1. Install with remote provider support
pip install askyourdocs[remote]

# 2. Set your credentials
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

# 3. Configure for Azure (recommended)
askyourdocs config setup --provider azure

Basic Usage

  1. Index your documents:

    # Index documents in current directory
    askyourdocs ingest
    
    # Index specific directory
    askyourdocs ingest ./my-documents
    
    # Index with progress and verbose output
    askyourdocs ingest ./docs --verbose
    
  2. Ask questions:

    # Ask a question
    askyourdocs ask "What are the main conclusions in the research papers?"
    
    # Ask with specific number of sources
    askyourdocs ask "How does the API authentication work?" --top-k 5
    
    # Get detailed response with full sources
    askyourdocs ask "Summarize the project requirements" --verbose
    
  3. Interactive mode:

    # Start interactive session
    askyourdocs interactive
    
    # In interactive mode:
    > What is the project timeline?
    > Can you explain the technical architecture?
    > exit
    
  4. Check system status:

    # View system status and configuration
    askyourdocs status
    
    # Validate configuration
    askyourdocs config validate
    
  5. Configuration management:

    # Interactive setup
    askyourdocs config setup
    
    # View configuration
    askyourdocs config show
    
    # Set specific values
    askyourdocs config set model.temperature 0.2
    askyourdocs config set retrieval.top_k 10
    

📖 Command Reference

Core Commands

ingest - Index Documents

askyourdocs ingest [PATH] [OPTIONS]

# Examples:
askyourdocs ingest                          # Current directory
askyourdocs ingest ./documents             # Specific path
askyourdocs ingest --include "*.pdf,*.md"  # Filter file types
askyourdocs ingest --exclude "temp/*"      # Exclude patterns
askyourdocs ingest --force                 # Rebuild entire index
askyourdocs ingest --watch                 # Watch for changes

Options:

  • --include TEXT: File patterns to include (e.g., "*.pdf,*.docx")
  • --exclude TEXT: File patterns to exclude (e.g., "temp/*,*.log")
  • --force: Force rebuild of entire index
  • --watch: Watch directory for changes and auto-update
  • --chunk-size INTEGER: Override chunk size for processing
  • --verbose: Show detailed processing information
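The include/exclude options use shell-style glob patterns. As a rough illustration of how such filtering behaves (a sketch of the general technique, not AskYourDocs' actual implementation), Python's stdlib fnmatch does the matching:

```python
from fnmatch import fnmatch

def matches_any(path: str, patterns) -> bool:
    """Return True if the path matches at least one glob pattern."""
    return any(fnmatch(path, pat) for pat in patterns)

def filter_files(paths, include=("*",), exclude=()):
    """Keep paths that match an include pattern and no exclude pattern."""
    return [p for p in paths
            if matches_any(p, include) and not matches_any(p, exclude)]

files = ["notes.md", "report.pdf", "temp/cache.log", "src/app.py"]
print(filter_files(files, include=["*.pdf", "*.md"]))  # ['notes.md', 'report.pdf']
print(filter_files(files, exclude=["temp/*"]))         # drops 'temp/cache.log'
```

Note that `fnmatch` patterns, unlike some glob dialects, let `*` cross directory separators, which is why `temp/*` excludes everything under `temp/`.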

ask - Query Documents

askyourdocs ask "your question" [OPTIONS]

# Examples:
askyourdocs ask "What is the main thesis?"
askyourdocs ask "How do I configure the database?" --top-k 5
askyourdocs ask "Summarize key findings" --mode compact
askyourdocs ask "What are the requirements?" --stream

Options:

  • --top-k INTEGER: Number of relevant chunks to retrieve (default: 5)
  • --mode TEXT: Response mode (compact/tree_summarize/accumulate)
  • --stream: Stream response as it's generated
  • --no-sources: Don't show source citations
  • --threshold FLOAT: Similarity threshold for retrieval (0.0-1.0)

search - Fast Keyword Search

askyourdocs search "keyword" [OPTIONS]

# Examples:
askyourdocs search "authentication"
askyourdocs search "machine learning" --limit 10
askyourdocs search "API" --format json

refresh - Rebuild Index

askyourdocs refresh [OPTIONS]

# Examples:
askyourdocs refresh                    # Rebuild current index
askyourdocs refresh --reset            # Delete and rebuild from scratch
askyourdocs refresh --optimize         # Optimize vector store

status - System Information

askyourdocs status

# Example output:
📊 AskYourDocs Status
├── 📁 Documents: 1,247 files indexed
├── 🧩 Chunks: 5,834 text chunks
├── 💾 Storage: 156.7 MB vector data
├── 🧠 Model: llama3.1:8b (Ollama)
├── 🔍 Embeddings: BAAI/bge-small-en-v1.5
└── ⚙️ Config: ~/.config/askyourdocs/config.yaml

Configuration Commands

config - Manage Configuration

askyourdocs config [COMMAND] [OPTIONS]

# View current configuration
askyourdocs config show
askyourdocs config show --format yaml
askyourdocs config show --section model

# Set configuration values
askyourdocs config set model.name llama3.1:8b
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"

# Interactive setup
askyourdocs config setup
askyourdocs config setup --provider openai

# Validate configuration
askyourdocs config validate

# Reset to defaults
askyourdocs config reset

# Show configuration file location
askyourdocs config path

Advanced Commands

interactive - Interactive Mode

askyourdocs interactive [OPTIONS]

# Start interactive session with custom settings
askyourdocs interactive --top-k 3 --stream

export - Backup Data

askyourdocs export --output backup.tar.gz
askyourdocs export --output backup.tar.gz --include-config

import - Restore Data

askyourdocs import --input backup.tar.gz
askyourdocs import --input backup.tar.gz --merge

๐Ÿ› ๏ธ Configuration

AskYourDocs uses a YAML configuration file located at ~/.config/askyourdocs/config.yaml. You can customize all aspects of the tool:

Local Models (Default - No API Key Required)

model:
  provider: "ollama"           # Local Ollama server
  name: "tinyllama:1.1b"      # Lightweight model (fast, good for most tasks)
  base_url: "http://localhost:11434"
  temperature: 0.1            # Response creativity (0.0-2.0)
  max_tokens: 2048           # Maximum response length

embedding:
  provider: "huggingface"     # Local embeddings
  model: "BAAI/bge-small-en-v1.5"  # Fast, accurate embeddings
  device: "cpu"              # cpu/cuda/mps/auto

Setup Command: askyourdocs config setup --provider ollama

Remote Models (API Key Required)

OpenAI Configuration:

model:
  provider: "openai"
  name: "gpt-4"              # or gpt-3.5-turbo
  api_key: "sk-your-key-here"  # Or set OPENAI_API_KEY env var
  temperature: 0.1
  max_tokens: 2048

embedding:
  provider: "openai"         # Optional: use OpenAI embeddings
  model: "text-embedding-3-small"
  api_key: "sk-your-key-here"

Setup Command: askyourdocs config setup --provider openai

Anthropic Configuration:

model:
  provider: "anthropic"
  name: "claude-3-5-sonnet-20241022"  # Latest Claude model
  api_key: "sk-ant-your-key-here"  # Or set ANTHROPIC_API_KEY env var
  temperature: 0.1
  max_tokens: 2048

embedding:
  provider: "huggingface"  # Keep local embeddings for privacy
  model: "BAAI/bge-small-en-v1.5"

Setup Command: askyourdocs config setup --provider anthropic

Azure OpenAI Configuration:

model:
  provider: "azure"
  name: "gpt-4"
  api_key: "your-azure-key"
  azure_endpoint: "https://your-resource.openai.azure.com/"
  azure_deployment: "your-deployment-name"

Setup Command: askyourdocs config setup --provider azure

Advanced Configuration

Document Processing:

chunking:
  strategy: "sentence"        # sentence/recursive/semantic/fixed
  chunk_size: 1000           # Characters per chunk (100-8000)
  chunk_overlap: 200         # Overlap between chunks
  respect_boundaries: true   # Respect sentence/paragraph boundaries
  min_chunk_size: 100        # Minimum chunk size
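To make chunk_size and chunk_overlap concrete, here is a minimal fixed-size chunker in the spirit of the "fixed" strategy (the sentence-aware strategies are more involved; this is an illustrative sketch, not the tool's own code):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each chunk's start advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 50, chunk_size=100, chunk_overlap=20)
# Each chunk starts 80 characters after the previous one, so the last
# 20 characters of one chunk reappear at the start of the next.
```

The overlap is what lets an answer that straddles a chunk boundary still be retrieved as a coherent passage.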

Retrieval Settings:

retrieval:
  top_k: 5                   # Number of chunks to retrieve (1-50)
  similarity_threshold: 0.7   # Minimum similarity score (0.0-1.0)
  rerank: true               # Re-rank results for better relevance
  retrieval_mode: "hybrid"   # vector/keyword/hybrid
  max_context_length: 4000   # Maximum context for LLM
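Conceptually, hybrid retrieval blends a semantic (vector) score with a keyword score, drops chunks below similarity_threshold, and keeps the best top_k. A toy sketch with made-up scores and a hypothetical 50/50 weighting (the real weighting and scoring are internal to the tool):

```python
def hybrid_rank(chunks, top_k=5, threshold=0.7, alpha=0.5):
    """chunks: list of (text, vector_score, keyword_score), scores in [0, 1].
    Blend the two scores, filter by threshold, return the top_k results."""
    scored = [(text, alpha * v + (1 - alpha) * k) for text, v, k in chunks]
    kept = [(t, s) for t, s in scored if s >= threshold]
    return sorted(kept, key=lambda ts: ts[1], reverse=True)[:top_k]

candidates = [
    ("Auth uses OAuth2 tokens.", 0.9, 0.8),   # blended ~0.85
    ("Unrelated meeting notes.", 0.4, 0.1),   # blended ~0.25, filtered out
    ("Token refresh flow.",      0.8, 0.6),   # blended ~0.70
]
print(hybrid_rank(candidates, top_k=2))
```

Raising similarity_threshold trades recall for precision: fewer, more on-topic chunks reach the LLM context.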

Storage Settings:

storage:
  backend: "chromadb"        # Vector database backend
  path: ".askyourdocs"       # Storage directory
  compression: true          # Enable compression
  collection_name: "documents"  # Collection name

🎯 Examples

Quick Start with Local Models

# 1. Install and setup
pip install askyourdocs
ollama serve  # In one terminal
ollama pull tinyllama:1.1b  # In another terminal

# 2. Index your documents
askyourdocs ingest ./my-documents

# 3. Ask questions
askyourdocs ask "What are the key findings?"

Using with OpenAI

# 1. Install with remote provider support
pip install askyourdocs[remote]

# 2. Set up OpenAI API key
export OPENAI_API_KEY="your-api-key"

# 3. Configure for OpenAI
askyourdocs config setup --provider openai

# 4. Index and query documents
askyourdocs ingest ./documents
askyourdocs ask "What are the key findings in these documents?"

# 5. Verify setup
askyourdocs status

Research Papers Analysis

# Index your research papers
askyourdocs ingest ./research-papers --include "*.pdf"

# Ask analytical questions
askyourdocs ask "What are the common methodologies across these studies?"
askyourdocs ask "Which papers mention transformer architecture?"
askyourdocs ask "Summarize the key findings about neural networks"

Code Documentation

# Index your codebase documentation
askyourdocs ingest ./docs --include "*.md,*.rst"

# Query your docs
askyourdocs ask "How do I set up authentication?"
askyourdocs ask "What are the API rate limits?"
askyourdocs ask "Show me examples of database configuration"

Legal Documents

# Index contracts and legal docs
askyourdocs ingest ./legal --include "*.pdf,*.docx"

# Ask specific questions
askyourdocs ask "What are the termination clauses?"
askyourdocs ask "What payment terms are specified?"
askyourdocs ask "Are there any liability limitations?"

# Query specific contract types
askyourdocs ask "What are the key terms?" --path ./employment-contracts
askyourdocs ask "What are the renewal conditions in ./service-agreements?"

Path-Specific Querying

AskYourDocs supports querying specific paths, with automatic ingestion if needed:

# Method 1: Using --path option
askyourdocs ask "What are the main topics?" --path ./research-papers

# Method 2: Include path in question
askyourdocs ask "What are the key findings in ./data-analysis?"

# Auto-ingestion: If path isn't indexed, it will be ingested automatically
askyourdocs ask "Summarize the content" --path ./new-documents

# Multiple path queries
askyourdocs ask "Compare findings in ./study-a vs ./study-b"

🔧 Advanced Usage

Custom Configuration

# Switch to different providers (recommended method)
askyourdocs config setup --provider ollama
askyourdocs config setup --provider openai
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider azure

# Interactive setup (choose provider during setup)
askyourdocs config setup

# Advanced: Direct configuration (for automation/scripts)
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.device "cuda"
askyourdocs config set retrieval.top_k 10

# View current configuration
askyourdocs config show

# Validate configuration
askyourdocs config validate

Monitoring and Maintenance

# Check system status
askyourdocs status

# Refresh index (incremental)
askyourdocs refresh

# Full rebuild (when changing chunk settings)
askyourdocs refresh --reset

# Optimize vector store
askyourdocs refresh --optimize
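One plausible way an incremental refresh can decide what to re-process (this sketches the general technique of content hashing, not necessarily AskYourDocs' exact bookkeeping) is to compare file hashes against those recorded at the last indexing run:

```python
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def plan_refresh(current, previous_hashes):
    """current: {path: bytes} of files on disk now.
    previous_hashes: {path: sha256 hex} recorded at the last run.
    Returns (changed_or_new_paths, deleted_paths)."""
    changed = [path for path, data in current.items()
               if previous_hashes.get(path) != content_hash(data)]
    deleted = [path for path in previous_hashes if path not in current]
    return changed, deleted

prev = {"a.md": content_hash(b"old"), "b.md": content_hash(b"same")}
now = {"a.md": b"new", "b.md": b"same", "c.md": b"brand new"}
print(plan_refresh(now, prev))  # (['a.md', 'c.md'], [])
```

Only the changed or new files need re-chunking and re-embedding, which is why incremental refreshes are much faster than `--reset`.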

Backup and Migration

# Create backup
askyourdocs export --output documents-backup.tar.gz --include-config

# Restore from backup
askyourdocs import --input documents-backup.tar.gz

# Merge with existing index
askyourdocs import --input additional-docs.tar.gz --merge
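An export like the one above is, in essence, a gzip-compressed tarball of the storage directory. A minimal stdlib sketch of that idea (the directory layout and file name here are hypothetical):

```python
import tarfile
import tempfile
from pathlib import Path

def make_backup(storage_dir: Path, output: Path) -> None:
    """Bundle a directory into a gzip-compressed tarball."""
    with tarfile.open(output, "w:gz") as tar:
        tar.add(storage_dir, arcname=storage_dir.name)

# Demo against a throwaway directory:
tmp = Path(tempfile.mkdtemp())
store = tmp / ".askyourdocs"
store.mkdir()
(store / "vectors.db").write_text("demo")  # hypothetical store file
make_backup(store, tmp / "backup.tar.gz")
with tarfile.open(tmp / "backup.tar.gz") as tar:
    print(tar.getnames())  # ['.askyourdocs', '.askyourdocs/vectors.db']
```

Restoring is the reverse operation: extracting the archive back into place.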

๐Ÿ“ Supported File Formats

Category     Formats                                   Extensions
Documents    PDF, Word, PowerPoint, OpenDocument       .pdf, .docx, .pptx, .odt, .odp
Text         Plain text, Markdown, reStructuredText    .txt, .md, .rst, .csv
Code         Source code, configuration files          .py, .js, .java, .cpp, .yaml, .json
Structured   HTML, XML, LaTeX, Jupyter                 .html, .xml, .tex, .ipynb
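Routing a file to the right parser typically comes down to an extension lookup. An illustrative sketch (the tool's actual mapping may differ; the category names here are just labels):

```python
from pathlib import Path

# Abbreviated mapping for illustration; see the table above for the full list.
CATEGORY_BY_EXT = {
    ".pdf": "documents", ".docx": "documents", ".pptx": "documents",
    ".txt": "text", ".md": "text", ".rst": "text", ".csv": "text",
    ".py": "code", ".js": "code", ".yaml": "code", ".json": "code",
    ".html": "structured", ".xml": "structured", ".ipynb": "structured",
}

def categorize(filename: str) -> str:
    """Map a filename to a parser category, case-insensitively."""
    return CATEGORY_BY_EXT.get(Path(filename).suffix.lower(), "unsupported")

print(categorize("report.PDF"))   # documents
print(categorize("notes.md"))     # text
print(categorize("archive.zip"))  # unsupported
```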

๐Ÿ—๏ธ Architecture

AskYourDocs uses a modern RAG architecture:

  1. Document Ingestion: Files are processed and split into semantic chunks
  2. Embedding Generation: Text chunks are converted to vector embeddings
  3. Vector Storage: ChromaDB stores embeddings with metadata for fast retrieval
  4. Query Processing: User questions are embedded and matched against stored vectors
  5. Context Retrieval: Most relevant chunks are retrieved based on similarity
  6. Response Generation: Local LLM generates answers using retrieved context
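The six steps above can be sketched end-to-end with a toy bag-of-words "embedding" standing in for the real model (in practice embeddings come from BAAI/bge-small-en-v1.5, and step 6 would hand the retrieved context to the LLM; here we just surface the citation):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: chunk, "embed", and store vectors with source metadata
store = [(chunk, embed(chunk), {"source": src}) for chunk, src in [
    ("Authentication uses OAuth2 bearer tokens.", "auth.md"),
    ("The database runs on PostgreSQL 15.", "db.md"),
]]

# Steps 4-5: embed the question and retrieve the most similar chunk
question = embed("How does authentication work?")
best = max(store, key=lambda rec: cosine(question, rec[1]))

# Step 6: a real system feeds `best` to the LLM; we print the citation
print(f"{best[0]}  (source: {best[2]['source']})")
```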

๐Ÿ›ก๏ธ Privacy & Security

  • Local Processing: All operations happen on your machine
  • No Data Transmission: Documents never leave your environment
  • Secure Storage: Vector data stored locally with optional encryption
  • No Telemetry: Zero tracking or analytics
  • Open Source: Full transparency with auditable code

๐Ÿ” Troubleshooting

Common Issues

"Configuration issues found"

# Check what's wrong
askyourdocs status
askyourdocs config validate

# Fix with interactive setup (recommended)
askyourdocs config setup

"Ollama connection failed"

# Check if Ollama is running
ollama list

# Start Ollama if not running
ollama serve

# Test connection
curl http://localhost:11434/api/tags

# Download the default model
ollama pull tinyllama:1.1b

# List available models
ollama list

"No documents found"

# Check current directory
askyourdocs ingest --verbose

# Specify path explicitly
askyourdocs ingest /path/to/documents

# Check supported formats
askyourdocs ingest --include "*.pdf,*.docx,*.txt"

"Embedding model download failed"

# Check internet connection and try again
askyourdocs refresh

# Use different model
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"

"API key not found" (for remote providers)

# Set environment variable first
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export OPENAI_API_KEY="your-openai-key"
export AZURE_OPENAI_API_KEY="your-azure-key"

# Then configure provider (recommended)
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider openai
askyourdocs config setup --provider azure

# Verify configuration
askyourdocs config validate
askyourdocs status

Performance Issues

# Reduce chunk size
askyourdocs config set chunking.chunk_size 800

# Reduce batch size
askyourdocs config set embedding.batch_size 16

# Optimize storage
askyourdocs refresh --optimize

# Switch to lighter model
askyourdocs config set model.name "tinyllama:1.1b"

# Use GPU acceleration (if available)
askyourdocs config set embedding.device "cuda"

Getting Help

# Show general help
askyourdocs --help

# Show command-specific help
askyourdocs ask --help
askyourdocs ingest --help

# Show current configuration
askyourdocs config show

# Check system status
askyourdocs status

🧪 Development Setup

Using Poetry (Recommended)

# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs

# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install --extras "all"

# Run a basic command
poetry run askyourdocs --help

# Install pre-commit hooks
pre-commit install

Using pip (Alternative)

# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install with development dependencies
pip install -e ".[dev,gpu,remote]"

# Install pre-commit hooks
pre-commit install

Development Commands

# Run tests
poetry run pytest
# or: pytest

# Run with coverage
poetry run pytest --cov=askyourdocs
# or: pytest --cov=askyourdocs

# Format code
poetry run black src/ tests/
poetry run ruff check src/ tests/

# Type checking
poetry run mypy src/

# Run all quality checks
poetry run pre-commit run --all-files

# Build package
poetry build

# Install locally for testing
poetry install

Note: Local models require an initial download but then work fully offline. Remote models require an internet connection and incur per-use API costs.

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the test suite
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • LlamaIndex: For the excellent RAG framework
  • ChromaDB: For fast vector storage
  • Ollama: For local LLM inference
  • Rich: For beautiful terminal output
  • Click: For the CLI framework

📞 Support

