AskYourDocs
AskYourDocs is a privacy-first, local-only CLI tool that transforms your document collections into an intelligent Q&A system. Using Retrieval-Augmented Generation (RAG), it lets you ask natural-language questions about your documents and get accurate, contextual answers with source citations.
Key Features
- 100% Privacy: All processing happens locally; your documents never leave your machine
- Intelligent Q&A: Ask natural language questions and get contextual answers
- Multi-Format Support: PDF, Word, PowerPoint, Markdown, code files, and more
- Fast Retrieval: Hybrid search combining semantic and keyword matching
- Source Attribution: Every answer includes citations to source documents
- Incremental Updates: Only processes changed files for efficiency
- Beautiful CLI: Rich terminal output with progress bars and colors
- Highly Configurable: YAML-based configuration for all settings
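The incremental-update feature comes down to change detection: re-index a file only when its content has actually changed. A minimal sketch of how that could work, using content hashes (the `file_digest`/`changed_files` helpers are illustrative, not AskYourDocs's actual internals):

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash used to decide whether a file needs re-indexing."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, index):
    """Return files whose content differs from the stored digest.

    `index` maps str(path) -> digest recorded on the previous ingest
    run; it is updated in place so the next run sees the new digests.
    """
    changed = []
    for p in paths:
        digest = file_digest(p)
        if index.get(str(p)) != digest:
            changed.append(p)
            index[str(p)] = digest
    return changed
```

A renamed-but-unchanged file would hash identically, so a real implementation would likely also track modification times and paths; the hash check alone captures the core idea.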
Quick Start
Installation
Option 1: Install from PyPI (Recommended)
# Basic installation (local models only)
pip install askyourdocs
# With remote LLM support
pip install askyourdocs[remote]
# With GPU acceleration
pip install askyourdocs[gpu]
# Full installation with all features
pip install askyourdocs[all]
Option 2: Install with Poetry (Development)
# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install with all extras for development
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help
Option 3: Install from Source (Advanced)
# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Or install with optional dependencies
pip install -e ".[gpu,remote,dev]"
Setup Prerequisites
For Local Processing (Recommended)
- Install Ollama (for local LLM inference):
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Windows (WSL)
curl -fsSL https://ollama.ai/install.sh | sh
- Start Ollama and download the default model:
# Start Ollama service
ollama serve
# In another terminal, download the default lightweight model
ollama pull tinyllama:1.1b
# Or download a more capable model (larger download)
ollama pull llama3.1:8b
For Remote Processing (Optional)
If you prefer to use remote LLM providers, you'll need API keys:
OpenAI Setup:
# Install with OpenAI support
pip install askyourdocs[openai]
# Set your API key
export OPENAI_API_KEY="your-api-key-here"
# Configure for OpenAI
askyourdocs config setup --provider openai
Anthropic Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Get your API key from https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY="your-api-key-here"
# 3. Configure for Anthropic (recommended)
askyourdocs config setup --provider anthropic
Azure OpenAI Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set your credentials
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
# 3. Configure for Azure (recommended)
askyourdocs config setup --provider azure
Basic Usage
- Index your documents:
# Index documents in current directory
askyourdocs ingest
# Index specific directory
askyourdocs ingest ./my-documents
# Index with progress and verbose output
askyourdocs ingest ./docs --verbose
- Ask questions:
# Ask a question
askyourdocs ask "What are the main conclusions in the research papers?"
# Ask with specific number of sources
askyourdocs ask "How does the API authentication work?" --top-k 5
# Get detailed response with full sources
askyourdocs ask "Summarize the project requirements" --verbose
- Interactive mode:
# Start interactive session
askyourdocs interactive
# In interactive mode:
> What is the project timeline?
> Can you explain the technical architecture?
> exit
- Check system status:
# View system status and configuration
askyourdocs status
# Validate configuration
askyourdocs config validate
- Configuration management:
# Interactive setup
askyourdocs config setup
# View configuration
askyourdocs config show
# Set specific values
askyourdocs config set model.temperature 0.2
askyourdocs config set retrieval.top_k 10
Command Reference
Core Commands
ingest - Index Documents
askyourdocs ingest [PATH] [OPTIONS]
# Examples:
askyourdocs ingest # Current directory
askyourdocs ingest ./documents # Specific path
askyourdocs ingest --include "*.pdf,*.md" # Filter file types
askyourdocs ingest --exclude "temp/*" # Exclude patterns
askyourdocs ingest --force # Rebuild entire index
askyourdocs ingest --watch # Watch for changes
Options:
- --include TEXT: File patterns to include (e.g., "*.pdf,*.docx")
- --exclude TEXT: File patterns to exclude (e.g., "temp/*,*.log")
- --force: Force rebuild of entire index
- --watch: Watch directory for changes and auto-update
- --chunk-size INTEGER: Override chunk size for processing
- --verbose: Show detailed processing information
ask - Query Documents
askyourdocs ask "your question" [OPTIONS]
# Examples:
askyourdocs ask "What is the main thesis?"
askyourdocs ask "How do I configure the database?" --top-k 5
askyourdocs ask "Summarize key findings" --mode compact
askyourdocs ask "What are the requirements?" --stream
Options:
- --top-k INTEGER: Number of relevant chunks to retrieve (default: 5)
- --mode TEXT: Response mode (compact/tree_summarize/accumulate)
- --stream: Stream response as it's generated
- --no-sources: Don't show source citations
- --threshold FLOAT: Similarity threshold for retrieval (0.0-1.0)
search - Fast Keyword Search
askyourdocs search "keyword" [OPTIONS]
# Examples:
askyourdocs search "authentication"
askyourdocs search "machine learning" --limit 10
askyourdocs search "API" --format json
refresh - Rebuild Index
askyourdocs refresh [OPTIONS]
# Examples:
askyourdocs refresh # Rebuild current index
askyourdocs refresh --reset # Delete and rebuild from scratch
askyourdocs refresh --optimize # Optimize vector store
status - System Information
askyourdocs status
# Example output:
AskYourDocs Status
├── Documents: 1,247 files indexed
├── Chunks: 5,834 text chunks
├── Storage: 156.7 MB vector data
├── Model: llama3.1:8b (Ollama)
├── Embeddings: BAAI/bge-small-en-v1.5
└── Config: ~/.config/askyourdocs/config.yaml
Configuration Commands
config - Manage Configuration
askyourdocs config [COMMAND] [OPTIONS]
# View current configuration
askyourdocs config show
askyourdocs config show --format yaml
askyourdocs config show --section model
# Set configuration values
askyourdocs config set model.name llama3.1:8b
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"
# Interactive setup
askyourdocs config setup
askyourdocs config setup --provider openai
# Validate configuration
askyourdocs config validate
# Reset to defaults
askyourdocs config reset
# Show configuration file location
askyourdocs config path
Advanced Commands
interactive - Interactive Mode
askyourdocs interactive [OPTIONS]
# Start interactive session with custom settings
askyourdocs interactive --top-k 3 --stream
export - Backup Data
askyourdocs export --output backup.tar.gz
askyourdocs export --output backup.tar.gz --include-config
import - Restore Data
askyourdocs import --input backup.tar.gz
askyourdocs import --input backup.tar.gz --merge
Configuration
AskYourDocs uses a YAML configuration file located at ~/.config/askyourdocs/config.yaml. You can customize all aspects of the tool:
Local Models (Default - No API Key Required)
model:
provider: "ollama" # Local Ollama server
name: "tinyllama:1.1b" # Lightweight model (fast, good for most tasks)
base_url: "http://localhost:11434"
temperature: 0.1 # Response creativity (0.0-2.0)
max_tokens: 2048 # Maximum response length
embedding:
provider: "huggingface" # Local embeddings
model: "BAAI/bge-small-en-v1.5" # Fast, accurate embeddings
device: "cpu" # cpu/cuda/mps/auto
Setup Command: askyourdocs config setup --provider ollama
Remote Models (API Key Required)
OpenAI Configuration:
model:
provider: "openai"
name: "gpt-4" # or gpt-3.5-turbo
api_key: "sk-your-key-here" # Or set OPENAI_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "openai" # Optional: use OpenAI embeddings
model: "text-embedding-3-small"
api_key: "sk-your-key-here"
Setup Command: askyourdocs config setup --provider openai
Anthropic Configuration:
model:
provider: "anthropic"
name: "claude-3-5-sonnet-20241022" # Latest Claude model
api_key: "sk-ant-your-key-here" # Or set ANTHROPIC_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "huggingface" # Keep local embeddings for privacy
model: "BAAI/bge-small-en-v1.5"
Setup Command: askyourdocs config setup --provider anthropic
Azure OpenAI Configuration:
model:
provider: "azure"
name: "gpt-4"
api_key: "your-azure-key"
azure_endpoint: "https://your-resource.openai.azure.com/"
azure_deployment: "your-deployment-name"
Setup Command: askyourdocs config setup --provider azure
Advanced Configuration
Document Processing:
chunking:
strategy: "sentence" # sentence/recursive/semantic/fixed
chunk_size: 1000 # Characters per chunk (100-8000)
chunk_overlap: 200 # Overlap between chunks
respect_boundaries: true # Respect sentence/paragraph boundaries
min_chunk_size: 100 # Minimum chunk size
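To make chunk_size and chunk_overlap concrete, here is a sketch of the simplest "fixed" strategy: each chunk starts chunk_size - chunk_overlap characters after the previous one, so neighbouring chunks share chunk_overlap characters and text near a boundary appears in both. (This illustrates the settings only; the "sentence" and "semantic" strategies additionally respect linguistic boundaries.)

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap.

    Consecutive chunks overlap by `chunk_overlap` characters so that
    content near a chunk boundary is never split away from its context.
    """
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is fully covered by the previous chunk.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: four chunks, each sharing two characters with its neighbour.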
Retrieval Settings:
retrieval:
top_k: 5 # Number of chunks to retrieve (1-50)
similarity_threshold: 0.7 # Minimum similarity score (0.0-1.0)
rerank: true # Re-rank results for better relevance
retrieval_mode: "hybrid" # vector/keyword/hybrid
max_context_length: 4000 # Maximum context for LLM
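The "hybrid" retrieval mode combines semantic (vector) and keyword scores before applying the similarity threshold and top_k cutoff. One common way to do this is a weighted blend of the two score sets; the sketch below is an illustration of that idea under an assumed equal weighting, not AskYourDocs's exact scoring function:

```python
def hybrid_rank(vector_scores: dict, keyword_scores: dict,
                top_k: int = 5, threshold: float = 0.7,
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend vector and keyword scores for hybrid retrieval.

    Scores are dicts chunk_id -> score in [0, 1]; alpha weights the
    vector side. Chunks whose blended score falls below the threshold
    are dropped, then the best top_k are returned, highest first.
    """
    ids = set(vector_scores) | set(keyword_scores)
    blended = {
        i: alpha * vector_scores.get(i, 0.0) + (1 - alpha) * keyword_scores.get(i, 0.0)
        for i in ids
    }
    kept = [(i, s) for i, s in blended.items() if s >= threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)[:top_k]
```

A chunk that scores well on only one signal (e.g. an exact keyword hit with weak semantic similarity) can still clear the threshold, which is the point of hybrid mode.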
Storage Settings:
storage:
backend: "chromadb" # Vector database backend
path: ".askyourdocs" # Storage directory
compression: true # Enable compression
collection_name: "documents" # Collection name
Examples
Quick Start with Local Models
# 1. Install and setup
pip install askyourdocs
ollama serve # In one terminal
ollama pull tinyllama:1.1b # In another terminal
# 2. Index your documents
askyourdocs ingest ./my-documents
# 3. Ask questions
askyourdocs ask "What are the key findings?"
Using with OpenAI
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set up OpenAI API key
export OPENAI_API_KEY="your-api-key"
# 3. Configure for OpenAI
askyourdocs config setup --provider openai
# 4. Index and query documents
askyourdocs ingest ./documents
askyourdocs ask "What are the key findings in these documents?"
# 5. Verify setup
askyourdocs status
Research Papers Analysis
# Index your research papers
askyourdocs ingest ./research-papers --include "*.pdf"
# Ask analytical questions
askyourdocs ask "What are the common methodologies across these studies?"
askyourdocs ask "Which papers mention transformer architecture?"
askyourdocs ask "Summarize the key findings about neural networks"
Code Documentation
# Index your codebase documentation
askyourdocs ingest ./docs --include "*.md,*.rst"
# Query your docs
askyourdocs ask "How do I set up authentication?"
askyourdocs ask "What are the API rate limits?"
askyourdocs ask "Show me examples of database configuration"
Legal Documents
# Index contracts and legal docs
askyourdocs ingest ./legal --include "*.pdf,*.docx"
# Ask specific questions
askyourdocs ask "What are the termination clauses?"
askyourdocs ask "What payment terms are specified?"
askyourdocs ask "Are there any liability limitations?"
# Query specific contract types
askyourdocs ask "What are the key terms?" --path ./employment-contracts
askyourdocs ask "What are the renewal conditions in ./service-agreements?"
Path-Specific Querying
AskYourDocs supports querying specific paths, with automatic ingestion if needed:
# Method 1: Using --path option
askyourdocs ask "What are the main topics?" --path ./research-papers
# Method 2: Include path in question
askyourdocs ask "What are the key findings in ./data-analysis?"
# Auto-ingestion: If path isn't indexed, it will be ingested automatically
askyourdocs ask "Summarize the content" --path ./new-documents
# Multiple path queries
askyourdocs ask "Compare findings in ./study-a vs ./study-b"
Advanced Usage
Custom Configuration
# Switch to different providers (recommended method)
askyourdocs config setup --provider ollama
askyourdocs config setup --provider openai
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider azure
# Interactive setup (choose provider during setup)
askyourdocs config setup
# Advanced: Direct configuration (for automation/scripts)
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.device "cuda"
askyourdocs config set retrieval.top_k 10
# View current configuration
askyourdocs config show
# Validate configuration
askyourdocs config validate
Monitoring and Maintenance
# Check system status
askyourdocs status
# Refresh index (incremental)
askyourdocs refresh
# Full rebuild (when changing chunk settings)
askyourdocs refresh --reset
# Optimize vector store
askyourdocs refresh --optimize
Backup and Migration
# Create backup
askyourdocs export --output documents-backup.tar.gz --include-config
# Restore from backup
askyourdocs import --input documents-backup.tar.gz
# Merge with existing index
askyourdocs import --input additional-docs.tar.gz --merge
Supported File Formats
| Category | Formats | Extensions |
|---|---|---|
| Documents | PDF, Word, PowerPoint, OpenDocument | .pdf, .docx, .pptx, .odt, .odp |
| Text | Plain text, Markdown, reStructuredText | .txt, .md, .rst, .csv |
| Code | Source code, configuration files | .py, .js, .java, .cpp, .yaml, .json |
| Structured | HTML, XML, LaTeX, Jupyter | .html, .xml, .tex, .ipynb |
Architecture
AskYourDocs uses a modern RAG architecture:
- Document Ingestion: Files are processed and split into semantic chunks
- Embedding Generation: Text chunks are converted to vector embeddings
- Vector Storage: ChromaDB stores embeddings with metadata for fast retrieval
- Query Processing: User questions are embedded and matched against stored vectors
- Context Retrieval: Most relevant chunks are retrieved based on similarity
- Response Generation: Local LLM generates answers using retrieved context
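The retrieval half of this pipeline (steps 4-5) can be sketched end to end with a toy example. The bag-of-words "embedding" and cosine ranking below stand in for the real embedding model and ChromaDB; the function names are illustrative only:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A real system would call an
    embedding model such as BAAI/bge-small-en-v1.5 here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Steps 4-5: embed the query, rank stored chunks by similarity."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

In the real pipeline, the top-ranked chunks would then be passed to the LLM as context for step 6, response generation.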
Privacy & Security
- Local Processing: All operations happen on your machine
- No Data Transmission: Documents never leave your environment
- Secure Storage: Vector data stored locally with optional encryption
- No Telemetry: Zero tracking or analytics
- Open Source: Full transparency with auditable code
Troubleshooting
Common Issues
"Configuration issues found"
# Check what's wrong
askyourdocs status
askyourdocs config validate
# Fix with interactive setup (recommended)
askyourdocs config setup
"Ollama connection failed"
# Check if Ollama is running
ollama list
# Start Ollama if not running
ollama serve
# Test connection
curl http://localhost:11434/api/tags
# Download the default model
ollama pull tinyllama:1.1b
# List available models
ollama list
"No documents found"
# Check current directory
askyourdocs ingest --verbose
# Specify path explicitly
askyourdocs ingest /path/to/documents
# Check supported formats
askyourdocs ingest --include "*.pdf,*.docx,*.txt"
"Embedding model download failed"
# Check internet connection and try again
askyourdocs refresh
# Use different model
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"
"API key not found" (for remote providers)
# Set environment variable first
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export OPENAI_API_KEY="your-openai-key"
export AZURE_OPENAI_API_KEY="your-azure-key"
# Then configure provider (recommended)
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider openai
askyourdocs config setup --provider azure
# Verify configuration
askyourdocs config validate
askyourdocs status
Performance Issues
# Reduce chunk size
askyourdocs config set chunking.chunk_size 800
# Reduce batch size
askyourdocs config set embedding.batch_size 16
# Optimize storage
askyourdocs refresh --optimize
# Switch to lighter model
askyourdocs config set model.name "tinyllama:1.1b"
# Use GPU acceleration (if available)
askyourdocs config set embedding.device "cuda"
Getting Help
# Show general help
askyourdocs --help
# Show command-specific help
askyourdocs ask --help
askyourdocs ingest --help
# Show current configuration
askyourdocs config show
# Check system status
askyourdocs status
Development Setup
Using Poetry (Recommended)
# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help
# Install pre-commit hooks
pre-commit install
Using pip (Alternative)
# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install with development dependencies
pip install -e ".[dev,gpu,remote]"
# Install pre-commit hooks
pre-commit install
Development Commands
# Run tests
poetry run pytest
# or: pytest
# Run with coverage
poetry run pytest --cov=askyourdocs
# or: pytest --cov=askyourdocs
# Format code
poetry run black src/ tests/
poetry run ruff check src/ tests/
# Type checking
poetry run mypy src/
# Run all quality checks
poetry run pre-commit run --all-files
# Build package
poetry build
# Install locally for testing
poetry install
Note: Local models require initial download but then work offline. Remote models require internet and API costs.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- LlamaIndex: For the excellent RAG framework
- ChromaDB: For fast vector storage
- Ollama: For local LLM inference
- Rich: For beautiful terminal output
- Click: For the CLI framework
Support
- Email: lincolncmba@gmail.com