AI-powered document querying with citations

DocNav: AI-Powered Document Querying with Citations

DocNav is a document management and querying system that lets you ask questions about your documents and get answers with citations to their sources. It can be used as a command-line tool or through a Python API.

✨ Features

📚 Multi-format Support: PDF, DOCX, TXT, MD, CSV, Excel, PowerPoint
🧠 Smart Chunking: Intelligent document segmentation for better context
🔍 Vector Search: Fast similarity-based document retrieval
🤖 Multiple LLMs: OpenAI, Gemini, Claude support
📝 Citations: Answers include source document references
⚡ Fast Processing: Parallel document processing with progress bars
🎯 Industry Ready: Production-grade with error handling and logging
🔧 Flexible: CLI tool and Python API
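DocNav describes its retrieval step only as similarity-based vector search. As a rough illustration of that idea, and not DocNav's actual implementation, the sketch below scores chunks against a query with cosine similarity and keeps the best `k`, which is what the `--top-k` option controls. The word-count "embedding" is a stand-in assumption; real systems use dense vectors from an embedding model such as Sentence Transformers.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the k highest scores.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "The budget for 2024 increased by ten percent.",
    "Key findings: customer retention improved.",
    "Appendix: glossary of terms.",
]
for score, chunk in top_k("What are the main findings?", chunks, k=2):
    print(f"{score:.2f}  {chunk}")
```

A production vector store would precompute and index the chunk vectors instead of embedding every chunk per query.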

🚀 Quick Start

Installation

# Basic installation
pip install docnav

# Full installation with all dependencies (quotes stop shells such as zsh from expanding the brackets)
pip install "docnav[full]"

# With OCR support for scanned PDFs
pip install "docnav[full,ocr]"

# Development installation
pip install "docnav[dev]"

CLI Usage

# Create a new corpus
docnav new mydocs

# Add documents
docnav add mydocs documents/ reports.pdf

# Query your documents
docnav query mydocs "What are the main findings?"

# Use different LLM providers
docnav query mydocs "Summarize the budget" --provider gemini --model gemini-2.5-flash
docnav query mydocs "Extract key dates" --provider claude --model claude-3-haiku-20240307

# List documents
docnav list mydocs

# Get statistics
docnav stats mydocs

# Quick query without creating a corpus
docnav quick document.pdf "What is this about?"
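To drive the CLI from a script, one option is to build the argument list yourself and run it with `subprocess`. This is a sketch: `build_query_command` and `run_query` are hypothetical helpers that use only the flags documented in this README, and because the output format of `docnav query` is not documented, `run_query` simply returns whatever the command prints.

```python
import shutil
import subprocess


def build_query_command(corpus, question, provider=None, model=None, top_k=None):
    # Build an argv list for `docnav query` from the documented flags only.
    cmd = ["docnav", "query", corpus, question]
    if provider is not None:
        cmd += ["--provider", provider]
    if model is not None:
        cmd += ["--model", model]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    return cmd


def run_query(*args, **kwargs) -> str:
    cmd = build_query_command(*args, **kwargs)
    if shutil.which("docnav") is None:
        raise RuntimeError("docnav is not on PATH; install it with pip first")
    # Passing a list (no shell) means the question needs no extra quoting.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


print(build_query_command("mydocs", "Summarize the budget", provider="gemini", top_k=3))
```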

Python API Usage

from docnav import Corpus

# Create or load a corpus
corpus = Corpus("mydocs")

# Add documents
corpus.add(["document.pdf", "report.docx"])

# Ask questions
answer = corpus.ask("What are the main findings?")
print(answer.text)

# Access sources
for source in answer.sources:
    print(f"Source: {source.metadata['file_name']}")
    print(f"Content: {source.text[:200]}...")

# List all documents
documents = corpus.list()
for doc in documents:
    print(f"{doc['file_name']} ({doc['chunks']} chunks)")

# Get statistics
stats = corpus.stats()
print(f"Total documents: {stats['total_documents']}")
print(f"Total chunks: {stats['total_chunks']}")
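Because `answer.sources` holds the retrieved chunks, a short helper can turn them into a deduplicated citation list. The `Source` and `Answer` dataclasses below are hypothetical stand-ins that mirror only the attributes used in the example above (`text`, `sources`, `metadata['file_name']`); with DocNav installed you would pass the real `answer` object instead, and the real classes may carry more fields.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins mirroring only the attributes used above.
@dataclass
class Source:
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Answer:
    text: str
    sources: list = field(default_factory=list)


def format_with_citations(answer) -> str:
    # Collect each source file once, in the order it was first retrieved.
    seen = []
    for source in answer.sources:
        name = source.metadata.get("file_name", "<unknown>")
        if name not in seen:
            seen.append(name)
    lines = [answer.text, "", "Sources:"]
    lines += [f"[{i}] {name}" for i, name in enumerate(seen, start=1)]
    return "\n".join(lines)


answer = Answer(
    text="Revenue grew in Q3.",
    sources=[
        Source("Q3 revenue rose...", {"file_name": "report.docx"}),
        Source("Quarterly summary...", {"file_name": "report.docx"}),
        Source("Appendix tables...", {"file_name": "document.pdf"}),
    ],
)
print(format_with_citations(answer))
```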

📋 Commands Reference

Corpus Management

docnav new <name> - Create a new corpus
docnav add <corpus> <files> - Add documents to a corpus
docnav list <corpus> - List the documents in a corpus
docnav stats <corpus> - Show corpus statistics
docnav remove <corpus> <file> - Remove a specific document
docnav clear <corpus> - Clear an entire corpus
docnav corpora - List all available corpora

Querying

docnav query <corpus> "<question>" - Ask a question about a corpus
docnav quick <file> "<question>" - Quickly query a single document

Options

--provider <openai|gemini|claude> - LLM provider
--model <model_name> - Specific model to use
--api-key <key> - API key (overrides environment)
--top-k <number> - Number of chunks to consider (default: 5)
--use-ocr - Use OCR for scanned PDFs
--details - Show detailed information

🔧 Configuration

Environment Variables

Set these for different LLM providers:

# OpenAI
export OPENAI_API_KEY="your-openai-key"

# Google Gemini
export GOOGLE_API_KEY="your-gemini-key"

# Anthropic Claude
export ANTHROPIC_API_KEY="your-claude-key"

Default Models

OpenAI: gpt-3.5-turbo
Gemini: gemini-2.5-flash
Claude: claude-3-haiku-20240307
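The precedence described in this section (an explicit API key overrides the environment variable, and an omitted model falls back to the provider's default) can be sketched as follows. `resolve_llm_settings` is a hypothetical helper for illustration, not part of DocNav's documented API; the variable names and default models are the ones listed above.

```python
import os

# Environment variables and default models as listed in this README.
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "gpt-3.5-turbo"),
    "gemini": ("GOOGLE_API_KEY", "gemini-2.5-flash"),
    "claude": ("ANTHROPIC_API_KEY", "claude-3-haiku-20240307"),
}


def resolve_llm_settings(provider, model=None, api_key=None):
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {sorted(PROVIDERS)}")
    env_var, default_model = PROVIDERS[provider]
    # An explicit key wins over the environment, like --api-key on the CLI.
    key = api_key if api_key is not None else os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"no API key: pass api_key or set {env_var}")
    return model or default_model, key


print(resolve_llm_settings("claude", api_key="sk-example"))
```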

📁 Storage

DocNav stores corpora in ~/.docnav/corpora/ by default:

~/.docnav/
├── corpora/
│   ├── mydocs/
│   │   ├── corpus_index.pkl
│   │   └── metadata.json
│   └── another_corpus/
│       ├── corpus_index.pkl
│       └── metadata.json
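The `docnav corpora` command is the supported way to list corpora. If you want to inspect the storage directory directly, for example from a backup script, a minimal sketch could treat any subdirectory that contains `metadata.json` as a corpus. That detection rule is an assumption based on the tree above, and `list_corpora` is a hypothetical helper; the format of `metadata.json` is not documented, so the sketch does not parse it.

```python
import tempfile
from pathlib import Path

DEFAULT_ROOT = Path.home() / ".docnav" / "corpora"


def list_corpora(root: Path = DEFAULT_ROOT) -> list[str]:
    # A subdirectory counts as a corpus if it holds a metadata.json file.
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "metadata.json").is_file())


# Demo against a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("mydocs", "another_corpus"):
        (root / name).mkdir()
        (root / name / "metadata.json").write_text("{}")
    (root / "stray_folder").mkdir()  # no metadata.json, so it is ignored
    found = list_corpora(root)

print(found)
```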

🎯 Advanced Usage

Custom Chunking

from docnav import Corpus

# Custom chunk size
corpus = Corpus("mydocs", chunk_size=2000)

# Add with custom chunking
corpus.add(["large_document.pdf"], chunk_size=1500)
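The README does not say which unit `chunk_size` uses or exactly how DocNav splits text. To illustrate the general technique, here is a sketch of fixed-size chunking with overlap, assuming sizes are measured in characters. DocNav's "smart" chunking may instead split on sentences or paragraphs, and `chunk_text` with its `overlap` parameter is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Fixed-size character windows; consecutive chunks share `overlap`
    # characters, so text near a boundary appears in both neighbouring chunks.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print([len(c) for c in chunk_text("x" * 2500, chunk_size=1000, overlap=200)])
```

Larger chunks give the model more context per source but make citations less precise; overlap trades some storage for fewer facts split across boundaries.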

Filtering Queries

# Query with metadata filters
answer = corpus.ask(
    "Budget information",
    where={"type": "pdf", "file_name": "budget_report.pdf"}
)
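The README does not spell out how `where` is evaluated. The sketch below assumes a simple convention: every key in the filter must be present in a chunk's metadata with an equal value (a logical AND of exact matches). `matches` is a hypothetical helper, and DocNav's real filter semantics may differ.

```python
def matches(metadata: dict, where: dict) -> bool:
    # Every filter key must exist in the metadata with an equal value.
    return all(key in metadata and metadata[key] == value for key, value in where.items())


chunks = [
    {"text": "Budget totals...", "metadata": {"type": "pdf", "file_name": "budget_report.pdf"}},
    {"text": "Team roster...", "metadata": {"type": "docx", "file_name": "team.docx"}},
]
where = {"type": "pdf", "file_name": "budget_report.pdf"}
print([c["metadata"]["file_name"] for c in chunks if matches(c["metadata"], where)])
```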

Batch Processing

# Process multiple files efficiently
files = [
    "reports/q1.pdf",
    "reports/q2.pdf", 
    "reports/q3.pdf"
]
corpus.add(files, use_ocr=True)
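DocNav lists parallel processing as a feature but does not document how it is done. One common pattern, sketched here with the standard library's `ThreadPoolExecutor`, parses files concurrently and records per-file errors instead of aborting the whole batch. `extract_text` and `process_files` are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_text(path: str) -> str:
    # Hypothetical placeholder for per-file parsing (PDF, DOCX, OCR, ...).
    return f"text of {path}"


def process_files(paths, max_workers=4):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_text, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors


results, errors = process_files(["reports/q1.pdf", "reports/q2.pdf", "reports/q3.pdf"])
print(sorted(results), errors)
```

Threads suit I/O-bound work such as waiting on an OCR service; for CPU-heavy parsing in pure Python, `ProcessPoolExecutor` is usually the better fit.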

🔌 API Integration

OpenAI Integration

# Using OpenAI with custom model
answer = corpus.ask(
    "Analyze the trends",
    llm_provider="openai",
    llm_model="gpt-4-turbo",
    api_key="your-key"
)

Gemini Integration

# Using Google Gemini
answer = corpus.ask(
    "Extract insights",
    llm_provider="gemini", 
    llm_model="gemini-2.5-flash",
    api_key="your-gemini-key"
)

Claude Integration

# Using Anthropic Claude
answer = corpus.ask(
    "Summarize findings",
    llm_provider="claude",
    llm_model="claude-3-sonnet-20240229",
    api_key="your-claude-key"
)
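Whichever provider is selected, a retrieval-based system has to pass the retrieved chunks to the model in a form it can cite. DocNav's actual prompt is not documented; the sketch below shows one common approach, numbering each chunk and asking for inline `[n]` citations. `build_prompt` is hypothetical, and the resulting string would then be sent through the chosen provider's own SDK.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] ({c['file_name']})\n{c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [n].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


prompt = build_prompt(
    "Summarize findings",
    [
        {"file_name": "report.docx", "text": "Retention improved."},
        {"file_name": "document.pdf", "text": "Costs were flat."},
    ],
)
print(prompt)
```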

🛠️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/Mukesh-Anand-G/DocNav.git
cd DocNav

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black docnav/

Project Structure

docnav/
├── docnav/
│   ├── __init__.py      # Package initialization
│   ├── core.py          # Core functionality
│   ├── cli.py           # Command-line interface
│   └── handlers.py      # CLI command handlers
├── setup.py             # Package setup
├── pyproject.toml       # Modern Python packaging
├── requirements.txt     # Dependencies
└── README.md            # This file

📊 Performance

Processing Speed: ~1000 pages/minute (depends on hardware)
Memory Usage: ~50MB for 1000 documents
Search Latency: <100ms for typical queries
Supported Formats: 10+ document types
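These figures depend on hardware and configuration. To check latency on your own machine, a small timing helper like the one below can wrap any callable. Note that timing `corpus.ask` measures the full round trip, including the LLM call, not just the vector search. `measure_latency` is a hypothetical helper; the demo times a built-in function as a stand-in.

```python
import statistics
import time


def measure_latency(fn, *args, repeats=5, **kwargs):
    # Median wall-clock time of fn(*args, **kwargs), in milliseconds.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


# With DocNav installed you might time corpus.ask("..."); a stand-in is used here.
ms = measure_latency(sorted, list(range(10_000)), repeats=3)
print(f"median: {ms:.2f} ms")
```

The median is less sensitive than the mean to a single slow run, such as the first call that warms caches.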

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for GPT models
Google for Gemini models
Anthropic for Claude models
Sentence Transformers team for embedding models
All contributors and users

🗺️ Roadmap

Web interface
Real-time document monitoring
Advanced filtering
Graph visualization
Plugin system
Multi-language support

Made with ❤️ by Mukesh Anand G

Release history

1.0.1 (this version): Jan 15, 2026
1.0.0: Jan 14, 2026

Download files

Source Distribution

docnav-1.0.1.tar.gz (21.9 kB view details)

Uploaded Jan 15, 2026 Source

Built Distribution

docnav-1.0.1-py3-none-any.whl (18.5 kB view details)

Uploaded Jan 15, 2026 Python 3

File details

Details for the file docnav-1.0.1.tar.gz.

File metadata

Download URL: docnav-1.0.1.tar.gz
Upload date: Jan 15, 2026
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for docnav-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`a52bfc765a6d14f34411d8a52e26bc6f13a700efdbd09422ec3c5bcdd31a47d5`
MD5	`c7ccb3c5b2a2c1f6de9c5ee7ebeb6983`
BLAKE2b-256	`4fee3c43219bdcfde986a2a6121bd23fc4810b18054d122f0cc61d1d54dc975d`

File details

Details for the file docnav-1.0.1-py3-none-any.whl.

File metadata

Download URL: docnav-1.0.1-py3-none-any.whl
Upload date: Jan 15, 2026
Size: 18.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for docnav-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51714713e5650f01a4bf05814fdccd12a5560216c9dee7d1a28ba74103c5fccd`
MD5	`8aefd7117827399e16de4760911446f3`
BLAKE2b-256	`8778b07275606da27b34ae66ce90fe010700962ac6c0eae6759450521e3cda3b`
