
A lightweight, vision-based document question-answering system

DocPixie

A lightweight multimodal RAG (Retrieval-Augmented Generation) library that uses vision AI instead of traditional embeddings or vector databases. DocPixie processes documents as images and uses vision language models for both document understanding and intelligent page selection.

🌟 Features

  • Vision-First Approach: Documents processed as images using PyMuPDF, preserving visual information and formatting
  • No Vector Database Required: Eliminates the complexity of embeddings and vector storage
  • Adaptive RAG Agent: Single intelligent agent that dynamically plans tasks and selects relevant pages
  • Multi-Provider Support: Works with OpenAI GPT-4V, Anthropic Claude, and OpenRouter
  • Modern CLI Interface: Beautiful terminal UI built with Textual
  • Conversation Aware: Maintains context across multiple queries
  • Pluggable Storage: Local filesystem or in-memory storage backends

🚀 Quick Start

Installation

# Clone the repository and enter it
git clone https://github.com/qnguyen3/docpixie.git
cd docpixie

# Install dependencies
pip install -r requirements.txt

# Or use uv (recommended)
uv pip install -r requirements.txt

Basic Usage

import asyncio
from docpixie import DocPixie

async def main():
    # Initialize; the API key is read from the environment (see Configuration)
    docpixie = DocPixie()

    # Add a document
    document = await docpixie.add_document("path/to/your/document.pdf")
    print(f"Added document: {document.name}")

    # Query the document
    result = await docpixie.query("What are the key findings?")
    print(f"Answer: {result.answer}")
    print(f"Pages used: {result.page_numbers}")

# Run the example
asyncio.run(main())

Using the CLI

Start the interactive terminal interface:

python -m docpixie.cli

The CLI provides:

  • Interactive document chat
  • Document management
  • Conversation history
  • Model configuration
  • Command palette with shortcuts

🛠️ Configuration

DocPixie uses environment variables for API key configuration:

# For OpenAI (default)
export OPENAI_API_KEY="your-openai-key"

# For Anthropic Claude
export ANTHROPIC_API_KEY="your-anthropic-key"

# For OpenRouter (supports many models)
export OPENROUTER_API_KEY="your-openrouter-key"

You can also specify the provider:

from docpixie import DocPixie, DocPixieConfig

config = DocPixieConfig(
    provider="anthropic",  # or "openai", "openrouter"
    model="claude-3-opus-20240229",
    vision_model="claude-3-opus-20240229"
)

docpixie = DocPixie(config=config)
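
Since keys come from the environment, it can help to confirm that the key matching the chosen provider is actually exported before constructing DocPixie. The following is a minimal sketch of such a check. The `PROVIDER_KEY_VARS` mapping and the `required_key_var` helper are hypothetical names used for illustration and are not part of DocPixie; only the provider names and variable names come from this README.

```python
import os

# Provider name -> environment variable holding its API key (names from this README).
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def required_key_var(provider, env=None):
    """Return the key variable for `provider`, raising if it is unknown or unset.

    Hypothetical helper for illustration; DocPixie may do its own validation.
    """
    env = os.environ if env is None else env
    try:
        var = PROVIDER_KEY_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    if not env.get(var):
        raise RuntimeError(f"{var} is not set for provider {provider!r}")
    return var
```

Calling `required_key_var("anthropic")` before building the config turns a missing key into an immediate, readable error.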

📚 Supported File Types

  • PDF files (.pdf) - Full multipage support
  • More file types coming soon

🏗️ Architecture

DocPixie uses a clean, modular architecture:

📁 Core Components
├── 🧠 Adaptive RAG Agent - Dynamic task planning and execution
├── 👁️  Vision Processing - Document-to-image conversion via PyMuPDF
├── 🔌 Provider System - Unified interface for AI providers
├── 💾 Storage Backends - Local filesystem or in-memory storage
└── 🖥️  CLI Interface - Modern terminal UI with Textual

📁 Processing Flow
1. Document → Images (PyMuPDF)
2. Vision-based summarization
3. Adaptive query processing
4. Intelligent page selection
5. Response synthesis
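
The processing flow above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not DocPixie's internal code: `Page`, `answer_query`, and the three callables are assumed names, each callable stands in for a vision model call, and step 1 (rendering pages with PyMuPDF) is assumed to have already produced the image bytes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    number: int       # 1-based page number
    image: bytes      # rendered page image (e.g. JPEG bytes)
    summary: str = ""


def answer_query(
    page_images: List[bytes],
    query: str,
    summarize: Callable[[bytes], str],
    select: Callable[[str, List[Page]], List[int]],
    synthesize: Callable[[str, List[Page]], str],
):
    """Run steps 2-5 on already-rendered pages; each callable stands in for a model call."""
    # Step 2: vision-based summarization of every page image.
    pages = [Page(i, img, summarize(img)) for i, img in enumerate(page_images, start=1)]
    # Steps 3-4: the model picks relevant page numbers from the summaries.
    chosen = set(select(query, pages))
    selected = [p for p in pages if p.number in chosen]
    # Step 5: synthesize an answer from the selected pages only.
    return synthesize(query, selected), [p.number for p in selected]


# Stub "models" so the sketch runs without any API key.
images = [b"intro", b"revenue table", b"appendix"]
answer, used = answer_query(
    images,
    "revenue",
    summarize=lambda img: img.decode(),
    select=lambda q, pages: [p.number for p in pages if q in p.summary],
    synthesize=lambda q, pages: f"{len(pages)} page(s) about {q}",
)
print(answer, used)  # prints: 1 page(s) about revenue [2]
```

No embeddings or vector index appear anywhere in this flow; page selection is a model decision made over per-page summaries.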

Key Design Principles

  • Provider-Agnostic: Generic model configuration works across all providers
  • Image-Based Processing: All documents converted to images, preserving visual context
  • Business Logic Separation: Raw API operations separate from workflow logic
  • Adaptive Intelligence: Single agent mode that dynamically adjusts based on findings
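
As a rough illustration of the first and third principles, the sketch below separates a raw provider operation from the workflow logic that uses it. The `VisionProvider` protocol, `EchoProvider`, and `describe_pages` are assumed names for this example and do not reflect DocPixie's actual classes.

```python
from typing import List, Protocol


class VisionProvider(Protocol):
    """Raw API operation: send a prompt plus page images, get text back."""

    def complete(self, prompt: str, images: List[bytes]) -> str: ...


class EchoProvider:
    """Stand-in provider for local testing; a real one would call a vendor API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str, images: List[bytes]) -> str:
        return f"[{self.name}] {prompt} ({len(images)} image(s))"


def describe_pages(provider: VisionProvider, images: List[bytes]) -> str:
    """Workflow logic: depends only on the VisionProvider interface."""
    return provider.complete("Summarize these pages", images)
```

Because `describe_pages` sees only the protocol, swapping `EchoProvider("openai")` for `EchoProvider("anthropic")` requires no change to the workflow code.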

🎯 Use Cases

  • Research & Analysis: Query academic papers, reports, and research documents
  • Document Q&A: Interactive questioning of PDFs, contracts, and manuals
  • Content Discovery: Find specific information across large document collections
  • Visual Document Processing: Handle documents with charts, diagrams, and complex layouts

🔧 Development

Setup Development Environment

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/ -v

🌍 Environment Variables

Variable                Description                  Default
OPENAI_API_KEY          OpenAI API key               None
ANTHROPIC_API_KEY       Anthropic API key            None
OPENROUTER_API_KEY      OpenRouter API key           None
DOCPIXIE_PROVIDER       AI provider                  openai
DOCPIXIE_STORAGE_PATH   Storage directory            ./docpixie_data
DOCPIXIE_JPEG_QUALITY   JPEG image quality (1-100)   90
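
To make the defaults in the table above concrete, here is a small sketch that reads the three `DOCPIXIE_*` variables and applies those defaults. `EnvSettings` and `load_env_settings` are hypothetical names for illustration; how DocPixie itself parses these variables is not shown in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    provider: str
    storage_path: str
    jpeg_quality: int


def load_env_settings(env=None):
    """Read the DOCPIXIE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    quality = int(env.get("DOCPIXIE_JPEG_QUALITY", "90"))
    # The documented range for image quality is 1-100.
    if not 1 <= quality <= 100:
        raise ValueError(f"DOCPIXIE_JPEG_QUALITY must be 1-100, got {quality}")
    return EnvSettings(
        provider=env.get("DOCPIXIE_PROVIDER", "openai"),
        storage_path=env.get("DOCPIXIE_STORAGE_PATH", "./docpixie_data"),
        jpeg_quality=quality,
    )
```

With no variables set, `load_env_settings({})` yields the `openai` provider, the `./docpixie_data` directory, and a quality of 90.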

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with PyMuPDF for PDF processing
  • CLI powered by Textual
  • Supports OpenAI, Anthropic, and OpenRouter APIs
