Unified multimodal CLI assistant built on OpenAI's latest Responses API

edge-assistant

A unified multimodal CLI assistant for AI-powered research, content analysis, knowledge base management, and safe file editing. Built on OpenAI's latest Responses API with full threading support across text, images, and documents.

Quickstart

  1. Create and activate a virtual environment (recommended):
python3 -m venv .venv
source .venv/bin/activate
  2. Install the project in editable mode:
pip install --upgrade pip
pip install -e .
  3. Configure your OpenAI key (choose one option):

Option A: Environment variable

export OPENAI_API_KEY="sk-..."

Option B: .env file (recommended)

echo 'OPENAI_API_KEY="sk-..."' > .env
  4. See available commands:
edge-assistant --help

Key Features

🔥 Unified Multimodal Analysis - Seamlessly work with text, images, PDFs, and documents in threaded conversations
🧠 Advanced Threading - Maintain context across mixed content types with intelligent state management
🔍 Web Research - Built-in web search with structured output and citations
📚 Knowledge Base - Index and search local documents with vector embeddings
✏️ Safe Editing - Preview file changes with unified diffs before applying
🛠️ Agent Mode - Tool-calling AI with file system access and approval workflows
⚡ Latest API - Built on OpenAI's Responses API for server-side threading and the newest model features

Commands Overview

Command        Description                          Content Types
analyze        Unified multimodal analysis          Text, Images, PDFs, Documents
ask            Interactive text conversations       Text
research       Web research with citations          Text + Web Search
kb-index       Index documents for search           Local Files
kb-research    Query knowledge base                 Text + KB Search
edit           Safe file editing with diffs         Text + File Editing
agent          Tool-calling AI assistant            Text + Tools
analyze-image  Legacy image analysis (deprecated)   Images

Examples

🎯 Unified Multimodal Analysis (New!)

# Text-only analysis with threading
edge-assistant analyze "What are the key principles of good software architecture?" --thread project-review

# Image analysis with context
edge-assistant analyze "What safety issues do you see in this facility?" --file facility.jpg --thread project-review --system "You are a health and safety inspector"

# Continue conversation with document analysis
edge-assistant analyze "Based on our safety assessment, analyze this compliance report" --file report.pdf --thread project-review

# Mixed content conversation
edge-assistant analyze "Given everything we've discussed, what are your top 3 recommendations?" --thread project-review

🔍 Web Research

# Research with structured output and citations
edge-assistant research "latest developments in multimodal AI 2025"
edge-assistant research "best practices for RAG implementation"

💬 Interactive Conversations

# Text conversations with threading (now uses unified engine by default)
edge-assistant ask "Explain the difference between RAG and fine-tuning" --thread learning
edge-assistant ask "Can you give me examples of each approach?" --thread learning

# Use legacy engine if needed
edge-assistant ask "Simple question" --legacy

📚 Knowledge Base Management

# Index local documents for search
edge-assistant kb-index ./docs ./papers ./notes

# Search your indexed knowledge
edge-assistant kb-research "How does attention mechanism work in transformers?"
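The knowledge base is described as using vector embeddings; the retrieval step behind kb-research boils down to ranking stored vectors by cosine similarity against the query embedding. A minimal stdlib-only sketch of that ranking (function names are hypothetical, not the project's actual API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)  # assumes neither vector is all zeros

def top_k(query_vec, index, k=3):
    """Rank indexed (doc_id, vector) pairs by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In practice the embeddings would come from an embedding model at index time; this only shows the search side.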

✏️ Safe File Editing

# Preview changes before applying (dry-run by default)
edge-assistant edit README.md "Add a quickstart section with installation instructions"

# Apply changes after review
edge-assistant edit README.md "Add a quickstart section" --apply
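The dry-run preview described above amounts to computing a unified diff between the current file and the proposed rewrite without writing anything. A minimal sketch using Python's stdlib difflib (the function name is hypothetical, not the tool's internal API):

```python
import difflib

def preview_edit(path, original, proposed):
    """Return a unified diff of proposed changes without touching the file."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

A CLI like this would print the diff by default and only write the file when an explicit --apply flag is passed.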

🛠️ Agent Mode with Tools

# Tool-calling AI with file system access
edge-assistant agent "Create a Python script that processes CSV files and generates plots" --approve
edge-assistant agent "Analyze the performance of my web app and suggest optimizations"

🎨 Multimodal Analysis Features

Unified Content Support

  • 📝 Text: Natural language questions and conversations
  • 🖼️ Images: JPEG, PNG, GIF, WebP analysis with vision models
  • 📄 Documents: PDF, TXT, MD, code files with file search
  • 🔜 Audio/Video: Ready for future OpenAI capabilities

Advanced Threading

  • Fresh Context (default): Each analysis is independent
  • Threaded Conversations: Use --thread to maintain context across any content types
  • Smart Limits: Max 20 interactions per thread (configurable with --max-interactions)
  • Content Tracking: Detailed breakdown by content type (text, image, file)
  • Auto-cleanup: Old threads (7+ days) are automatically removed
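The interaction cap and auto-cleanup above can be sketched as two small operations over a dict-shaped thread store. This is an illustrative model, not the project's actual state code; all names are hypothetical:

```python
import time

MAX_INTERACTIONS = 20          # default per-thread cap (--max-interactions)
THREAD_TTL = 7 * 24 * 3600     # threads idle for 7+ days are pruned

def record_interaction(threads, name, content_type, max_interactions=MAX_INTERACTIONS):
    """Append one interaction to a thread, enforcing the interaction cap."""
    thread = threads.setdefault(name, {"interactions": [], "updated": time.time()})
    if len(thread["interactions"]) >= max_interactions:
        raise RuntimeError(f"Thread '{name}' reached {max_interactions} interactions")
    thread["interactions"].append(content_type)
    thread["updated"] = time.time()

def prune_stale(threads, ttl=THREAD_TTL):
    """Drop threads not touched within the TTL (auto-cleanup)."""
    now = time.time()
    for name in [n for n, t in threads.items() if now - t["updated"] > ttl]:
        del threads[name]
```

Storing the content type per interaction is what makes the per-type breakdown in thread status output possible.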

Thread Management

# Check thread status (shows interaction breakdown by content type)
edge-assistant analyze "Describe this" --file image.jpg --thread session
# Output: Thread 'session': 3 interactions (1 text, 2 image)

# Clear a specific thread
edge-assistant analyze --clear-thread --thread session

# Set custom interaction limit per thread  
edge-assistant analyze "Analyze this" --file doc.pdf --thread session --max-interactions 50

# Mix content types seamlessly in same thread
edge-assistant analyze "What are the main concepts?" --thread session                    # Text
edge-assistant analyze "How does this image relate?" --file chart.png --thread session  # Image  
edge-assistant analyze "What does this document say?" --file report.pdf --thread session # Document
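A status line like the one shown in the first example can be derived from the per-thread list of interaction content types. A hypothetical sketch (the exact ordering and wording of the real output may differ):

```python
from collections import Counter

def thread_status(name, content_types):
    """Format a thread status line from a list of interaction content types."""
    counts = Counter(content_types)
    breakdown = ", ".join(f"{n} {kind}" for kind, n in sorted(counts.items()))
    return f"Thread '{name}': {len(content_types)} interactions ({breakdown})"
```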

Specialized Analysis Use Cases

# Health & Safety Inspection Workflow
edge-assistant analyze "Assess safety compliance" --file facility.jpg --thread safety-audit --system "You are a health and safety inspector"
edge-assistant analyze "Review this incident report" --file report.pdf --thread safety-audit  
edge-assistant analyze "Based on our inspection and the report, what are your recommendations?" --thread safety-audit

# Document Analysis & OCR
edge-assistant analyze "Extract all text and key information" --file receipt.png --system "You are an OCR specialist with accounting expertise"
edge-assistant analyze "Summarize the financial data from the receipt" --thread expense-review

# Technical Architecture Review  
edge-assistant analyze "Explain this system architecture" --file diagram.png --system "You are a software architect"
edge-assistant analyze "Based on the diagram, what are potential scalability concerns?" --thread arch-review
edge-assistant analyze "Review this code for the same system" --file main.py --thread arch-review

# Research & Analysis Pipeline
edge-assistant analyze "What are the main themes in this research paper?" --file paper.pdf --thread research
edge-assistant analyze "How does this data visualization support the paper's claims?" --file chart.jpg --thread research
edge-assistant analyze "Synthesize the key findings and implications" --thread research

Content Type Detection

The system detects content types automatically from the file extension, but you can override detection with --type:

# Auto-detection (default)
edge-assistant analyze "Analyze this" --file document.pdf --type auto

# Force specific type
edge-assistant analyze "Analyze as image" --file diagram.pdf --type image
edge-assistant analyze "Analyze as document" --file screenshot.png --type file
edge-assistant analyze "Text-only analysis" --type text
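The resolution order implied by these examples is: an explicit --type wins, otherwise the file extension decides, and no file means plain text. A sketch of that logic (the extension sets are illustrative, not the tool's exact lists):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def detect_content_type(file_path=None, forced="auto"):
    """Resolve the content type for an analyze call.

    Mirrors the --type flag: an explicit value wins, then the file
    extension decides, and no file at all means plain text.
    """
    if forced != "auto":
        return forced
    if file_path is None:
        return "text"
    ext = Path(file_path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    return "file"  # PDFs, text, code, and unknown extensions go to file search
```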

Model Selection

# Auto-select optimal model based on content type (default)  
edge-assistant analyze "Question" --file content.jpg

# Override model selection
edge-assistant analyze "Question" --file image.jpg --model gpt-4o-mini
edge-assistant analyze "Question" --file document.pdf --model gpt-4o
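Auto-selection reduces to a content-type-to-model mapping that an explicit --model bypasses. A sketch of that pattern (the default mapping here is illustrative, not the tool's actual defaults):

```python
def pick_model(content_type, override=None):
    """Choose a model per content type unless the user overrides it."""
    if override:
        return override
    defaults = {
        "image": "gpt-4o",       # vision-capable model for images
        "file": "gpt-4o-mini",   # lighter model for document search
        "text": "gpt-4o-mini",
    }
    return defaults.get(content_type, "gpt-4o-mini")
```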

🏗️ Architecture

Core Components

  • cli.py - Typer-based CLI interface with unified multimodal commands
  • engine.py - OpenAI Responses API wrapper with multimodal support and threading
  • tools.py - Utility functions for diffs, text extraction, URL parsing, and function tools
  • state.py - XDG-compliant state management with multimodal thread tracking

Key Design Principles

  • API Consistency: All content types use OpenAI Responses API for threading and state management
  • Backward Compatibility: Legacy commands maintained while encouraging migration to unified interface
  • Content Agnostic: Same threading system works across text, images, documents, and future modalities
  • Smart Defaults: Auto-detection and optimal model selection reduce cognitive overhead
  • Safety First: Dry-run by default for destructive operations, with explicit approval workflows

State Management

  • Thread Persistence: XDG-compliant JSON storage with automatic cleanup
  • Content Tracking: Detailed metadata per thread including content type breakdown
  • Cross-Modal Threading: Seamless context preservation across different content types
  • Legacy Support: Backward compatibility with existing thread structures
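XDG-compliant JSON persistence can be sketched with the stdlib alone (the project lists platformdirs for this; the sketch below follows the raw XDG convention directly, and the file layout is hypothetical):

```python
import json
import os
from pathlib import Path

def state_file(app="edge-assistant"):
    """Resolve the XDG state path: $XDG_STATE_HOME or ~/.local/state."""
    base = os.environ.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(base) / app / "threads.json"

def save_threads(threads, path=None):
    """Persist the thread store as pretty-printed JSON."""
    path = path or state_file()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(threads, indent=2))

def load_threads(path=None):
    """Load the thread store, returning an empty dict on first run."""
    path = path or state_file()
    return json.loads(path.read_text()) if path.exists() else {}
```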

Responses API Integration

  • Unified Interface: Single method handles text, images, documents via analyze_multimodal_content()
  • Proper Threading: Uses previous_response_id for server-side state management
  • Content Detection: Automatic file type detection with manual override capability
  • Future Ready: Architecture prepared for audio, video, and other upcoming modalities
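Server-side threading via previous_response_id means the client sends only the new turn plus the prior response's id, never the full history. A sketch of assembling such a request (the helper is hypothetical; the commented-out calls use the openai package's Responses API and need a key):

```python
def build_request(prompt, model, previous_response_id=None, system=None):
    """Assemble kwargs for an OpenAI Responses API call with threading.

    Passing the previous response's id lets the server carry the
    conversation state, so the client never replays full history.
    """
    kwargs = {"model": model, "input": prompt}
    if system:
        kwargs["instructions"] = system
    if previous_response_id:
        kwargs["previous_response_id"] = previous_response_id
    return kwargs

# With the openai package installed and OPENAI_API_KEY set:
# from openai import OpenAI
# client = OpenAI()
# first = client.responses.create(**build_request("Hi", "gpt-4o-mini"))
# follow = client.responses.create(
#     **build_request("And then?", "gpt-4o-mini", previous_response_id=first.id)
# )
```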

Dev Notes

Dependencies

Core: openai, typer, rich, platformdirs, python-dotenv

Environment Setup

# Create virtual environment  
python3 -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install --upgrade pip && pip install -e .

# Configure API key
echo 'OPENAI_API_KEY="sk-..."' > .env

🧪 Testing

Run the test suite after installing test dependencies:

pip install pytest
pytest -q

The test suite includes CLI command validation and basic functionality tests using Typer's CliRunner.

🔄 Migration Guide

From Legacy Commands

Image Analysis: The new analyze command replaces analyze-image:

# Old (still works but deprecated)
edge-assistant analyze-image image.jpg "Describe this" --thread session

# New (recommended)  
edge-assistant analyze "Describe this" --file image.jpg --thread session

Enhanced Ask: The ask command now uses the unified multimodal engine by default:

# Automatic (uses new engine)
edge-assistant ask "Question" --thread session

# Force legacy engine if needed
edge-assistant ask "Question" --thread session --legacy

Thread Compatibility

  • Existing text threads: Fully compatible with new multimodal system
  • Legacy vision threads: Automatically migrated to new multimodal format
  • Thread data: All existing thread data preserved during migration

📋 Command Reference

# Get help for any command
edge-assistant --help
edge-assistant analyze --help
edge-assistant ask --help

# Version information  
edge-assistant --version

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes following the existing code style
  4. Add tests for new functionality
  5. Run the test suite: pytest
  6. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


