Privacy-first API proxy routing voice+vision to local models with cloud fallback

Multimodal Voice API Proxy

A privacy-first API proxy that routes voice and vision requests to local models first and falls back to cloud APIs automatically.

What is this?

This proxy runs AI transcription (Whisper) and vision analysis (LLaVA) locally for privacy and cost savings, and automatically falls back to cloud APIs (OpenAI, Anthropic) when local resources are unavailable. It includes caching, usage metering, cost tracking, and multi-tenant API key management. It is aimed at developers building voice-enabled applications who want local-first privacy with cloud reliability as a safety net.
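
The local-first behaviour described above can be pictured with a small sketch. This is not the project's actual router: the class name `LocalFirstRouter`, the handler signature, and the in-flight counter are assumptions made for illustration. It tries a local handler while fewer than `max_local` local requests are running, and sends the request to a cloud handler when the local side is saturated or raises.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical handler type: takes raw request bytes, returns a text result.
Handler = Callable[[bytes], Awaitable[str]]


class LocalFirstRouter:
    """Illustrative router: local model first, cloud API as fallback."""

    def __init__(self, local: Handler, cloud: Handler, max_local: int = 5) -> None:
        self._local = local
        self._cloud = cloud
        self._max_local = max_local
        self._in_flight = 0  # local requests currently running

    async def handle(self, payload: bytes) -> tuple[str, str]:
        """Return (backend_name, result) for one request."""
        # No await between the check and the increment, so this is safe
        # within a single asyncio event loop.
        if self._in_flight < self._max_local:
            self._in_flight += 1
            try:
                return "local", await self._local(payload)
            except Exception:
                # Local model failed (for example, out of memory): use the cloud.
                pass
            finally:
                self._in_flight -= 1
        return "cloud", await self._cloud(payload)
```

Returning the backend name alongside the result makes it easy to count local and cloud requests separately, which is the kind of data a savings report needs.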

Features

  • Intelligent routing: Local-first processing with automatic cloud fallback
  • Privacy-focused: Audio and images processed on your hardware by default
  • Smart caching: Redis-backed caching prevents reprocessing identical inputs
  • Cost tracking: Real-time dashboard showing savings vs. cloud-only approach
  • Multi-tenant: API key management with per-key usage limits and metrics
  • OpenAI-compatible: Drop-in replacement for OpenAI/Anthropic endpoints
  • Production-ready: Rate limiting, monitoring middleware, and async processing
  • Easy deployment: Docker Compose for local, one-click configs for Railway/Fly.io
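
To make the caching feature more concrete, here is a minimal sketch of content-addressed caching with a time-to-live. It is an assumption about how identical inputs could be detected, not the project's code: the key layout, the `cache_key` helper, and the in-memory `TTLCache` (a stand-in for Redis so the example runs without a server) are all hypothetical.

```python
import hashlib
import time


def cache_key(kind: str, model: str, payload: bytes) -> str:
    """Build a key from the request type, model name, and a hash of the input bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"{kind}:{model}:{digest}"


class TTLCache:
    """In-memory stand-in for a Redis cache whose entries expire."""

    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)
```

Hashing the raw bytes means two uploads of the same file map to the same key regardless of file name, while a different model name yields a different key.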

Quick Start

Prerequisites

  • Docker and Docker Compose
  • 8GB+ RAM (for local models)
  • GPU recommended but optional

Installation

  1. Clone and configure
git clone <repository-url>
cd multimodal-voice-api-proxy
cp .env.example .env
  2. Edit .env with your settings
# Required for cloud fallback
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Database
DATABASE_URL=postgresql://user:pass@db:5432/proxy

# Redis cache
REDIS_URL=redis://redis:6379/0
  3. Launch with Docker Compose
docker-compose up -d
  4. Run migrations
docker-compose exec api alembic upgrade head
  5. Create your first API key
curl -X POST http://localhost:8000/keys \
  -H "Content-Type: application/json" \
  -d '{"name": "My App", "rate_limit": 100}'

The API will be available at http://localhost:8000
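
The `rate_limit` value passed when creating a key is not defined further in this README. One plausible reading, assumed here, is "requests allowed per rate-limit window". Under that assumption, a fixed-window limiter could look like the sketch below. `FixedWindowLimiter` is a hypothetical name, and the in-memory counter stands in for whatever shared store the proxy actually uses.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative per-key limiter: at most `limit` requests per window."""

    def __init__(self, window_seconds: int = 60, clock=time.time) -> None:
        self._window = window_seconds
        self._clock = clock
        # (api_key, window index) -> requests seen in that window.
        # Old windows are never purged in this sketch.
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, api_key: str, limit: int) -> bool:
        """Record one request and report whether it is within the limit."""
        window_index = int(self._clock() // self._window)
        bucket = (api_key, window_index)
        if self._counts[bucket] >= limit:
            return False
        self._counts[bucket] += 1
        return True
```

A fixed window is the simplest scheme to reason about. It allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.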

Usage

Audio Transcription

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-1

Vision Analysis

curl -X POST http://localhost:8000/v1/vision/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/image.jpg",
    "prompt": "Describe this image"
  }'
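
The same vision request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names are hypothetical, and treating the response body as JSON is an assumption, since this README does not show the response format.

```python
import json
import urllib.request


def build_vision_request(
    base_url: str, api_key: str, image_url: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/vision/analyze."""
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/vision/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def analyze_image(base_url: str, api_key: str, image_url: str, prompt: str) -> dict:
    """Send the request to a running proxy; assumes a JSON response body."""
    request = build_vision_request(base_url, api_key, image_url, prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the proxy running locally, `analyze_image("http://localhost:8000", "YOUR_API_KEY", "https://example.com/image.jpg", "Describe this image")` sends the same payload as the curl example.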

Check Usage Stats

curl http://localhost:8000/keys/YOUR_API_KEY/stats \
  -H "Authorization: Bearer YOUR_API_KEY"

Response includes:

  • Total requests (local vs. cloud)
  • Cost savings
  • Cache hit rate
  • Rate limit status
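
The response schema is not documented here, so the following is only a sketch of how figures like these could be derived from raw counters. The field names, the `UsageCounters` type, and the flat per-request cloud price are all assumptions. It also assumes that cache hits are counted separately from local and cloud requests.

```python
from dataclasses import dataclass


@dataclass
class UsageCounters:
    """Hypothetical per-key counters."""
    local_requests: int   # served by a local model
    cloud_requests: int   # served by a cloud API
    cache_hits: int       # served from the cache without any model call


def summarize(counters: UsageCounters, cloud_cost_per_request: float) -> dict:
    """Derive totals, cache hit rate, and savings vs. a cloud-only setup."""
    served = counters.local_requests + counters.cloud_requests + counters.cache_hits
    # Requests that would have cost money if every request went to the cloud.
    avoided = counters.local_requests + counters.cache_hits
    return {
        "total_requests": served,
        "cache_hit_rate": counters.cache_hits / served if served else 0.0,
        "estimated_savings": avoided * cloud_cost_per_request,
    }
```

Real cloud pricing depends on audio duration or token counts, so a per-request price is a simplification.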

Deployment

Railway

railway up

Fly.io

fly deploy

Self-Hosted

Use the included Dockerfile and docker-compose.yml for custom deployments.

Tech Stack

  • Framework: FastAPI (Python 3.11+)
  • Local Models: faster-whisper, llama-cpp-python
  • Cloud APIs: OpenAI, Anthropic
  • Database: PostgreSQL + SQLAlchemy + Alembic
  • Cache: Redis
  • Deployment: Docker, Railway, Fly.io

Configuration

Key environment variables:

Variable              Description                                Default
LOCAL_MODELS_ENABLED  Enable local model processing              true
MAX_LOCAL_REQUESTS    Concurrent local requests before fallback  5
CACHE_TTL             Cache expiration in seconds                3600
RATE_LIMIT_WINDOW     Rate limit window in seconds               60

See .env.example for complete configuration options.
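
As an illustration of how these four variables and their defaults could be read in Python, here is a minimal sketch. The `ProxySettings` class and `load_settings` function are hypothetical and are not taken from the project, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProxySettings:
    """Defaults mirror the table above."""
    local_models_enabled: bool = True
    max_local_requests: int = 5
    cache_ttl: int = 3600          # seconds
    rate_limit_window: int = 60    # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Read settings from an environment mapping, falling back to the defaults."""

    def flag(name: str, default: bool) -> bool:
        raw = env.get(name)
        if raw is None:
            return default
        return raw.strip().lower() in {"1", "true", "yes", "on"}

    return ProxySettings(
        local_models_enabled=flag("LOCAL_MODELS_ENABLED", True),
        max_local_requests=int(env.get("MAX_LOCAL_REQUESTS", "5")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test: `load_settings({"CACHE_TTL": "120"})` overrides one value and leaves the rest at their defaults.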

License

MIT License - see LICENSE file for details.


Built for developers who value privacy without sacrificing reliability.
