Privacy-first API proxy routing voice+vision to local models with cloud fallback
Multimodal Voice API Proxy
A privacy-first API proxy that intelligently routes voice and vision requests to local models with automatic cloud fallback.
What is this?
This proxy lets you run AI transcription (Whisper) and vision analysis (LLaVA) locally for privacy and cost savings, and it automatically falls back to cloud APIs (OpenAI, Anthropic) when local resources are unavailable. It includes smart caching, usage metering, cost tracking, and multi-tenant API key management. It is aimed at developers building voice-enabled applications who want local-first privacy with cloud reliability as a safety net.
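The local-first idea described above can be sketched in a few lines. This is a minimal illustration, not the proxy's actual code: `route`, `LocalUnavailable`, and the two stand-in backends are hypothetical names chosen for the example.

```python
from typing import Callable, Tuple


class LocalUnavailable(Exception):
    """Hypothetical error a local backend raises when it cannot take a request."""


def route(payload: bytes,
          local: Callable[[bytes], str],
          cloud: Callable[[bytes], str]) -> Tuple[str, str]:
    """Try the local backend first and fall back to the cloud backend.

    Returns (result, backend_name) so the caller can meter local vs. cloud use.
    """
    try:
        return local(payload), "local"
    except LocalUnavailable:
        return cloud(payload), "cloud"


# Stand-in backends for illustration only (no real model or API is called).
def busy_local(payload: bytes) -> str:
    raise LocalUnavailable("no free local worker")


def fake_cloud(payload: bytes) -> str:
    return f"cloud transcript of {len(payload)} bytes"


result, backend = route(b"abc", busy_local, fake_cloud)
print(backend, "->", result)
```

Returning the backend name alongside the result is one simple way to feed the local vs. cloud counters that the usage stats rely on.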
Features
- Intelligent routing: Local-first processing with automatic cloud fallback
- Privacy-focused: Audio and images processed on your hardware by default
- Smart caching: Redis-backed caching prevents reprocessing identical inputs
- Cost tracking: Real-time dashboard showing savings vs. cloud-only approach
- Multi-tenant: API key management with per-key usage limits and metrics
- OpenAPI-compatible: Drop-in replacement for OpenAI/Anthropic endpoints
- Production-ready: Rate limiting, monitoring middleware, and async processing
- Easy deployment: Docker Compose for local, one-click configs for Railway/Fly.io
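To make the caching feature concrete, here is a rough sketch of content-addressed caching with a TTL. It assumes the cache key is a SHA-256 hash of the input bytes and uses an in-memory dict as a stand-in for Redis; `ContentCache` and `cached_call` are hypothetical names, and the project's real key scheme may differ.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class ContentCache:
    """In-memory stand-in for a Redis cache keyed by a hash of the input."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key_for(kind: str, payload: bytes) -> str:
        # Identical inputs hash to the same key, so they are processed once.
        return f"{kind}:{hashlib.sha256(payload).hexdigest()}"

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # entry expired, drop it
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_call(cache: ContentCache, kind: str, payload: bytes,
                process: Callable[[bytes], str]) -> Tuple[str, bool]:
    """Return (result, was_cache_hit), calling `process` only on a miss."""
    key = cache.key_for(kind, payload)
    hit = cache.get(key)
    if hit is not None:
        return hit, True
    result = process(payload)
    cache.set(key, result)
    return result, False
```

With Redis, the same pattern would typically store the value under the hashed key with an expiry instead of tracking timestamps by hand.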
Quick Start
Prerequisites
- Docker and Docker Compose
- 8GB+ RAM (for local models)
- GPU recommended but optional
Installation
- Clone and configure
git clone <repository-url>
cd multimodal-voice-api-proxy
cp .env.example .env
- Edit .env with your settings
# Required for cloud fallback
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Database
DATABASE_URL=postgresql://user:pass@db:5432/proxy
# Redis cache
REDIS_URL=redis://redis:6379/0
- Launch with Docker Compose
docker-compose up -d
- Run migrations
docker-compose exec api alembic upgrade head
- Create your first API key
curl -X POST http://localhost:8000/keys \
-H "Content-Type: application/json" \
-d '{"name": "My App", "rate_limit": 100}'
The API will be available at http://localhost:8000
Usage
Audio Transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F file=@audio.mp3 \
-F model=whisper-1
Vision Analysis
curl -X POST http://localhost:8000/v1/vision/analyze \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/image.jpg",
"prompt": "Describe this image"
}'
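The same vision request can be built from Python using only the standard library. This sketch mirrors the curl call above; `build_vision_request` is a hypothetical helper, and the response shape is not assumed, so the send step is left commented out.

```python
import json
import urllib.request


def build_vision_request(base_url: str, api_key: str,
                         image_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/vision/analyze."""
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/vision/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_vision_request(
    "http://localhost:8000",
    "YOUR_API_KEY",
    "https://example.com/image.jpg",
    "Describe this image",
)

# To send it against a running proxy:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```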
Check Usage Stats
curl http://localhost:8000/keys/YOUR_API_KEY/stats \
-H "Authorization: Bearer YOUR_API_KEY"
Response includes:
- Total requests (local vs. cloud)
- Cost savings
- Cache hit rate
- Rate limit status
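As a rough illustration of how such figures could be derived from raw counters, consider the sketch below. The field names, the `UsageCounters` type, and the definition of savings (requests that avoided the cloud, multiplied by an assumed flat per-request cloud price) are all assumptions for the example, not the proxy's documented response schema.

```python
from dataclasses import dataclass


@dataclass
class UsageCounters:
    local_requests: int   # served by a local model
    cloud_requests: int   # served by a cloud fallback
    cache_hits: int       # served from cache without any model call


def summarize(c: UsageCounters, cloud_cost_per_request: float) -> dict:
    """Derive hit rate and estimated savings from raw counters (assumed formulas)."""
    total = c.local_requests + c.cloud_requests + c.cache_hits
    avoided_cloud_calls = c.local_requests + c.cache_hits
    return {
        "total_requests": total,
        "local_requests": c.local_requests,
        "cloud_requests": c.cloud_requests,
        "cache_hit_rate": c.cache_hits / total if total else 0.0,
        "estimated_savings": avoided_cloud_calls * cloud_cost_per_request,
    }


print(summarize(UsageCounters(local_requests=30, cloud_requests=10, cache_hits=10), 0.5))
```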
Deployment
Railway
railway up
Fly.io
fly deploy
Self-Hosted
Use the included Dockerfile and docker-compose.yml for custom deployments.
Tech Stack
- Framework: FastAPI (Python 3.11+)
- Local Models: faster-whisper, llama-cpp-python
- Cloud APIs: OpenAI, Anthropic
- Database: PostgreSQL + SQLAlchemy + Alembic
- Cache: Redis
- Deployment: Docker, Railway, Fly.io
Configuration
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| LOCAL_MODELS_ENABLED | Enable local model processing | true |
| MAX_LOCAL_REQUESTS | Concurrent local requests before fallback | 5 |
| CACHE_TTL | Cache expiration in seconds | 3600 |
| RATE_LIMIT_WINDOW | Rate limit window in seconds | 60 |
See .env.example for complete configuration options.
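For reference, the four variables above could be read into a typed settings object along these lines. This is a sketch only: `Settings` and `load_settings` are hypothetical names, and the set of strings treated as "true" is an assumption, since the project's own parsing rules are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    local_models_enabled: bool
    max_local_requests: int
    cache_ttl: int          # seconds
    rate_limit_window: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, using the documented defaults."""

    def flag(name: str, default: str) -> bool:
        # Assumption: these spellings count as true; anything else is false.
        return env.get(name, default).strip().lower() in {"1", "true", "yes", "on"}

    return Settings(
        local_models_enabled=flag("LOCAL_MODELS_ENABLED", "true"),
        max_local_requests=int(env.get("MAX_LOCAL_REQUESTS", "5")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_settings({"MAX_LOCAL_REQUESTS": "2"}))
```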
License
MIT License - see LICENSE file for details.
Built for developers who value privacy without sacrificing reliability.
File details
Details for the file multimodal_voice_api_proxy-0.1.0.tar.gz.
File metadata
- Download URL: multimodal_voice_api_proxy-0.1.0.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c885f31d35c493c1921ff6bb13cd5addeac4179b44e5ad714e98b27523f3b144 |
| MD5 | 5b0019cec616c56b26d47cff745b0d09 |
| BLAKE2b-256 | e53c98bc903b8558759d40a1163636ea9353fbdd2fc9dfbf8eea51dbdf59473f |
File details
Details for the file multimodal_voice_api_proxy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: multimodal_voice_api_proxy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e41e1382487ae3be7ad72da5b5dc7dee1be35266fe202e0db0f318cfb72a2c7 |
| MD5 | e316b22f965c8d2ff40cfdefb948ac4c |
| BLAKE2b-256 | e612e000a1c281e39e27876d6064dc6634a160ce24870091d384618e7c223b26 |