Minimal Gemini Live AI WebSocket proxy for FastAPI — plug and play voice + video AI.
gemilive
Plug-and-play Gemini Live AI (voice + video) for your FastAPI app.
gemilive provides a seamless bridge between a web-based frontend and Google's Gemini Multimodal Live API. It handles the heavy lifting of WebSockets, bidirectional audio streams (16kHz up / 24kHz down), gapless browser PCM playback, and live video framing — allowing you to add conversational AI to your project in just six lines of code.
This repo contains both the Python backend plugin (gemilive) and the companion JavaScript client (gemilive-js).
🚀 Features
- Bidirectional Voice AI: Real-time PCM audio streaming for natural, fluid conversations, with no laggy turn-by-turn exchange.
- Multimodal Vision: The AI can see what your camera sees via 1fps JPEG snapshots.
- Zero-Boilerplate Backend: Just mount it onto your existing FastAPI app with mount_gemilive().
- Lightweight JS SDK: A clean browser GemiliveClient handling media capture and resampling.
- Toggleable Media: Turn your camera off/on mid-session seamlessly.
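To give a feel for what the 16kHz up / 24kHz down audio streams cost in bandwidth, here is a small arithmetic sketch. It assumes 16-bit mono PCM, which this README does not state explicitly, and the helper names are illustrative only, not part of gemilive.

```python
# Assumption: 16-bit (2-byte) mono samples; the README only states the sample rates.
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def pcm_bytes_per_second(sample_rate_hz: int) -> int:
    """Raw PCM byte rate for one direction of the stream."""
    return sample_rate_hz * BYTES_PER_SAMPLE * CHANNELS


def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int) -> int:
    """Size in bytes of one chunk covering `chunk_ms` milliseconds of audio."""
    return pcm_bytes_per_second(sample_rate_hz) * chunk_ms // 1000


uplink = pcm_bytes_per_second(16_000)    # microphone audio sent to Gemini
downlink = pcm_bytes_per_second(24_000)  # synthesized audio received back
chunk = pcm_chunk_bytes(16_000, 100)     # one 100 ms microphone chunk
print(uplink, downlink, chunk)
```

Under these assumptions the raw audio is a few tens of kilobytes per second in each direction, before any WebSocket framing or encoding overhead.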
🛠️ Installation & Quickstart
Integration requires two pieces: the Python server endpoint and the JavaScript browser client.
Backend (Python)
Install the pip package:
uv add gemilive
# or pip install gemilive
Setup requires an API key. You can provide it in code or grab it from your .env:
GOOGLE_API_KEY=your_gemini_api_key_here
MODEL_NAME=gemini-3.1-flash-live-preview
Mount it into any FastAPI app:
from fastapi import FastAPI
from gemilive import mount_gemilive
app = FastAPI()
# Mounts the WebSocket route at /ws/live
mount_gemilive(app, system_prompt="You are a helpful assistant. Keep answers brief.")
Frontend (JavaScript)
Install the npm package:
npm install gemilive-js
Or use via CDN in plain HTML:
<script src="https://cdn.jsdelivr.net/npm/gemilive-js/dist/gemilive.min.js"></script>
Initialize the client, connect, and start talking:
import { GemiliveClient } from 'gemilive-js';
// Point it to your FastAPI server's mount path
const client = new GemiliveClient("ws://localhost:8000/ws/live");
client.onMessage = (text) => console.log("Gemini:", text);
client.onError = (err) => console.error("Error:", err);
// Start the connection (prompts user for Mic & Camera)
await client.start();
// Disable video mid-session (audio continues)
// client.toggleVideo(false);
// Stop and disconnect
// client.stop();
⚙️ Advanced Configuration
Python mount_gemilive() Overrides
You can override environment variables dynamically when mounting the API:
mount_gemilive(
    app,
    google_api_key="...",                   # Overrides GOOGLE_API_KEY env
    model="gemini-3.1-flash-live-preview",  # Overrides MODEL_NAME env
    voice="Aoede",                          # Optional Gemini voice ("Aoede", "Charon", etc.)
    allow_origins=["https://myapp.com"],    # Essential if your frontend is on a different domain
    debug_mode=True,                        # Console logging of message flow
)
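The override behaviour described above (an explicit argument wins over the environment variable) can be pictured with the sketch below. `resolve_setting` is a hypothetical helper written for illustration; it is not gemilive's actual configuration code, which this README says lives in config.py and uses Pydantic.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the explicit argument if given, else the env var, else the default."""
    source = os.environ if env is None else env
    if explicit is not None:
        return explicit
    return source.get(env_name, default)


# A stand-in environment so the example does not depend on your real shell.
fake_env = {
    "GOOGLE_API_KEY": "key-from-env",
    "MODEL_NAME": "gemini-3.1-flash-live-preview",
}

api_key = resolve_setting("key-from-code", "GOOGLE_API_KEY", fake_env)  # explicit wins
model = resolve_setting(None, "MODEL_NAME", fake_env)                   # falls back to env
voice = resolve_setting(None, "VOICE", fake_env)                        # hypothetical name, unset
print(api_key, model, voice)
```

Passing the environment in as a mapping keeps the precedence rule easy to test without touching real environment variables.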
The System Prompt
You can set system prompts on the server-side (via mount_gemilive) or the client-side (via new GemiliveClient(url, { systemPrompt: "..." })).
If both are provided, the server-side prompt takes precedence, and the client-side prompt is appended securely as "Additional context".
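As a rough illustration of that precedence rule, the sketch below puts the server-side prompt first and appends the client-side prompt under an "Additional context" label. The function name and the exact separator format are assumptions made for this example; the real merge logic inside gemilive may differ.

```python
from typing import Optional


def merge_system_prompts(server_prompt: Optional[str], client_prompt: Optional[str]) -> str:
    """Combine prompts so the server-side text always comes first."""
    server = (server_prompt or "").strip()
    client = (client_prompt or "").strip()
    if server and client:
        # Assumed layout: the client text is demoted to labelled extra context.
        return f"{server}\n\nAdditional context:\n{client}"
    # Only one (or neither) was provided: use whichever is non-empty.
    return server or client


merged = merge_system_prompts(
    "You are a helpful assistant. Keep answers brief.",
    "The user is cooking and has their hands full.",
)
print(merged)
```

Keeping the server prompt first means a browser client can add context but cannot replace the instructions chosen by the backend.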
📂 Project Structure (For Contributors)
gemilive is developed as a monorepo containing two packages:
├── gemilive/ # PyPI package source
│ ├── mount.py # Public FastAPI installer
│ ├── config.py # Pydantic env validation
│ └── router.py # Internal WebSocket / GenAI flow
├── gemilive-js/ # npm package source
│ ├── src/index.js # Browser SDK (Web Audio API logic)
│ └── package.json
└── main.py # Sandbox FastAPI app for testing and local dev
For guidelines on local development and how to publish to PyPI and npm, read PUBLISHING.md.
⚠️ Important Considerations
- Browser Security: Browsers restrict microphone/camera access to secure contexts. getUserMedia requires HTTPS in production. localhost works for development.
- Audio Resampling: Browsers typically record audio at 44.1kHz or 48kHz. The gemilive-js SDK seamlessly resamples microphone inputs to 16kHz PCM to meet Gemini's strict API requirements. Responses from Gemini are returned as 24kHz PCM and gaplessly played back using JavaScript time-scheduling.
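The two audio steps above can be sketched in a few lines. The real SDK does this in JavaScript with the Web Audio API; the Python below is only a language-neutral illustration under stated assumptions: linear interpolation as the resampling method (with no anti-aliasing filter), 16-bit little-endian PCM output, and a scheduler that starts each chunk where the previous one ends. None of these names come from gemilive-js.

```python
import struct
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clamp floats to [-1, 1] and pack them as 16-bit little-endian PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


class PlaybackScheduler:
    """Assign each incoming chunk a start time with no gap and no overlap."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.next_start = 0.0  # time at which the previously scheduled chunk ends

    def schedule(self, n_samples: int, now: float) -> float:
        # If playback has fallen behind (or nothing is queued), start immediately.
        start = max(self.next_start, now)
        self.next_start = start + n_samples / self.sample_rate
        return start


mic = [float(i) for i in range(48)]            # 1 ms of fake 48kHz input
mic_16k = resample_linear(mic, 48_000, 16_000)  # 16 samples at 16kHz
payload = float_to_pcm16([0.0, 1.0, -1.0])

sched = PlaybackScheduler(24_000)
first = sched.schedule(2_400, now=1.0)    # 100 ms chunk, plays right away
second = sched.schedule(2_400, now=1.02)  # arrives early, queued behind the first
print(len(mic_16k), len(payload), first, second)
```

Scheduling against a running "end of last chunk" timestamp, rather than playing each chunk as it arrives, is what avoids audible gaps when network delivery is uneven.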
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.