Gemini Omni Flash MCP server for text-to-video, image-to-video, reference-guided video, and conversational video editing
Gemini Omni MCP
MCP server for Google's Gemini Omni Flash video model — text-to-video, image-to-video, reference-guided video, and conversational video editing with native audio, straight from your AI agent.
Setup
Get a Gemini API key from Google AI Studio, then add the server to your MCP config.
Claude Desktop / Claude Code / Cursor
Add to your MCP config (mcp.json / .claude.json / claude_desktop_config.json):
{
  "mcpServers": {
    "gemini-omni": {
      "command": "uvx",
      "args": ["gemini-omni-mcp@latest"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Droid CLI
droid mcp add gemini-omni "uvx gemini-omni-mcp@latest" --env GEMINI_API_KEY=your-api-key-here
Generated MP4s are saved to ~/gemini_omni_videos by default (set OUTPUT_DIR to change).
Features
- Text-to-video: prompt-only MP4 generation with generated audio (music, ambience, SFX)
- Image-to-video: animate a single reference image with motion and camera direction
- Reference-to-video: up to 6 reference images to lock subjects, style, or props
- Conversational editing: iterate on a generated video via previous_interaction_id, or upload your own MP4 and edit it
- Prompt role tags: <FIRST_FRAME> and <IMAGE_REF_N> bind reference images to roles
- Timing cues: [0-3s], [3-6s], [6-10s] direct the action beat by beat
- Batch generation: run multiple prompts in conservative parallel batches (max 4)
- URI or inline delivery: robust Files API polling and download built in
Output is 720p 24fps MP4 with SynthID watermarking (preview-quality model).
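The conversational editing flow above can be sketched as a small helper that turns one result into the arguments for a follow-up edit. This is a minimal sketch, not the server's own code: the helper name `follow_up_edit_args` and the example payload are hypothetical, and it assumes the result JSON carries `interaction_id` at the top level.

```python
import json


def follow_up_edit_args(result_json: str, edit_prompt: str) -> dict:
    """Build generate_video arguments that continue editing a prior result.

    Assumption: the result is a JSON object with a top-level "interaction_id".
    """
    result = json.loads(result_json)
    interaction_id = result.get("interaction_id")
    if not interaction_id:
        raise ValueError("result has no interaction_id to continue from")
    return {
        # Short edit prompt plus the "keep everything else" hint.
        "prompt": f"{edit_prompt.strip()} Keep everything else the same.",
        "task": "edit",
        "previous_interaction_id": interaction_id,
    }


# Hypothetical result payload, for illustration only.
example_result = json.dumps(
    {"video": {"path": "/tmp/example.mp4"}, "interaction_id": "abc123", "metadata": {}}
)
args = follow_up_edit_args(example_result, "Make the lighting warmer.")
print(args["previous_interaction_id"])
```

The returned dictionary would be passed as the arguments of the next generate_video call by whatever MCP client you use.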
Showcase
All videos below were generated by this server with gemini-omni-flash-preview. Play them with sound on.
Corgi on a hoverboard
A corgi wearing tiny goggles rides a glowing hoverboard through a neon-lit Tokyo street at night, rain reflections on the pavement, camera tracking alongside, single continuous shot, cinematic lighting, upbeat synthwave music.
https://github.com/user-attachments/assets/4b9a8e87-7db0-469d-a261-3c354f7fe9b8
Astronaut latte art
An astronaut in a white spacesuit pours latte art into a floating cup inside a cozy moon-base cafe, Earth visible through a large window, steam swirling in low gravity, slow dolly-in, warm lighting, gentle ambient cafe sounds.
https://github.com/user-attachments/assets/88072ef8-1ce4-4c80-a05a-bfb5531d1271
Origami ocean
An origami paper whale swims gracefully through a stylized paper-craft ocean, paper waves folding and unfolding, paper seagulls gliding above, soft sunlight, camera slowly orbiting, calm orchestral score.
https://github.com/user-attachments/assets/c1e31341-e53c-4a55-af12-d6bb9432dcf5
Tools
generate_video
Generates or edits one MP4 and returns JSON with video.path, interaction_id, and metadata.
| Argument | Type | Description |
|---|---|---|
| prompt | string | Scene, motion, camera, lighting, mood, and audio direction |
| task | string? | text_to_video, image_to_video, reference_to_video, or edit. Inferred if omitted |
| aspect_ratio | string? | 16:9 (default) or 9:16 |
| duration_seconds | int? | Optional preview field, 3 to 10 |
| reference_image_paths | list? | Up to 6 local image paths |
| input_video_path | string? | Local MP4 to upload and edit |
| delivery | string? | uri (default, recommended) or inline |
| previous_interaction_id | string? | Continue editing a generated video |
| enhance_prompt | bool? | Optional LLM prompt enhancement, default false |
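A client can check arguments against the documented limits before calling the tool. The sketch below is illustrative only: `build_generate_video_args` and `infer_task` are hypothetical helpers, and the inference rule (edit when a video or interaction id is present, otherwise chosen by the number of reference images) is an assumption, not necessarily how the server infers the task.

```python
from typing import Optional, Sequence

ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
MAX_REFERENCE_IMAGES = 6


def infer_task(
    reference_image_paths: Sequence[str],
    input_video_path: Optional[str],
    previous_interaction_id: Optional[str],
) -> str:
    """Guess a task from which inputs are present (assumed rule)."""
    if input_video_path or previous_interaction_id:
        return "edit"
    if not reference_image_paths:
        return "text_to_video"
    if len(reference_image_paths) == 1:
        return "image_to_video"
    return "reference_to_video"


def build_generate_video_args(
    prompt: str,
    *,
    aspect_ratio: str = "16:9",
    duration_seconds: Optional[int] = None,
    reference_image_paths: Optional[Sequence[str]] = None,
    input_video_path: Optional[str] = None,
    previous_interaction_id: Optional[str] = None,
) -> dict:
    """Validate inputs against the documented limits and return an arguments dict."""
    refs = list(reference_image_paths or [])
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    if duration_seconds is not None and not 3 <= duration_seconds <= 10:
        raise ValueError("duration_seconds must be between 3 and 10")
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")

    args = {
        "prompt": prompt,
        "task": infer_task(refs, input_video_path, previous_interaction_id),
        "aspect_ratio": aspect_ratio,
    }
    # Only include optional fields that were actually provided.
    if duration_seconds is not None:
        args["duration_seconds"] = duration_seconds
    if refs:
        args["reference_image_paths"] = refs
    if input_video_path:
        args["input_video_path"] = input_video_path
    if previous_interaction_id:
        args["previous_interaction_id"] = previous_interaction_id
    return args


print(build_generate_video_args("A corgi rides a hoverboard.", duration_seconds=6))
```

Failing early on the client side avoids spending a generation request on arguments the table above rules out.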
batch_generate
Runs multiple prompts in parallel batches, capped at 4.
| Argument | Type | Description |
|---|---|---|
| prompts | list | One prompt per video |
| task, aspect_ratio, duration_seconds, reference_image_paths, delivery, enhance_prompt | — | Shared across the batch, same semantics as generate_video |
| batch_size | int? | Parallelism, capped at MAX_BATCH_SIZE |
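Capped parallel batching can be sketched as follows. This shows the general technique, not the server's implementation: `chunk_prompts` and `run_in_batches` are hypothetical names, and `generate` stands in for any coroutine that produces one video per prompt.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence

MAX_BATCH_SIZE = 4  # mirrors the documented default cap


def chunk_prompts(
    prompts: Sequence[str],
    batch_size: Optional[int] = None,
    max_batch_size: int = MAX_BATCH_SIZE,
) -> List[List[str]]:
    """Split prompts into batches no larger than the cap."""
    size = max_batch_size if batch_size is None else min(batch_size, max_batch_size)
    if size < 1:
        raise ValueError("batch size must be at least 1")
    items = list(prompts)
    return [items[i : i + size] for i in range(0, len(items), size)]


async def run_in_batches(
    prompts: Sequence[str],
    generate: Callable[[str], Awaitable[str]],
    batch_size: Optional[int] = None,
) -> List[str]:
    """Run one batch at a time; prompts inside a batch run concurrently."""
    results: List[str] = []
    for batch in chunk_prompts(prompts, batch_size):
        results.extend(await asyncio.gather(*(generate(p) for p in batch)))
    return results
```

Running batches one after another keeps at most `batch_size` requests in flight, and results come back in the same order as the prompts.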
Configuration
Everything is configured through environment variables (or a local .env):
| Variable | Default | Description |
|---|---|---|
| GEMINI_API_KEY | — | Required. GOOGLE_API_KEY also accepted |
| OUTPUT_DIR | ~/gemini_omni_videos | Where generated MP4s are saved |
| DEFAULT_ASPECT_RATIO | 16:9 | 16:9 or 9:16 |
| DEFAULT_DELIVERY | uri | uri or inline |
| DEFAULT_DURATION_SECONDS | unset | Optional 3-10s target |
| REQUEST_TIMEOUT | 300 | Generation timeout in seconds |
| FILE_POLL_INTERVAL | 5.0 | Seconds between Files API polls |
| FILE_POLL_TIMEOUT | 600 | Max seconds waiting for file activation |
| MAX_BATCH_SIZE | 4 | Max parallel generations |
| ENABLE_PROMPT_ENHANCEMENT | false | LLM-enhance prompts before generation |
| LOG_LEVEL | INFO | Logging level |
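To make the defaults in the table concrete, here is a minimal sketch of reading these variables into a typed settings object. The `Settings` class and `load_settings` function are hypothetical, not the package's API, and the set of strings treated as true for ENABLE_PROMPT_ENHANCEMENT is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    output_dir: Path
    default_aspect_ratio: str
    default_delivery: str
    default_duration_seconds: Optional[int]
    request_timeout: float
    file_poll_interval: float
    file_poll_timeout: float
    max_batch_size: int
    enable_prompt_enhancement: bool
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default) using the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("set GEMINI_API_KEY (or GOOGLE_API_KEY)")
    duration = env.get("DEFAULT_DURATION_SECONDS")
    # Assumption: these spellings count as "true"; anything else is false.
    enhance = env.get("ENABLE_PROMPT_ENHANCEMENT", "false").strip().lower()
    return Settings(
        api_key=api_key,
        output_dir=Path(env.get("OUTPUT_DIR", "~/gemini_omni_videos")).expanduser(),
        default_aspect_ratio=env.get("DEFAULT_ASPECT_RATIO", "16:9"),
        default_delivery=env.get("DEFAULT_DELIVERY", "uri"),
        default_duration_seconds=int(duration) if duration else None,
        request_timeout=float(env.get("REQUEST_TIMEOUT", "300")),
        file_poll_interval=float(env.get("FILE_POLL_INTERVAL", "5.0")),
        file_poll_timeout=float(env.get("FILE_POLL_TIMEOUT", "600")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "4")),
        enable_prompt_enhancement=enhance in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# Explicit mapping so the example does not depend on the real environment.
settings = load_settings({"GEMINI_API_KEY": "your-api-key-here", "MAX_BATCH_SIZE": "2"})
print(settings.output_dir, settings.max_batch_size)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.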
Prompting tips
- Ask for a "single continuous shot" and "no scene cuts" for one-scene outputs.
- Always include audio direction, for example "gentle ambient sound, no dialogue".
- For edits, keep the prompt short and add "Keep everything else the same".
- Use <FIRST_FRAME> and <IMAGE_REF_N> tags to bind reference-image roles.
- Timing cues like [0-3s], [3-6s], and [6-10s] work well.
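Several of the tips above can be combined in a small prompt builder. This is a sketch under assumptions: `compose_prompt` is a hypothetical helper, the requirement that beats be contiguous and start at 0 is a design choice made here, and the 10-second ceiling simply mirrors the documented duration range. Role tags are left out because their exact placement in a prompt is not specified above.

```python
from typing import List, Tuple


def compose_prompt(
    scene: str,
    beats: List[Tuple[int, int, str]],
    audio: str = "gentle ambient sound, no dialogue",
) -> str:
    """Join a scene, timing cues like [0-3s], and audio direction into one prompt."""
    cues = []
    previous_end = 0
    for start, end, action in beats:
        # Design choice for this sketch: beats must be contiguous and increasing.
        if start != previous_end or end <= start:
            raise ValueError("beats must be contiguous, increasing, and start at 0")
        cues.append(f"[{start}-{end}s] {action.strip()}")
        previous_end = end
    if previous_end > 10:
        raise ValueError("beats must fit within 10 seconds")
    parts = [scene.strip(), "Single continuous shot, no scene cuts.", *cues, f"Audio: {audio.strip()}."]
    return " ".join(parts)


prompt = compose_prompt(
    "A paper whale swims through a paper-craft ocean.",
    [(0, 3, "the whale surfaces"), (3, 6, "paper gulls glide past")],
)
print(prompt)
```

The resulting string is an ordinary prompt and can be passed unchanged as the prompt argument of generate_video.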
Limitations
- Preview model: 720p, 24fps, MP4 only, SynthID-watermarked.
- System instructions, temperature, negative prompts, voice edits, YouTube sources, and multi-video reasoning are unsupported.
- Uploaded-video editing is unavailable in some regions.
Development
git clone https://github.com/nikships/gemini-omni-mcp
cd gemini-omni-mcp
uv sync --all-extras
uv run ruff format .
uv run ruff check .
uv run mypy gemini_omni_mcp/
uv run pytest
uv build
Releases are automated: every push to main bumps the version and publishes to PyPI (see PUBLISHING.md).
License
Gemini Omni MCP is licensed under the MIT license. See LICENSE for details.