
Gemini Omni Flash MCP server for text-to-video, image-to-video, reference-guided video, and conversational video editing

Gemini Omni MCP

MCP server for Google's Gemini Omni Flash video model: text-to-video, image-to-video, reference-guided video, and conversational video editing with native audio, straight from your AI agent.

Python 3.11+ · License: MIT


Setup

Get a Gemini API key from Google AI Studio, then add the server to your MCP config.

Claude Desktop / Claude Code / Cursor

Add to your MCP config (mcp.json / .claude.json / claude_desktop_config.json):

{
  "mcpServers": {
    "gemini-omni": {
      "command": "uvx",
      "args": ["gemini-omni-mcp@latest"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Droid CLI

droid mcp add gemini-omni "uvx gemini-omni-mcp@latest" --env GEMINI_API_KEY=your-api-key-here

Generated MP4s are saved to ~/gemini_omni_videos by default (set OUTPUT_DIR to change).
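
If you script your setup, the same entry can be merged into an existing MCP config without disturbing other servers. The following is a minimal sketch, not part of this package: the function name `add_gemini_omni_server` is hypothetical, and it works on an in-memory dict so that locating and writing the config file is left to you.

```python
import json
from copy import deepcopy


def add_gemini_omni_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP config with the gemini-omni entry added.

    Hypothetical helper: other entries under "mcpServers" are left untouched.
    """
    merged = deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["gemini-omni"] = {
        "command": "uvx",
        "args": ["gemini-omni-mcp@latest"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
    print(json.dumps(add_gemini_omni_server(existing, "your-api-key-here"), indent=2))
```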


Features

  • Text-to-video: prompt-only MP4 generation with generated audio (music, ambience, SFX)
  • Image-to-video: animate a single reference image with motion and camera direction
  • Reference-to-video: up to 6 reference images to lock subjects, style, or props
  • Conversational editing: iterate on a generated video via previous_interaction_id, or upload your own MP4 and edit it
  • Prompt role tags: <FIRST_FRAME> and <IMAGE_REF_N> bind reference images to roles
  • Timing cues: [0-3s], [3-6s], [6-10s] direct the action beat by beat
  • Batch generation: run multiple prompts in conservative parallel batches (max 4)
  • URI or inline delivery: robust Files API polling and download built in

Output is 720p 24fps MP4 with SynthID watermarking (preview-quality model).
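
The Files API polling mentioned in the feature list can be pictured roughly as the loop below. This is an illustrative sketch under stated assumptions, not the server's actual code: `wait_until_active` and `get_state` are hypothetical names, the state strings `PROCESSING`, `ACTIVE`, and `FAILED` are assumed, and the defaults mirror `FILE_POLL_INTERVAL` and `FILE_POLL_TIMEOUT`.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_state() until it returns "ACTIVE".

    Raises RuntimeError if the state becomes "FAILED" and TimeoutError if
    the deadline passes first. State names are assumptions for illustration.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"file not active after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.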


Showcase

All videos below were generated by this server with gemini-omni-flash-preview. Play them with sound on.

Corgi on a hoverboard

A corgi wearing tiny goggles rides a glowing hoverboard through a neon-lit Tokyo street at night, rain reflections on the pavement, camera tracking alongside, single continuous shot, cinematic lighting, upbeat synthwave music.

https://github.com/user-attachments/assets/4b9a8e87-7db0-469d-a261-3c354f7fe9b8

Astronaut latte art

An astronaut in a white spacesuit pours latte art into a floating cup inside a cozy moon-base cafe, Earth visible through a large window, steam swirling in low gravity, slow dolly-in, warm lighting, gentle ambient cafe sounds.

https://github.com/user-attachments/assets/88072ef8-1ce4-4c80-a05a-bfb5531d1271

Origami ocean

An origami paper whale swims gracefully through a stylized paper-craft ocean, paper waves folding and unfolding, paper seagulls gliding above, soft sunlight, camera slowly orbiting, calm orchestral score.

https://github.com/user-attachments/assets/c1e31341-e53c-4a55-af12-d6bb9432dcf5


Tools

generate_video

Generates or edits one MP4 and returns JSON with video.path, interaction_id, and metadata.

| Argument | Type | Description |
| --- | --- | --- |
| prompt | string | Scene, motion, camera, lighting, mood, and audio direction |
| task | string? | text_to_video, image_to_video, reference_to_video, or edit. Inferred if omitted |
| aspect_ratio | string? | 16:9 (default) or 9:16 |
| duration_seconds | int? | Optional preview field, 3 to 10 |
| reference_image_paths | list? | Up to 6 local image paths |
| input_video_path | string? | Local MP4 to upload and edit |
| delivery | string? | uri (default, recommended) or inline |
| previous_interaction_id | string? | Continue editing a generated video |
| enhance_prompt | bool? | Optional LLM prompt enhancement, default false |
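
To make the argument constraints concrete, here is a hedged sketch of a client-side helper that validates values against the limits listed for generate_video before calling the tool. The helper names are hypothetical, and the task-inference rule in `infer_task` is only a guess at what "inferred if omitted" might mean; the server's real logic may differ.

```python
from typing import Any, Optional, Sequence

ASPECT_RATIOS = {"16:9", "9:16"}
MAX_REFERENCE_IMAGES = 6


def infer_task(
    images: Sequence[str],
    input_video_path: Optional[str],
    previous_interaction_id: Optional[str],
) -> str:
    """Guess the task from the inputs (assumed rule, not the server's)."""
    if input_video_path or previous_interaction_id:
        return "edit"
    if len(images) == 1:
        return "image_to_video"
    if len(images) > 1:
        return "reference_to_video"
    return "text_to_video"


def build_generate_video_args(
    prompt: str,
    *,
    aspect_ratio: str = "16:9",
    duration_seconds: Optional[int] = None,
    reference_image_paths: Optional[Sequence[str]] = None,
    input_video_path: Optional[str] = None,
    previous_interaction_id: Optional[str] = None,
) -> dict[str, Any]:
    """Validate inputs and return an arguments dict for generate_video."""
    images = list(reference_image_paths or [])
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError("aspect_ratio must be 16:9 or 9:16")
    if duration_seconds is not None and not 3 <= duration_seconds <= 10:
        raise ValueError("duration_seconds must be between 3 and 10")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 6 reference images are allowed")

    args: dict[str, Any] = {
        "prompt": prompt,
        "task": infer_task(images, input_video_path, previous_interaction_id),
        "aspect_ratio": aspect_ratio,
    }
    optional = {
        "duration_seconds": duration_seconds,
        "reference_image_paths": images,
        "input_video_path": input_video_path,
        "previous_interaction_id": previous_interaction_id,
    }
    # Only send optional arguments that were actually provided.
    args.update({key: value for key, value in optional.items() if value})
    return args
```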

batch_generate

Runs multiple prompts in parallel batches, capped at 4.

| Argument | Type | Description |
| --- | --- | --- |
| prompts | list | One prompt per video |
| task, aspect_ratio, duration_seconds, reference_image_paths, delivery, enhance_prompt | (see generate_video) | Shared across the batch, same semantics as generate_video |
| batch_size | int? | Parallelism, capped at MAX_BATCH_SIZE |
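
One way to picture capped parallel batching is the sketch below, which processes prompts in chunks and clamps the chunk size to the cap. It is an assumption about the general approach rather than a copy of the server's implementation; `run_in_batches` and the `generate` callable are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

MAX_BATCH_SIZE = 4  # mirrors the documented default cap


async def run_in_batches(
    prompts: Sequence[str],
    generate: Callable[[str], Awaitable[str]],
    batch_size: int = MAX_BATCH_SIZE,
) -> list[str]:
    """Run generate() over prompts, at most MAX_BATCH_SIZE at a time.

    Results come back in the same order as the prompts.
    """
    size = max(1, min(batch_size, MAX_BATCH_SIZE))
    results: list[str] = []
    for start in range(0, len(prompts), size):
        chunk = prompts[start:start + size]
        # Each chunk runs concurrently; the next chunk waits for it to finish.
        results.extend(await asyncio.gather(*(generate(p) for p in chunk)))
    return results
```

Chunking is simpler than a semaphore but waits for the slowest job in each chunk before starting the next.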

Configuration

Everything is configured through environment variables (or a local .env):

| Variable | Default | Description |
| --- | --- | --- |
| GEMINI_API_KEY | (none) | Required. GOOGLE_API_KEY also accepted |
| OUTPUT_DIR | ~/gemini_omni_videos | Where generated MP4s are saved |
| DEFAULT_ASPECT_RATIO | 16:9 | 16:9 or 9:16 |
| DEFAULT_DELIVERY | uri | uri or inline |
| DEFAULT_DURATION_SECONDS | unset | Optional 3-10s target |
| REQUEST_TIMEOUT | 300 | Generation timeout in seconds |
| FILE_POLL_INTERVAL | 5.0 | Seconds between Files API polls |
| FILE_POLL_TIMEOUT | 600 | Max seconds waiting for file activation |
| MAX_BATCH_SIZE | 4 | Max parallel generations |
| ENABLE_PROMPT_ENHANCEMENT | false | LLM-enhance prompts before generation |
| LOG_LEVEL | INFO | Logging level |
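
As an illustration of how these variables and defaults fit together, the sketch below reads them from a mapping into a typed settings object. `Settings` and `load_settings` are hypothetical names, the accepted spellings for boolean values are an assumption, and `.env` loading is not covered.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    output_dir: Path
    default_aspect_ratio: str
    default_delivery: str
    default_duration_seconds: Optional[int]
    request_timeout: float
    file_poll_interval: float
    file_poll_timeout: float
    max_batch_size: int
    enable_prompt_enhancement: bool
    log_level: str


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the server may accept a different set.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying documented defaults."""
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY (or GOOGLE_API_KEY)")
    duration = env.get("DEFAULT_DURATION_SECONDS")
    return Settings(
        api_key=api_key,
        output_dir=Path(env.get("OUTPUT_DIR", "~/gemini_omni_videos")).expanduser(),
        default_aspect_ratio=env.get("DEFAULT_ASPECT_RATIO", "16:9"),
        default_delivery=env.get("DEFAULT_DELIVERY", "uri"),
        default_duration_seconds=int(duration) if duration else None,
        request_timeout=float(env.get("REQUEST_TIMEOUT", "300")),
        file_poll_interval=float(env.get("FILE_POLL_INTERVAL", "5.0")),
        file_poll_timeout=float(env.get("FILE_POLL_TIMEOUT", "600")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "4")),
        enable_prompt_enhancement=_as_bool(env.get("ENABLE_PROMPT_ENHANCEMENT", "false")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```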

Prompting tips

  • Ask for a "single continuous shot" and "no scene cuts" for one-scene outputs.
  • Always include audio direction, for example "gentle ambient sound, no dialogue".
  • For edits, keep the prompt short and add "Keep everything else the same".
  • Use <FIRST_FRAME> and <IMAGE_REF_N> tags to bind reference-image roles.
  • Timing cues like [0-3s], [3-6s], and [6-10s] work well.
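
The timing-cue and edit tips above can be sketched as two small string helpers. Both function names are hypothetical conveniences, not part of this package, and the 10-second upper bound is taken from the documented duration range.

```python
def timed_prompt(scene: str, beats: list[tuple[int, int, str]], audio: str) -> str:
    """Join a scene description, [start-end s] beats, and audio direction."""
    parts = [scene.strip()]
    for start, end, action in beats:
        if not 0 <= start < end <= 10:
            raise ValueError(f"invalid beat window: {start}-{end}s")
        parts.append(f"[{start}-{end}s] {action.strip()}")
    parts.append(audio.strip())
    return " ".join(parts)


def edit_prompt(change: str) -> str:
    """Keep an edit request short and pin everything else in place."""
    return f"{change.strip().rstrip('.')}. Keep everything else the same."


if __name__ == "__main__":
    print(
        timed_prompt(
            "An origami whale swims through a paper ocean, single continuous shot, no scene cuts.",
            [(0, 3, "The whale surfaces."), (3, 6, "Paper seagulls glide past.")],
            "Calm orchestral score, no dialogue.",
        )
    )
    print(edit_prompt("Make the sky purple"))
```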

Limitations

  • Preview model: 720p, 24fps, MP4 only, SynthID-watermarked.
  • System instructions, temperature, negative prompts, voice edits, YouTube sources, and multi-video reasoning are unsupported.
  • Uploaded-video editing is unavailable in some regions.

Development

git clone https://github.com/nikships/gemini-omni-mcp
cd gemini-omni-mcp
uv sync --all-extras
uv run ruff format .
uv run ruff check .
uv run mypy gemini_omni_mcp/
uv run pytest
uv build

Releases are automated: every push to main bumps the version and publishes to PyPI (see PUBLISHING.md).

License

Gemini Omni MCP is licensed under the MIT license. See LICENSE for details.
