MCP server exposing Google Gemini media generation APIs as tools for AI agents

media-mcp

MCP server for AI-powered media generation using Google Gemini. Generate images, videos, music, and speech directly from your AI agent.

Features

  • Image Generation — Create and edit images using Gemini's Nano Banana models with support for multiple aspect ratios, resolutions up to 4K, and reference images
  • Video Generation — Generate videos with native audio, dialogue, and sound effects using Veo models (text-to-video, image-to-video, video extension)
  • Music Generation — Create instrumental music with weighted text prompts using Lyria RealTime (genre, instrument, mood control with BPM and scale)
  • Speech Generation — Convert text to speech with voice selection, multi-speaker support, and natural language style control using Gemini TTS

Installation

Using uvx (recommended)

uvx media-mcp

Using pip

pip install media-mcp

Prerequisites

  • Python 3.10 or later
  • A Google Gemini API key (available from Google AI Studio)

Configuration

Set your Gemini API key as an environment variable:

export GEMINI_API_KEY="your-gemini-api-key"

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| GEMINI_API_KEY | Yes | Google Gemini API key for authentication |
| MEDIA_OUTPUT_DIR | No | Directory path for saving generated media files (see below) |

Output behavior

When MEDIA_OUTPUT_DIR is set, every generated file is saved to that directory and the tool returns only the file path — no binary data is included in the response. This is the recommended setup because MCP messages are stored in the conversation history, and large base64 payloads pollute context and waste tokens.

When MEDIA_OUTPUT_DIR is not set, the server has no filesystem target, so it returns the raw base64-encoded data directly in the response. This works for quick experiments but is not recommended for production use.
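
The two modes can be pictured with a short sketch. This is not media-mcp's actual code: `build_response` and the response keys `path` and `data_base64` are hypothetical names chosen for illustration. Only the MEDIA_OUTPUT_DIR behavior comes from the description above.

```python
import base64
import os
from pathlib import Path


def build_response(data: bytes, filename: str) -> dict:
    """Return a file path if MEDIA_OUTPUT_DIR is set, else inline base64.

    Hypothetical helper: the real server's function names and response
    fields may differ.
    """
    output_dir = os.environ.get("MEDIA_OUTPUT_DIR")
    if output_dir:
        target = Path(output_dir) / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        # Only the path goes back to the client; no binary payload.
        return {"path": str(target)}
    # No filesystem target: fall back to base64 in the response body.
    return {"data_base64": base64.b64encode(data).decode("ascii")}
```

Base64 encoding grows the payload by roughly a third over the raw bytes, which is why the path-only response is the cheaper option for conversation history.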

MCP Client Setup

Claude Desktop

Add to your claude_desktop_config.json:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "media-mcp": {
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}

Claude Code

claude mcp add media-mcp --transport stdio -- uvx media-mcp

Or add manually to .mcp.json:

{
  "mcpServers": {
    "media-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}

Tools

generate_image

Generate or edit images using Gemini's Nano Banana models.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | | Text description of the image to generate |
| model | enum | No | nano-banana-2 | nano-banana-2, nano-banana-pro, nano-banana |
| aspect_ratio | enum | No | 1:1 | 1:1, 9:16, 16:9, 3:2, 4:3, and more |
| image_size | enum | No | 1K | 512px, 1K, 2K, 4K |
| reference_images | list[str] | No | | Base64-encoded reference images |
| thinking_level | enum | No | minimal | minimal, high |
| use_google_search | bool | No | false | Enable Google Search grounding |

Example prompt: "A watercolor painting of a cozy cabin in the mountains during autumn"
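
MCP clients invoke tools with a JSON-RPC `tools/call` request. Your client normally builds this message for you, but a minimal sketch shows how the parameters above travel over the wire. The `tool_call_request` helper and the request id are illustrative, not part of media-mcp.

```python
import json


def tool_call_request(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Omitted optional parameters fall back to the defaults in the table.
request = tool_call_request("generate_image", {
    "prompt": "A watercolor painting of a cozy cabin in the mountains during autumn",
    "aspect_ratio": "16:9",
    "image_size": "2K",
})
print(request)
```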

generate_video

Generate videos with native audio using Veo models.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | | Text description including dialogue, sound effects, camera directions |
| model | enum | No | veo-3.1 | veo-3.1, veo-3 |
| aspect_ratio | enum | No | 16:9 | 16:9 (landscape), 9:16 (portrait) |
| resolution | enum | No | | 720p, 1080p, 4K |
| first_frame_image | str | No | | Base64-encoded image for first frame |
| last_frame_image | str | No | | Base64-encoded image for last frame |
| reference_images | list[str] | No | | Up to 3 base64-encoded reference images |

Example prompt: "A slow dolly shot through a neon-lit alley at night, rain falling, 'Where are you going?' whispered softly, footsteps echoing"
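
For image-to-video, the frame parameters expect base64 strings rather than file paths. A rough sketch of preparing the arguments from local files follows. `encode_image` and `video_arguments` are hypothetical helpers, and the sketch assumes the tool wants plain base64 with no data-URI prefix, which the table does not specify.

```python
import base64
from pathlib import Path
from typing import Optional


def encode_image(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def video_arguments(
    prompt: str,
    first_frame_path: Optional[str] = None,
    last_frame_path: Optional[str] = None,
) -> dict:
    """Build generate_video arguments, encoding frame images if given."""
    arguments = {"prompt": prompt, "model": "veo-3.1", "aspect_ratio": "16:9"}
    # Assumption: raw base64, without a "data:image/...;base64," prefix.
    if first_frame_path:
        arguments["first_frame_image"] = encode_image(first_frame_path)
    if last_frame_path:
        arguments["last_frame_image"] = encode_image(last_frame_path)
    return arguments
```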

generate_music

Generate instrumental music using Lyria RealTime with weighted prompts.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompts | list[dict] | Yes | | Weighted prompts, e.g. [{"text": "minimal techno", "weight": 1.0}] |
| bpm | int | No | | Tempo in beats per minute |
| temperature | float | No | 1.0 | Randomness/creativity control |
| scale | str | No | | Musical scale constraint (e.g. C_MAJOR_A_MINOR) |
| duration_seconds | int | No | 30 | Duration of the output clip |

Example prompts: [{"text": "Piano", "weight": 2.0}, {"text": "Meditation", "weight": 0.5}]
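
The prompts list is easy to get wrong by hand, so a small builder can help. The sketch below is illustrative only: `weighted_prompts` is a hypothetical helper, the bpm value is an arbitrary example, and rejecting a weight of 0 is an assumption of this sketch rather than documented media-mcp behavior.

```python
def weighted_prompts(weights: dict) -> list:
    """Turn {"text": weight} pairs into the list[dict] shape shown above."""
    prompts = []
    for text, weight in weights.items():
        if not text.strip():
            raise ValueError("prompt text must not be empty")
        if weight == 0:
            # Assumption: a zero weight contributes nothing, so treat it as a mistake.
            raise ValueError(f"weight for {text!r} must be non-zero")
        prompts.append({"text": text, "weight": float(weight)})
    return prompts


arguments = {
    "prompts": weighted_prompts({"Piano": 2.0, "Meditation": 0.5}),
    "bpm": 90,
    "duration_seconds": 30,
}
```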

generate_speech

Convert text to speech with voice and style control.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Text to speak. For multi-speaker, format as dialogue with speaker names. |
| model | enum | No | flash-tts | flash-tts, pro-tts |
| voice_name | str | No | | Voice name: Kore, Puck, Charon, Fenrir, Aoede, Leda, Orus, Zephyr |
| multi_speaker | bool | No | false | Enable multi-speaker mode |
| speakers | list[dict] | No | | Speaker-to-voice mapping, e.g. [{"name": "Alice", "voice_name": "Kore"}] |
| style_instructions | str | No | | Style guidance, e.g. "Read in a calm, slow pace" |

Example: Text: "Welcome to the show!" with voice_name: "Kore" and style_instructions: "Say cheerfully"
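
Multi-speaker mode needs the dialogue text and the speaker mapping to agree on names. One possible way to keep them in sync is sketched below. `speech_arguments` is a hypothetical helper, and the "Name: line" dialogue layout is an assumption, since the table above only says to format the text as dialogue with speaker names.

```python
def speech_arguments(turns: list, voices: dict) -> dict:
    """Build generate_speech arguments for a multi-speaker dialogue.

    turns:  list of (speaker_name, line) tuples, in speaking order
    voices: mapping of speaker_name -> voice_name
    """
    missing = {name for name, _ in turns} - set(voices)
    if missing:
        raise ValueError(f"no voice assigned for: {sorted(missing)}")
    return {
        # Assumption: one "Name: line" entry per turn, separated by newlines.
        "text": "\n".join(f"{name}: {line}" for name, line in turns),
        "multi_speaker": True,
        "speakers": [{"name": n, "voice_name": v} for n, v in voices.items()],
    }


arguments = speech_arguments(
    [("Alice", "Welcome to the show!"), ("Bob", "Glad to be here.")],
    {"Alice": "Kore", "Bob": "Puck"},
)
```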

Troubleshooting

"GEMINI_API_KEY environment variable is not set"

Set the environment variable before starting the server:

export GEMINI_API_KEY="your-key-here"

When using Claude Desktop or Claude Code, pass the key via the env block in your MCP configuration (see MCP Client Setup).

"Authentication failed" or 401 errors

Your API key may be invalid or expired. Verify it at Google AI Studio.

"Rate limit or quota exceeded" or 429 errors

Wait a moment and retry. Check your API quota at Google AI Studio.
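
If you drive the tools from your own script, "wait a moment and retry" can be automated with exponential backoff. This is a generic sketch, not something media-mcp provides: `RateLimitError` is a hypothetical stand-in for whatever error your MCP client surfaces on a 429, and the delays are arbitrary starting points.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on 429."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() on RateLimitError, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # base_delay * 1, 2, 4, ... plus up to base_delay of random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Backoff only helps with short-term rate limits. If the quota itself is exhausted, retries will keep failing until the quota resets.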

"Content blocked by safety filter"

Modify your prompt to avoid restricted content. The Gemini API applies safety filters to all generated media.

Python version errors

media-mcp requires Python 3.10 or later. Check your version:

python --version

License

MIT
