MCP server exposing Google Gemini media generation APIs as tools for AI agents

media-mcp

MCP server for AI-powered media generation using Google Gemini. Generate images, videos, music, and speech directly from your AI agent.

Features

Image Generation — Create and edit images using Gemini's Nano Banana models with support for multiple aspect ratios, resolutions up to 4K, and reference images
Video Generation — Generate videos with native audio, dialogue, and sound effects using Veo models (text-to-video, image-to-video, video extension)
Music Generation — Create instrumental music with weighted text prompts using Lyria RealTime (genre, instrument, mood control with BPM and scale)
Speech Generation — Convert text to speech with voice selection, multi-speaker support, and natural language style control using Gemini TTS

Installation

Using uvx (recommended)

uvx media-mcp

Using pip

pip install media-mcp

Prerequisites

Python 3.10+
A Google Gemini API key

Configuration

Set your Gemini API key as an environment variable:

export GEMINI_API_KEY="your-gemini-api-key"

Environment Variables

Variable	Required	Description
`GEMINI_API_KEY`	Yes	Google Gemini API key for authentication
`MEDIA_OUTPUT_DIR`	No	Directory path for saving generated media files (see below)
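
As a rough illustration of the two variables in the table, a launcher script could check them before starting the server. `load_media_env` is a hypothetical helper written for this example, not part of media-mcp, and the server's own validation may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def load_media_env(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[Path]]:
    """Return (api_key, output_dir) read from an environment mapping.

    Hypothetical helper for illustration; not part of media-mcp.
    """
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # Same wording as the error listed under Troubleshooting.
        raise RuntimeError("GEMINI_API_KEY environment variable is not set")

    # MEDIA_OUTPUT_DIR is optional; None means "no directory configured".
    raw_dir = env.get("MEDIA_OUTPUT_DIR", "").strip()
    output_dir = Path(raw_dir).expanduser() if raw_dir else None
    return api_key, output_dir
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.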

Output behavior

When MEDIA_OUTPUT_DIR is set, every generated file is saved to that directory and the tool returns only the file path — no binary data is included in the response. This is the recommended setup because MCP messages are stored in the conversation history, and large base64 payloads pollute context and waste tokens.

When MEDIA_OUTPUT_DIR is not set, the server has no filesystem target, so it returns the raw base64-encoded data directly in the response. This works for quick experiments but is not recommended for production use.
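
If you do run without MEDIA_OUTPUT_DIR, the client has to decode the payload itself. The sketch below assumes the response carries a plain base64 string (the exact response shape is not documented here), and both function names are made up for this example. It also shows why base64 is costly in context: every 3 bytes of media become 4 characters of text.

```python
import base64
from pathlib import Path


def save_base64_media(payload: str, destination: Path) -> int:
    """Decode a base64 string and write the bytes to destination.

    Returns the number of bytes written. Assumes plain base64 with no
    "data:" URI prefix.
    """
    data = base64.b64decode(payload, validate=True)
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(data)
    return len(data)


def base64_length(num_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)
```

By this arithmetic a 3 MB image becomes roughly 4 MB of text in the conversation history, which is the overhead that setting MEDIA_OUTPUT_DIR avoids.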

MCP Client Setup

Claude Desktop

Add to your claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "media-mcp": {
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}
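
If the config file already lists other servers, editing it by hand risks overwriting them. A minimal sketch of a merge script follows; `add_media_mcp` is a hypothetical helper, and it writes the API key to the file in plain text, just as the JSON above does.

```python
import json
from pathlib import Path


def add_media_mcp(config_path: Path, api_key: str, output_dir: str) -> dict:
    """Insert or update the media-mcp entry, preserving any other servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file the same as an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    servers = config.setdefault("mcpServers", {})
    servers["media-mcp"] = {
        "command": "uvx",
        "args": ["media-mcp"],
        "env": {"GEMINI_API_KEY": api_key, "MEDIA_OUTPUT_DIR": output_dir},
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path for your platform, then restart the client so it picks up the change.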

Claude Code

claude mcp add media-mcp --transport stdio -- uvx media-mcp

Or add manually to .mcp.json:

{
  "mcpServers": {
    "media-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}

Tools

`generate_image`

Generate or edit images using Gemini's Nano Banana models.

Parameter	Type	Required	Default	Description
`prompt`	string	Yes	—	Text description of the image to generate
`model`	enum	No	`nano-banana-2`	`nano-banana-2`, `nano-banana-pro`, `nano-banana`
`aspect_ratio`	enum	No	`1:1`	`1:1`, `9:16`, `16:9`, `3:2`, `4:3`, and more
`image_size`	enum	No	`1K`	`512px`, `1K`, `2K`, `4K`
`reference_images`	list[str]	No	—	Base64-encoded reference images
`thinking_level`	enum	No	`minimal`	`minimal`, `high`
`use_google_search`	bool	No	`false`	Enable Google Search grounding

Example prompt: "A watercolor painting of a cozy cabin in the mountains during autumn"
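
Agents normally fill in these parameters themselves, but when calling the tool from a script you need to base64-encode reference images first. The sketch below builds an arguments dict from the table above. `encode_image` and `build_image_arguments` are hypothetical helpers, and the assumption that the server wants plain base64 (no `data:` prefix) is not confirmed by this page.

```python
import base64
from pathlib import Path
from typing import Iterable, Union

IMAGE_SIZES = {"512px", "1K", "2K", "4K"}  # values from the parameter table


def encode_image(path: Union[str, Path]) -> str:
    """Read a file and return its contents as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_image_arguments(
    prompt: str,
    reference_paths: Iterable[Union[str, Path]] = (),
    aspect_ratio: str = "1:1",
    image_size: str = "1K",
) -> dict:
    """Assemble a generate_image arguments dict (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"image_size must be one of {sorted(IMAGE_SIZES)}")

    arguments = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "image_size": image_size,
    }
    references = [encode_image(p) for p in reference_paths]
    if references:
        # Assumption: plain base64 strings without a "data:" URI prefix.
        arguments["reference_images"] = references
    return arguments
```

The aspect ratio is passed through unchecked because the table lists only some of the accepted values.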

`generate_video`

Generate videos with native audio using Veo models.

Parameter	Type	Required	Default	Description
`prompt`	string	Yes	—	Text description including dialogue, sound effects, camera directions
`model`	enum	No	`veo-3.1`	`veo-3.1`, `veo-3`
`aspect_ratio`	enum	No	`16:9`	`16:9` (landscape), `9:16` (portrait)
`resolution`	enum	No	—	`720p`, `1080p`, `4K`
`first_frame_image`	str	No	—	Base64-encoded image for first frame
`last_frame_image`	str	No	—	Base64-encoded image for last frame
`reference_images`	list[str]	No	—	Up to 3 base64-encoded reference images

Example prompt: "A slow dolly shot through a neon-lit alley at night, rain falling, 'Where are you going?' whispered softly, footsteps echoing"
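
Because the prompt carries the scene, dialogue, and sound effects in one string, it can help to assemble it from parts. The following is only a sketch: `build_video_arguments` is a hypothetical helper, the comma-joined prompt layout simply imitates the example above, and the checks mirror the values in the parameter table.

```python
from typing import Optional, Sequence

VIDEO_ASPECT_RATIOS = {"16:9", "9:16"}
VIDEO_RESOLUTIONS = {"720p", "1080p", "4K"}
MAX_REFERENCE_IMAGES = 3  # "Up to 3" per the parameter table


def build_video_arguments(
    scene: str,
    dialogue: Optional[str] = None,
    sound_effects: Sequence[str] = (),
    aspect_ratio: str = "16:9",
    resolution: Optional[str] = None,
    reference_images: Sequence[str] = (),
) -> dict:
    """Assemble a generate_video arguments dict (hypothetical helper)."""
    if aspect_ratio not in VIDEO_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(VIDEO_ASPECT_RATIOS)}")
    if resolution is not None and resolution not in VIDEO_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(VIDEO_RESOLUTIONS)}")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")

    # Scene first, then quoted dialogue, then sound effects.
    parts = [scene.strip()]
    if dialogue:
        parts.append(f"'{dialogue.strip()}'")
    parts.extend(effect.strip() for effect in sound_effects)

    arguments = {"prompt": ", ".join(parts), "aspect_ratio": aspect_ratio}
    if resolution is not None:
        arguments["resolution"] = resolution
    if reference_images:
        arguments["reference_images"] = list(reference_images)
    return arguments
```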

`generate_music`

Generate instrumental music using Lyria RealTime with weighted prompts.

Parameter	Type	Required	Default	Description
`prompts`	list[dict]	Yes	—	Weighted prompts, e.g. `[{"text": "minimal techno", "weight": 1.0}]`
`bpm`	int	No	—	Tempo in beats per minute
`temperature`	float	No	`1.0`	Randomness/creativity control
`scale`	str	No	—	Musical scale constraint (e.g. `C_MAJOR_A_MINOR`)
`duration_seconds`	int	No	`30`	Duration of the output clip

Example prompts: [{"text": "Piano", "weight": 2.0}, {"text": "Meditation", "weight": 0.5}]
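
The `prompts` parameter is a list of text/weight pairs, which is easy to get wrong by hand. Here is a small sketch that builds it from a mapping. `build_music_arguments` is a hypothetical helper, and rejecting a weight of zero is an assumption of this example rather than a documented rule of the server.

```python
from typing import Mapping, Optional


def build_music_arguments(
    weights: Mapping[str, float],
    bpm: Optional[int] = None,
    scale: Optional[str] = None,
    duration_seconds: int = 30,
) -> dict:
    """Assemble a generate_music arguments dict (hypothetical helper).

    weights maps prompt text to its weight, e.g. {"Piano": 2.0}.
    """
    if not weights:
        raise ValueError("at least one weighted prompt is required")

    prompts = []
    for text, weight in weights.items():
        if not text.strip():
            raise ValueError("prompt text must not be empty")
        if weight == 0:
            # Assumption: a zero weight contributes nothing, so treat it as a mistake.
            raise ValueError(f"weight for {text!r} must be non-zero")
        prompts.append({"text": text, "weight": float(weight)})

    arguments = {"prompts": prompts, "duration_seconds": duration_seconds}
    if bpm is not None:
        arguments["bpm"] = bpm
    if scale is not None:
        arguments["scale"] = scale
    return arguments
```

For instance, `build_music_arguments({"Piano": 2.0, "Meditation": 0.5})` produces the same `prompts` list as the example above.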

`generate_speech`

Convert text to speech with voice and style control.

Parameter	Type	Required	Default	Description
`text`	string	Yes	—	Text to speak. For multi-speaker, format as dialogue with speaker names.
`model`	enum	No	`flash-tts`	`flash-tts`, `pro-tts`
`voice_name`	str	No	—	Voice name: `Kore`, `Puck`, `Charon`, `Fenrir`, `Aoede`, `Leda`, `Orus`, `Zephyr`
`multi_speaker`	bool	No	`false`	Enable multi-speaker mode
`speakers`	list[dict]	No	—	Speaker-to-voice mapping, e.g. `[{"name": "Alice", "voice_name": "Kore"}]`
`style_instructions`	str	No	—	Style guidance, e.g. "Read in a calm, slow pace"

Example: text: "Welcome to the show!", voice_name: "Kore", style_instructions: "Say cheerfully"
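
Multi-speaker mode needs the dialogue text and the speaker-to-voice mapping to agree on names. The sketch below builds both from one list of turns. `build_dialogue_arguments` is a hypothetical helper, and the "Name: sentence" line format is an assumption, since the table only says to format the text as dialogue with speaker names.

```python
from typing import Dict, Sequence, Tuple


def build_dialogue_arguments(
    lines: Sequence[Tuple[str, str]],
    voices: Dict[str, str],
    style_instructions: str = "",
) -> dict:
    """Assemble multi-speaker generate_speech arguments (hypothetical helper).

    lines is a sequence of (speaker, sentence) turns; voices maps each
    speaker name to a voice_name such as "Kore".
    """
    if not lines:
        raise ValueError("at least one line of dialogue is required")

    missing = sorted({speaker for speaker, _ in lines} - set(voices))
    if missing:
        raise ValueError(f"no voice assigned for: {', '.join(missing)}")

    # Assumption: one "Name: sentence" line per turn.
    text = "\n".join(f"{speaker}: {sentence}" for speaker, sentence in lines)

    # List each speaker once, in order of first appearance.
    ordered = []
    for speaker, _ in lines:
        if speaker not in ordered:
            ordered.append(speaker)

    arguments = {
        "text": text,
        "multi_speaker": True,
        "speakers": [{"name": name, "voice_name": voices[name]} for name in ordered],
    }
    if style_instructions:
        arguments["style_instructions"] = style_instructions
    return arguments
```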

Troubleshooting

"GEMINI_API_KEY environment variable is not set"

Set the environment variable before starting the server:

export GEMINI_API_KEY="your-key-here"

When using Claude Desktop or Claude Code, pass the key via the env block in your MCP configuration (see MCP Client Setup).

"Authentication failed" or 401 errors

Your API key may be invalid or expired. Verify it at Google AI Studio.

"Rate limit or quota exceeded" or 429 errors

Wait a moment and retry. Check your API quota at Google AI Studio.
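
When calls are scripted, "wait and retry" is usually implemented as exponential backoff. This is a generic sketch, not something media-mcp provides: `RateLimitError` is a hypothetical stand-in for however your client surfaces a 429, and the delays are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 / quota error from your client."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # plus up to 10% jitter so parallel clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))

    raise AssertionError("unreachable")
```

Backoff helps with short-lived rate limits only; an exhausted quota will keep failing until it resets.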

"Content blocked by safety filter"

Modify your prompt to avoid restricted content. The Gemini API applies safety filters to all generated media.

Python version errors

media-mcp requires Python 3.10 or later. Check your version:

python --version
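
The same check can be made from inside Python, which is useful when several interpreters are installed. `meets_minimum` is a hypothetical helper for this example.

```python
import sys
from typing import Sequence, Tuple

MINIMUM: Tuple[int, int] = (3, 10)  # media-mcp requires Python 3.10 or later


def meets_minimum(
    version_info: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = MINIMUM,
) -> bool:
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for media-mcp"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

On systems where `python` is missing or points to an older interpreter, `python3 --version` is the usual alternative.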

License

MIT

Release history

0.2.2 (this version): Mar 13, 2026
0.2.1: Mar 13, 2026
0.2.0: Mar 13, 2026
0.1.0: Mar 13, 2026

Download files

Source Distribution

media_mcp-0.2.2.tar.gz (15.8 kB)

Uploaded Mar 13, 2026 Source

Built Distribution

media_mcp-0.2.2-py3-none-any.whl (13.9 kB)

Uploaded Mar 13, 2026 Python 3

File details

Details for the file media_mcp-0.2.2.tar.gz.

File metadata

Download URL: media_mcp-0.2.2.tar.gz
Upload date: Mar 13, 2026
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"25.04","id":"plucky","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for media_mcp-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`813d9684818457a35bd5c55a9bc57d9ef6a4129063254dc64d56117b8ae7863d`
MD5	`b5142a6d0bb6d13a1b6c586e7dee68e0`
BLAKE2b-256	`ac64694ced88bd4aa691a3c6cdb2a5311725378abbbc0c89b952bac6acb0673b`
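
To check a manually downloaded file against the SHA256 digest listed above, a short script using only the standard library is enough. The function names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `verify_sha256(Path("media_mcp-0.2.2.tar.gz"), "813d9684818457a35bd5c55a9bc57d9ef6a4129063254dc64d56117b8ae7863d")` should return True if the download is intact.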

File details

Details for the file media_mcp-0.2.2-py3-none-any.whl.

File metadata

Download URL: media_mcp-0.2.2-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 13.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"25.04","id":"plucky","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for media_mcp-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9b01613ad36f542256351276a9b3de87d52fc02ce767a08a9792477e9d96ac4`
MD5	`11cfe9a22182f2590b22580326e0832b`
BLAKE2b-256	`9750fbf0548fec2f301748af9c3782f2275f3cb2a4e8cc90abf22cec206b95b3`
