MCP server exposing Google Gemini media generation APIs as tools for AI agents
media-mcp
MCP server for AI-powered media generation using Google Gemini. Generate images, videos, music, and speech directly from your AI agent.
Features
- Image Generation — Create and edit images using Gemini's Nano Banana models with support for multiple aspect ratios, resolutions up to 4K, and reference images
- Video Generation — Generate videos with native audio, dialogue, and sound effects using Veo models (text-to-video, image-to-video, video extension)
- Music Generation — Create instrumental music with weighted text prompts using Lyria RealTime (genre, instrument, mood control with BPM and scale)
- Speech Generation — Convert text to speech with voice selection, multi-speaker support, and natural language style control using Gemini TTS
Installation
Using uvx (recommended)
uvx media-mcp
Using pip
pip install media-mcp
Prerequisites
- Python 3.10+
- A Google Gemini API key
Configuration
Set your Gemini API key as an environment variable:
export GEMINI_API_KEY="your-gemini-api-key"
Environment Variables
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Google Gemini API key for authentication |
| MEDIA_OUTPUT_DIR | No | Directory path for saving generated media files (see below) |
Output behavior
When MEDIA_OUTPUT_DIR is set, every generated file is saved to that directory and the tool returns only the file path — no binary data is included in the response. This is the recommended setup because MCP messages are stored in the conversation history, and large base64 payloads pollute context and waste tokens.
When MEDIA_OUTPUT_DIR is not set, the server has no filesystem target, so it returns the raw base64-encoded data directly in the response. This works for quick experiments but is not recommended for production use.
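The two output modes described above can be pictured with a small sketch. This is not the server's actual code: the helper name `build_response`, the response keys `path` and `data`, and the random file-naming scheme are all assumptions made for illustration. Only the `MEDIA_OUTPUT_DIR` variable comes from this README.

```python
import base64
import os
import uuid
from pathlib import Path


def build_response(media_bytes: bytes, extension: str) -> dict:
    """Hypothetical helper: return a file path if MEDIA_OUTPUT_DIR is set, else base64."""
    output_dir = os.environ.get("MEDIA_OUTPUT_DIR")
    if output_dir:
        # Save to disk and return only the path, keeping the response small.
        target_dir = Path(output_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{uuid.uuid4().hex}.{extension}"
        target.write_bytes(media_bytes)
        return {"path": str(target)}
    # No output directory configured: inline the payload as base64 text.
    return {"data": base64.b64encode(media_bytes).decode("ascii")}
```

The path-only branch is the one that keeps large payloads out of the conversation history.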
MCP Client Setup
Claude Desktop
Add to your claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "media-mcp": {
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}
Claude Code
claude mcp add media-mcp --transport stdio -- uvx media-mcp
Or add manually to .mcp.json:
{
  "mcpServers": {
    "media-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["media-mcp"],
      "env": {
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MEDIA_OUTPUT_DIR": "/path/to/media/output"
      }
    }
  }
}
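If the same entry has to be written on several machines, it may be easier to generate it than to edit JSON by hand. The sketch below rebuilds the `.mcp.json` structure shown above; the helper name `media_mcp_entry` is hypothetical, and the script only prints the result rather than assuming where the file should live.

```python
import json


def media_mcp_entry(output_dir: str) -> dict:
    """Build the mcpServers block for media-mcp (hypothetical helper)."""
    return {
        "mcpServers": {
            "media-mcp": {
                "type": "stdio",
                "command": "uvx",
                "args": ["media-mcp"],
                "env": {
                    # Kept as a placeholder, as in the example above,
                    # so the real key is not written into the file.
                    "GEMINI_API_KEY": "${GEMINI_API_KEY}",
                    "MEDIA_OUTPUT_DIR": output_dir,
                },
            }
        }
    }


config_text = json.dumps(media_mcp_entry("/path/to/media/output"), indent=2)
print(config_text)
```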
Tools
generate_image
Generate or edit images using Gemini's Nano Banana models.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Text description of the image to generate |
| model | enum | No | nano-banana-2 | nano-banana-2, nano-banana-pro, nano-banana |
| aspect_ratio | enum | No | 1:1 | 1:1, 9:16, 16:9, 3:2, 4:3, and more |
| image_size | enum | No | 1K | 512px, 1K, 2K, 4K |
| reference_images | list[str] | No | — | Base64-encoded reference images |
| thinking_level | enum | No | minimal | minimal, high |
| use_google_search | bool | No | false | Enable Google Search grounding |
Example prompt: "A watercolor painting of a cozy cabin in the mountains during autumn"
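Because `reference_images` expects base64 strings, a caller usually has to encode local files first. The following sketch assembles a `generate_image` argument payload from the parameters in the table above. The helper names `encode_image` and `image_arguments` are hypothetical, and it assumes the server accepts plain base64 without a `data:` URI prefix, which this README does not state.

```python
import base64
from pathlib import Path
from typing import Optional


def encode_image(path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def image_arguments(prompt: str, reference_paths: Optional[list] = None) -> dict:
    """Assemble a generate_image payload (hypothetical helper)."""
    arguments = {
        "prompt": prompt,
        "model": "nano-banana-2",
        "aspect_ratio": "16:9",
        "image_size": "2K",
    }
    if reference_paths:
        # Assumption: plain base64 text, one string per reference image.
        arguments["reference_images"] = [encode_image(p) for p in reference_paths]
    return arguments
```

Parameters left out of the payload fall back to the defaults listed in the table.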
generate_video
Generate videos with native audio using Veo models.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Text description including dialogue, sound effects, camera directions |
| model | enum | No | veo-3.1 | veo-3.1, veo-3 |
| aspect_ratio | enum | No | 16:9 | 16:9 (landscape), 9:16 (portrait) |
| resolution | enum | No | — | 720p, 1080p, 4K |
| first_frame_image | str | No | — | Base64-encoded image for first frame |
| last_frame_image | str | No | — | Base64-encoded image for last frame |
| reference_images | list[str] | No | — | Up to 3 base64-encoded reference images |
Example prompt: "A slow dolly shot through a neon-lit alley at night, rain falling, 'Where are you going?' whispered softly, footsteps echoing"
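Video generation can be slow, so it may be worth checking a payload against the limits in the table before sending it. The sketch below is a client-side check only; the function name `check_video_arguments` is hypothetical, and the server may enforce these constraints differently or apply additional ones.

```python
VIDEO_ASPECT_RATIOS = {"16:9", "9:16"}
VIDEO_RESOLUTIONS = {"720p", "1080p", "4K"}
MAX_REFERENCE_IMAGES = 3


def check_video_arguments(arguments: dict) -> list:
    """Return a list of problems found in a generate_video payload (empty if none)."""
    problems = []
    if not arguments.get("prompt"):
        problems.append("prompt is required")
    # aspect_ratio defaults to 16:9 when omitted, per the table above.
    aspect_ratio = arguments.get("aspect_ratio", "16:9")
    if aspect_ratio not in VIDEO_ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {aspect_ratio}")
    resolution = arguments.get("resolution")
    if resolution is not None and resolution not in VIDEO_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if len(arguments.get("reference_images", [])) > MAX_REFERENCE_IMAGES:
        problems.append("at most 3 reference_images are allowed")
    return problems
```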
generate_music
Generate instrumental music using Lyria RealTime with weighted prompts.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompts | list[dict] | Yes | — | Weighted prompts, e.g. [{"text": "minimal techno", "weight": 1.0}] |
| bpm | int | No | — | Tempo in beats per minute |
| temperature | float | No | 1.0 | Randomness/creativity control |
| scale | str | No | — | Musical scale constraint (e.g. C_MAJOR_A_MINOR) |
| duration_seconds | int | No | 30 | Duration of the output clip |
Example prompts: [{"text": "Piano", "weight": 2.0}, {"text": "Meditation", "weight": 0.5}]
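The `prompts` list is easy to build from a simple mapping of text to weight. The sketch below shows one way to assemble a `generate_music` payload; the helper name `music_arguments` is hypothetical, and it performs no validation of weights or scale names because this README does not specify their allowed ranges.

```python
from typing import Optional


def music_arguments(
    weights: dict,
    bpm: Optional[int] = None,
    scale: Optional[str] = None,
    duration_seconds: int = 30,
) -> dict:
    """Assemble a generate_music payload from {text: weight} pairs (hypothetical helper)."""
    arguments = {
        # One {"text", "weight"} dict per prompt, in insertion order.
        "prompts": [{"text": text, "weight": float(w)} for text, w in weights.items()],
        "duration_seconds": duration_seconds,
    }
    # Optional parameters are only included when explicitly set.
    if bpm is not None:
        arguments["bpm"] = bpm
    if scale is not None:
        arguments["scale"] = scale
    return arguments


payload = music_arguments({"Piano": 2.0, "Meditation": 0.5}, bpm=70, scale="C_MAJOR_A_MINOR")
```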
generate_speech
Convert text to speech with voice and style control.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to speak. For multi-speaker, format as dialogue with speaker names. |
| model | enum | No | flash-tts | flash-tts, pro-tts |
| voice_name | str | No | — | Voice name: Kore, Puck, Charon, Fenrir, Aoede, Leda, Orus, Zephyr |
| multi_speaker | bool | No | false | Enable multi-speaker mode |
| speakers | list[dict] | No | — | Speaker-to-voice mapping, e.g. [{"name": "Alice", "voice_name": "Kore"}] |
| style_instructions | str | No | — | Style guidance, e.g. "Read in a calm, slow pace" |
Example: text: "Welcome to the show!" with voice_name: "Kore" and style_instructions: "Say cheerfully"
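Multi-speaker mode needs two things to line up: the speaker names used in `text` and the names listed in `speakers`. The sketch below builds both from the same data so they cannot drift apart. The helper name `dialogue_arguments` is hypothetical, and the `Name: line` layout is an assumption about what "format as dialogue with speaker names" means; the exact format the server expects is not documented here.

```python
def dialogue_arguments(lines: list, voices: dict) -> dict:
    """Build a multi-speaker generate_speech payload (hypothetical helper).

    lines  -- list of (speaker_name, spoken_text) tuples, in speaking order
    voices -- mapping of speaker_name to voice_name
    """
    missing = {speaker for speaker, _ in lines} - set(voices)
    if missing:
        raise ValueError(f"no voice assigned for: {sorted(missing)}")
    return {
        # Assumed dialogue layout: one "Name: line" entry per turn.
        "text": "\n".join(f"{speaker}: {line}" for speaker, line in lines),
        "multi_speaker": True,
        "speakers": [{"name": n, "voice_name": v} for n, v in voices.items()],
    }


payload = dialogue_arguments(
    [("Alice", "Welcome to the show!"), ("Bob", "Glad to be here.")],
    {"Alice": "Kore", "Bob": "Puck"},
)
```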
Troubleshooting
"GEMINI_API_KEY environment variable is not set"
Set the environment variable before starting the server:
export GEMINI_API_KEY="your-key-here"
When using Claude Desktop or Claude Code, pass the key via the env block in your MCP configuration (see MCP Client Setup).
"Authentication failed" or 401 errors
Your API key may be invalid or expired. Verify it at Google AI Studio.
"Rate limit or quota exceeded" or 429 errors
Wait a moment and retry. Check your API quota at Google AI Studio.
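"Wait a moment and retry" can be automated with exponential backoff. The sketch below is generic: `RateLimitError` is a hypothetical stand-in for whatever exception your MCP client raises on a 429, and `call` is any zero-argument function that performs the tool call.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential backoff plus a little jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

The `sleep` parameter exists only so the wait can be replaced in tests. Retrying helps with short-term rate limits, but not with an exhausted quota.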
"Content blocked by safety filter"
Modify your prompt to avoid restricted content. The Gemini API applies safety filters to all generated media.
Python version errors
media-mcp requires Python 3.10 or later. Check your version:
python --version
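The two most common setup problems in this section, a missing API key and an old interpreter, can be checked together before launching the server. The sketch below is a standalone preflight script, not part of media-mcp; the function name `preflight` is hypothetical.

```python
import os
import sys


def preflight(environ=os.environ, version=sys.version_info) -> list:
    """Return a list of setup problems (empty if the environment looks fine)."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10 or later is required")
    if not environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY environment variable is not set")
    return problems


for problem in preflight():
    print(f"setup problem: {problem}")
```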
License
MIT