
Gemini 3 Pro Image MCP server with advanced features: high-resolution output (1K-4K), reference images (up to 14), Google Search grounding, and thinking mode


Gemini 3 Pro Image MCP Server 🎨

Professional MCP server exclusively for Google's Gemini 3 Pro Image Preview (aka "Nano Banana Pro") - state-of-the-art image generation with advanced reasoning, high-resolution output (1K-4K), up to 14 reference images, Google Search grounding, and automatic thinking mode.

✨ All generated images include invisible SynthID watermarks for authenticity and provenance tracking.

✨ Features

Gemini 3 Pro Image Capabilities

  • High-Resolution Output: Generate images in 1K, 2K, and 4K resolutions
  • Advanced Text Rendering: Create legible, stylized text in infographics, menus, diagrams, and marketing assets
  • Up to 14 Reference Images: Mix up to 14 reference images in total (at most 6 object images and at most 5 human images) for consistent style and characters
  • Google Search Grounding: Use real-time data from Google Search (weather, stocks, events, maps)
  • Thinking Mode: The model uses a reasoning process to refine composition before generating the final output

Advanced Capabilities

  • 🤖 AI Prompt Enhancement: Automatically optimize prompts using Gemini Flash for superior results
  • 🔍 Google Search Integration: Generate images based on real-time information
  • 🎨 Reference Images: Use up to 14 images for style consistency and character preservation
  • 📐 Flexible Aspect Ratios: Support for 10 aspect ratios (1:1, 16:9, 9:16, 3:2, 4:3, 4:5, 5:4, 2:3, 3:4, 21:9)
  • 💭 Thought Process Visibility: See the model's thinking process (interim images and reasoning)
  • 🚀 Batch Processing: Generate multiple images efficiently with parallel processing
  • 🎯 Dual Modalities: Get both text explanations and images in responses

Production Ready

  • Comprehensive error handling and validation
  • Configurable settings via environment variables
  • Detailed logging and debugging
  • MCP resources for configuration and model information

🎬 Showcase - Gemini 3 Pro Image Features

Gemini 3 Pro Image - Experience state-of-the-art image generation with advanced reasoning and high-resolution output.

Key Features in Action

All images can be generated at 4K resolution, with optional AI prompt enhancement.

Example Use Cases

1. High-Resolution Professional Assets

Generate a 4K image of "modern office interior with natural lighting"
- Model: gemini-3-pro-image-preview
- Image Size: 4K
- Aspect Ratio: 16:9

2. Real-Time Data Visualization

Generate an image with Google Search grounding:
"Visualize the current weather forecast for the next 5 days in San Francisco as a clean, modern weather chart. Add a visual on what I should wear each day"
- Enable Google Search: true
- Aspect Ratio: 16:9

3. Reference Image Consistency

Use reference images to maintain consistent characters:
- Provide up to 5 human reference images
- Provide up to 6 object reference images
- Generate "An office group photo of these people, they are making funny faces"

4. Advanced Text Rendering

Generate infographics, menus, or diagrams with legible text:
"Create a restaurant menu with elegant typography showing appetizers, mains, and desserts"
- Image Size: 2K
- Aspect Ratio: 3:4

🔥 Why Gemini 3 Pro Image Is Powerful

  1. State-of-the-Art Quality: Built-in generation capabilities up to 4K resolution
  2. Advanced Reasoning: Thinking mode refines composition before final output
  3. Real-Time Grounding: Google Search integration for accurate, current data
  4. Character Consistency: Use up to 14 reference images for maintaining style
  5. Professional Features: Advanced text rendering for infographics and marketing

🎨 Prompt Enhancement Showcase

See the power of AI prompt enhancement! When enabled, simple prompts can be transformed into detailed, cinematic descriptions:

Original: "A fierce wolf wearing the black symbiote Spider-Man suit, web-slinging through city at night"

Enhanced: "A powerfully built Alaskan Tundra Wolf, snarling fiercely, wearing the matte black, viscous, wet-looking symbiote suit with exaggerated white spider emblem. Captured mid-air in dramatic web-slinging arc with taut glowing webbing. Extreme low-angle perspective, hyper-detailed neo-noir cityscape at midnight with rain-slicked asphalt. High-contrast cinematic lighting with deep shadows and electric neon rim lighting."

Generated Images (2K, 16:9, Prompt Enhancement: ON)

  • Wolf - Black Symbiote Suit
  • Lion - Classic Red & Blue Suit
  • Black Panther - Symbiote Suit
  • Eagle - Classic Suit in Flight
  • Grizzly Bear - Symbiote Suit
  • Fox - Classic Suit at Dusk

All images generated with enhance_prompt: true, showcasing how simple descriptions can become photorealistic, cinematic masterpieces with dramatic lighting, detailed textures, and professional composition when enhancement is enabled.

📸 Photorealistic Capabilities

Gemini 3 Pro Image excels at creating incredibly realistic images of people in unusual and imaginative scenarios:

Jensen Huang - GPU Surfing

Riding a giant NVIDIA GPU chip through a neon-lit cyberpunk cityscape

Elon Musk - Mars Chess Match

Playing chess with a humanoid robot on the surface of Mars, Earth visible in background

Jensen Huang - GPU Kitchen

Cooking breakfast in a high-tech kitchen where all appliances are GPUs with RGB lighting

Elon Musk - Cybertruck Symphony

Conducting a symphony orchestra made entirely of Tesla Cybertrucks in a concert hall

Jensen Huang - Underwater Data Center

Scuba diving in an underwater data center surrounded by glowing servers and tropical fish

Elon Musk - SpaceX Skateboarding

Skateboarding through the SpaceX factory with a Starship rocket in the background

These images demonstrate the model's exceptional ability to:

  • Generate photorealistic human likenesses
  • Blend reality with creative, surreal concepts
  • Maintain accurate lighting and perspective
  • Create detailed, believable environments
  • Handle complex compositions with multiple elements

🚀 Quick Start

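Prerequisites

  • A Google Gemini API key, supplied as GEMINI_API_KEY (a free key is available from Google AI Studio)
  • uv or pip for installation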

Installation

Option 1: Using uv (Recommended)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install and run the server
uvx ultimate-gemini-mcp

Option 2: Using pip

pip install ultimate-gemini-mcp

Option 3: From Source

git clone <repository-url>
cd ultimate-gemini-mcp
uv sync

Configuration

Create a .env file in your project directory:

cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

Or set environment variables directly:

export GEMINI_API_KEY=your_api_key_here

📖 Usage

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Important Notes:

  1. Images are automatically saved to ~/gemini_images (your home directory). You can optionally set OUTPUT_DIR to customize this location:

    • macOS: "OUTPUT_DIR": "/Users/yourusername/custom_folder"
    • Windows: "OUTPUT_DIR": "C:\\Users\\YourUsername\\custom_folder"
  2. uvx path issues on macOS: If you get spawn uvx ENOENT errors, use the full path to uvx:

    "command": "/Users/yourusername/.local/bin/uvx"
    

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
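If you would rather script the config change than edit the file by hand, the following is a minimal sketch that merges the server entry shown above into an existing config dictionary. The helper name add_server_entry is hypothetical and not part of this package; only the JSON keys and values come from the example above.

```python
import json
from copy import deepcopy
from typing import Optional


def add_server_entry(config: dict, api_key: str,
                     output_dir: Optional[str] = None,
                     command: str = "uvx") -> dict:
    """Return a copy of `config` with the "ultimate-gemini" server entry added.

    Hypothetical helper: it only mirrors the JSON structure shown in this README.
    """
    updated = deepcopy(config)  # leave the caller's dictionary untouched
    env = {"GEMINI_API_KEY": api_key}
    if output_dir is not None:
        env["OUTPUT_DIR"] = output_dir  # optional override of ~/gemini_images
    updated.setdefault("mcpServers", {})["ultimate-gemini"] = {
        "command": command,  # pass a full path here if `uvx` is not on PATH
        "args": ["ultimate-gemini-mcp"],
        "env": env,
    }
    return updated


def to_json(config: dict) -> str:
    """Serialize the config the way it would be written back to disk."""
    return json.dumps(config, indent=2)
```

Load the existing claude_desktop_config.json with json.load, pass it through add_server_entry, and write the result of to_json back to the same path. Existing server entries are preserved.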

With Claude Code (VS Code)

# Add MCP server to Claude Code
claude mcp add ultimate-gemini \
  --env GEMINI_API_KEY=your-api-key \
  -- uvx ultimate-gemini-mcp

Note: Images are automatically saved to ~/gemini_images. To customize, add --env OUTPUT_DIR=/your/custom/path.

With Cursor

Add to Cursor's MCP configuration (.cursor/mcp.json):

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Note: Images are automatically saved to ~/gemini_images. Optionally add "OUTPUT_DIR": "/your/custom/path" to customize.

🎯 Supported Model

This MCP server exclusively supports:

Gemini 3 Pro Image Preview (gemini-3-pro-image-preview)

The only model supported - Google's state-of-the-art image generation model (aka "Nano Banana Pro") optimized for professional asset production with:

  • Built-in 1K, 2K, and 4K resolution support (must use uppercase 'K')
  • Advanced text rendering capabilities for infographics, menus, diagrams, logos
  • Up to 14 reference images (max 6 objects + max 5 humans) for style/character consistency
  • Google Search grounding for real-time data (weather, stocks, events, maps)
  • Thinking mode with reasoning process (automatic, cannot be disabled)
  • Support for both TEXT and IMAGE response modalities
  • SynthID watermarking automatically applied to all generated images

🛠️ Tools

generate_image

Generate professional images using Gemini 3 Pro Image with advanced features.

Parameters:

  • prompt (required): Text description of the image to generate
    • Best Practice: Use descriptive paragraphs, not keyword lists. "Describe the scene, don't just list keywords"
  • model: Model to use (default: gemini-3-pro-image-preview - the only supported model)
  • enhance_prompt: Automatically enhance prompt using Gemini Flash (default: false)
    • Enable for simple/vague prompts; transforms them into detailed, cinematic descriptions
  • aspect_ratio: Image proportions (default: 1:1)
    • Options: "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"
  • image_size: Resolution (default: 2K)
    • CRITICAL: Must use uppercase 'K': "1K", "2K", or "4K" (lowercase like "2k" will be rejected!)
    • "1K" - Fast testing (1120 tokens, ~1-2MB)
    • "2K" - Recommended for most use cases (1120 tokens, ~3-5MB)
    • "4K" - Maximum quality for production (2000 tokens, ~8-15MB)
  • output_format: Image file format (default: png)
    • Options: "png" (recommended), "jpeg", "webp"
  • reference_image_paths: List of paths to reference images (up to 14 total)
    • Maximum 6 object images for high-fidelity inclusion of products/items
    • Maximum 5 human images for character/person consistency
    • Use for: character consistency, style transfer, object inclusion, multi-person compositions
  • enable_google_search: Enable Google Search grounding for real-time data (default: false)
    • Use for: current events, weather forecasts, stock data, recent news, real-time maps
    • Adds 1-3 seconds of latency and includes grounding_metadata in the response
  • response_modalities: Response types (default: ["TEXT", "IMAGE"])
    • Options: ["TEXT", "IMAGE"], ["IMAGE"], ["TEXT"]
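The constraints listed above can be checked before a request is sent. The sketch below is a hypothetical client-side validator, not the server's own validation code; the allowed values and defaults are copied from the parameter list, and the function name is an assumption.

```python
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}
OUTPUT_FORMATS = {"png", "jpeg", "webp"}
MAX_REFERENCE_IMAGES = 14


def validate_generate_image_args(args: dict) -> list:
    """Return a list of human-readable problems; an empty list means the args look valid."""
    errors = []

    if not str(args.get("prompt", "")).strip():
        errors.append("prompt is required")

    aspect_ratio = args.get("aspect_ratio", "1:1")
    if aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect_ratio {aspect_ratio!r}")

    image_size = args.get("image_size", "2K")
    if image_size not in IMAGE_SIZES:
        # "2k" is a common mistake: the value must use an uppercase 'K'.
        hint = " (use an uppercase 'K')" if str(image_size).upper() in IMAGE_SIZES else ""
        errors.append(f"unsupported image_size {image_size!r}{hint}")

    output_format = args.get("output_format", "png")
    if output_format not in OUTPUT_FORMATS:
        errors.append(f"unsupported output_format {output_format!r}")

    reference_paths = args.get("reference_image_paths") or []
    if len(reference_paths) > MAX_REFERENCE_IMAGES:
        errors.append(f"too many reference images: {len(reference_paths)} > {MAX_REFERENCE_IMAGES}")

    return errors
```

For example, passing {"prompt": "a cat", "image_size": "2k"} yields a single error that points at the lowercase "k".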

Examples:

1. Basic image generation:
   Generate an image of "a serene mountain landscape at sunset with a lake reflection"

2. High-resolution with specific aspect ratio:
   Generate a 4K image of "modern minimalist architecture" with aspect_ratio 16:9

3. With Google Search grounding:
   Generate an image with Google Search enabled: "Current weather map for New York City"

4. With reference images:
   Generate an image with reference_image_paths: ["/path/person1.png", "/path/person2.png"]
   and prompt: "An office group photo of these people making funny faces"
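The examples above are phrased for a chat client. Outside a chat client, the same tool could in principle be called from a script. The sketch below assumes the official mcp Python SDK is installed, that uvx is on PATH, and that the server exposes generate_image over stdio as in the configs above. It has not been verified against this server, and the helper names are hypothetical.

```python
import asyncio
import os


def build_generate_image_args(prompt: str, **options) -> dict:
    """Collect tool arguments, dropping any option left as None."""
    args = {"prompt": prompt}
    args.update({key: value for key, value in options.items() if value is not None})
    return args


async def call_generate_image(arguments: dict):
    """Start the server over stdio and call the generate_image tool once."""
    # Imported lazily so the helper above works without the `mcp` package.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    if "GEMINI_API_KEY" not in os.environ:
        raise RuntimeError("GEMINI_API_KEY not found")

    params = StdioServerParameters(
        command="uvx",
        args=["ultimate-gemini-mcp"],
        env=dict(os.environ),  # passes GEMINI_API_KEY (and OUTPUT_DIR, if set) through
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool("generate_image", arguments=arguments)


# Example (needs network access and a valid key, so it is left commented out):
# args = build_generate_image_args("modern minimalist architecture",
#                                  image_size="4K", aspect_ratio="16:9")
# result = asyncio.run(call_generate_image(args))
# print(result.content)
```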

batch_generate

Process multiple prompts efficiently with parallel batch processing.

Parameters:

  • prompts (required): List of text prompts
  • model: Model to use for all images
  • enhance_prompt: Enhance all prompts (default: false) - Enable for simple/vague prompts
  • aspect_ratio: Aspect ratio for all images
  • batch_size: Parallel processing size (default: from config)

Example:

Batch generate images for these prompts:
1. "minimalist logo design for a tech startup"
2. "modern dashboard UI design"
3. "mobile app wireframe"
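This README does not show how the server schedules a batch internally. As an illustration of what a batch_size limit on parallel work usually means, here is a minimal sketch using an asyncio semaphore. The functions run_batch and fake_generate are hypothetical stand-ins, and the default of 8 is taken from MAX_BATCH_SIZE in the configuration table.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_batch(prompts: List[str],
                    generate: Callable[[str], Awaitable[str]],
                    batch_size: int = 8) -> list:
    """Run `generate` over all prompts with at most `batch_size` requests in flight."""
    semaphore = asyncio.Semaphore(batch_size)

    async def worker(prompt: str):
        async with semaphore:  # blocks while `batch_size` workers are already running
            return await generate(prompt)

    # return_exceptions=True keeps one failed prompt from discarding the other results;
    # results come back in the same order as `prompts`.
    return await asyncio.gather(*(worker(p) for p in prompts), return_exceptions=True)


async def fake_generate(prompt: str) -> str:
    """Stand-in for a real image request; returns a made-up file name."""
    await asyncio.sleep(0.01)
    return prompt.strip().replace(" ", "_")[:20] + ".png"
```

Calling asyncio.run(run_batch(prompts, fake_generate, batch_size=2)) processes the prompts two at a time and returns one result (or exception object) per prompt.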

🎨 Advanced Features

AI Prompt Enhancement

When enabled, the server uses Gemini Flash to automatically enhance your prompts:

Original: a cat wearing a space helmet

Enhanced: A photorealistic portrait of a domestic tabby cat wearing a futuristic space helmet, close-up composition, warm studio lighting, detailed fur texture, reflective helmet visor showing subtle reflections, soft focus background, professional photography style

This can noticeably improve results for short or vague prompts without requiring prompt engineering expertise.

Google Search Grounding

Generate images based on real-time data:

Generate an image with Google Search enabled:
- prompt: "Visualize the current weather forecast for San Francisco as a modern chart"
- enable_google_search: true

The response will include grounding metadata with search sources used.

Reference Images for Consistency

Maintain consistent characters and objects across generations:

Generate an image with:
- prompt: "An office group photo of these people, they are making funny faces"
- reference_image_paths: ["/path/person1.png", "/path/person2.png", "/path/person3.png"]
- aspect_ratio: "5:4"
- image_size: "2K"

You can provide up to 14 reference images (max 6 objects, max 5 humans).

High-Resolution Assets

Generate professional 4K assets:

Generate a 4K image of "minimalist logo design for a tech startup"
with image_size: "4K" and aspect_ratio: "1:1"

⚙️ Configuration

Environment Variables

Variable | Description | Default
--- | --- | ---
GEMINI_API_KEY | Google Gemini API key (required) | -
OUTPUT_DIR | Directory for generated images | ~/gemini_images
ENABLE_PROMPT_ENHANCEMENT | Enable AI prompt enhancement | false
ENABLE_BATCH_PROCESSING | Enable batch processing | true
DEFAULT_MODEL | Default model | gemini-3-pro-image-preview
DEFAULT_IMAGE_SIZE | Default resolution | 2K
ENABLE_GOOGLE_SEARCH | Enable Google Search grounding | false
REQUEST_TIMEOUT | API request timeout (seconds) | 60
MAX_BATCH_SIZE | Maximum parallel batch size | 8
LOG_LEVEL | Logging level | INFO
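To make the defaults concrete, here is a minimal sketch of reading these variables into a typed settings object. The Settings class, load_settings, and the boolean parsing rule are assumptions made for illustration; the server's own configuration code may differ. Variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumption: common truthy spellings; the server's own parsing may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    output_dir: Path
    enable_prompt_enhancement: bool
    enable_batch_processing: bool
    default_model: str
    default_image_size: str
    enable_google_search: bool
    request_timeout: int
    max_batch_size: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY not found")  # the only required variable
    return Settings(
        gemini_api_key=api_key,
        output_dir=Path(env.get("OUTPUT_DIR", "~/gemini_images")).expanduser(),
        enable_prompt_enhancement=_as_bool(env.get("ENABLE_PROMPT_ENHANCEMENT", "false")),
        enable_batch_processing=_as_bool(env.get("ENABLE_BATCH_PROCESSING", "true")),
        default_model=env.get("DEFAULT_MODEL", "gemini-3-pro-image-preview"),
        default_image_size=env.get("DEFAULT_IMAGE_SIZE", "2K"),
        enable_google_search=_as_bool(env.get("ENABLE_GOOGLE_SEARCH", "false")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "60")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "8")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests.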

📚 MCP Resources

models://list

View the supported model with its description and features.

settings://config

View current server configuration.

🎭 Use Cases

Web Development

  • Hero images and banners
  • UI/UX mockups and wireframes
  • Logo and branding assets
  • Placeholder images

App Development

  • App icons and splash screens
  • User interface elements
  • Marketing materials
  • Documentation images

Content Creation

  • Blog post illustrations
  • Social media graphics
  • Presentation visuals
  • Product mockups

Creative Projects

  • Character design iterations
  • Concept art exploration
  • Style variations
  • Scene composition

📊 Gemini 3 Pro Image Features

Feature | Support | Details
--- | --- | ---
Resolution Options | ✅ 1K, 2K, 4K | Built-in high-resolution generation (MUST use uppercase 'K')
Reference Images | ✅ Up to 14 total | At most 6 objects and at most 5 humans for consistency
Google Search Grounding | ✅ Real-time data | Weather, stocks, events, maps
Thinking Mode | ✅ Advanced reasoning | Automatic (cannot be disabled), generates up to 2 interim images
Text Rendering | ✅ Advanced | Legible text in infographics, menus, diagrams, logos
Aspect Ratios | ✅ 10 options | Full flexibility for any format
Response Modalities | ✅ TEXT + IMAGE | Dual output modes
Prompt Enhancement | ✅ Built-in | AI-powered optimization using Gemini Flash
SynthID Watermarking | ✅ Automatic | Invisible watermark on all generated images
Thought Signatures | ✅ Automatic | Preserved across multi-turn interactions (handled by SDK)

Best For: Professional assets, marketing, real-time visualization, logos, infographics

🐛 Troubleshooting

"spawn uvx ENOENT" error

  • Cause: Claude Desktop cannot find the uvx command in its PATH
  • Solution: Use the full path to uvx in your config:
    "command": "/Users/yourusername/.local/bin/uvx"
    
  • Find your uvx location with: which uvx
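On systems where which is unavailable, the same lookup can be done from Python. This is a small sketch using the standard library. The helper name find_command is hypothetical, and the ~/.local/bin fallback is an assumption based on the example path shown above.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_command(name: str = "uvx",
                 extra_dirs: Optional[Iterable] = None) -> Optional[str]:
    """Return the path to `name`, searching PATH first and then `extra_dirs`."""
    found = shutil.which(name)
    if found:
        return found
    # GUI apps often start with a shorter PATH than your shell, so also look in
    # the per-user bin directory used in the example above.
    dirs = list(extra_dirs) if extra_dirs is not None else [Path.home() / ".local" / "bin"]
    return shutil.which(name, path=os.pathsep.join(str(d) for d in dirs))
```

If find_command() returns a path, use that value as "command" in your config. If it returns None, uv is probably not installed for the current user.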

Custom output directory

  • Default: Images are automatically saved to ~/gemini_images in your home directory
  • Customize: Set OUTPUT_DIR in your MCP config if you want a different location:
    "env": {
      "GEMINI_API_KEY": "your-key",
      "OUTPUT_DIR": "/your/custom/path"
    }
    

"GEMINI_API_KEY not found"

  • Add your API key to .env or environment variables
  • Get a free key at Google AI Studio

"Content blocked by safety filters"

  • Modify your prompt to comply with content policies
  • Try rephrasing without potentially sensitive content

"Rate limit exceeded"

  • Wait a few moments and try again
  • Consider upgrading your API plan for higher limits
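"Wait a few moments and try again" can be automated with exponential backoff. The sketch below is generic and not tied to this server: RateLimitError is a placeholder for whatever exception your client raises on rate limiting, and the delays are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the error a client surfaces when the API rate limit is hit."""


def retry_with_backoff(call: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()` until it succeeds or `max_attempts` rate-limit errors occur."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter
            # so that parallel workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The sleep parameter exists so the waiting can be replaced in tests; in normal use, leave it at its default.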

Images not saving

  • Check that OUTPUT_DIR exists and is writable
  • Verify you have sufficient disk space
  • Create the directory manually: mkdir -p /path/to/your/images
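The three checks above can be combined into one small script. This is a sketch, not part of the package: the function name is hypothetical, and the 50 MB free-space threshold is an arbitrary assumption chosen to leave room for a few 4K images at the sizes quoted earlier.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_output_dir(path: str = "~/gemini_images", min_free_mb: int = 50) -> Path:
    """Create the directory if needed and verify it is writable with enough free space."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`

    # Writing a throwaway file is a more direct test than inspecting permission
    # bits; this raises an OSError subclass if the directory is not writable.
    with tempfile.TemporaryFile(dir=directory):
        pass

    free_mb = shutil.disk_usage(directory).free // (1024 * 1024)
    if free_mb < min_free_mb:
        raise OSError(f"only {free_mb} MB free in {directory}")
    return directory
```

Run ensure_output_dir() with the same value you pass as OUTPUT_DIR; any exception it raises points at the check that failed.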

🤝 Contributing

Contributions are welcome! This project combines the best features from multiple MCP servers:

  • mcp-image (TypeScript): Prompt enhancement and editing features
  • nanobanana-mcp-server (Python): Architecture and FastMCP integration
  • gemini-imagen-mcp-server (TypeScript): Imagen API support and batch processing

📄 License

MIT License - see LICENSE file for details.


Ready to create amazing AI-generated images? Install now and start generating! 🚀

Download files

Source Distribution

ultimate_gemini_mcp-3.0.17.tar.gz (88.5 MB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

Algorithm | Hash digest
--- | ---
SHA256 | 326c014e85b10ed483ed9c95387160acdc081afa43423d6dc8ff501f1567e3cb
MD5 | 675a4e6780f192ea1de6c06254a3c41b
BLAKE2b-256 | 7cd9d0494eaab0cd50172eb1abd0176a313167f1d669e49ef4ca704bc7853f7c

Built Distribution

ultimate_gemini_mcp-3.0.17-py3-none-any.whl (40.1 kB)

  • Tags: Python 3

Algorithm | Hash digest
--- | ---
SHA256 | ed6514e10a2991b6ef009bd12771ee023e0233b16c36f979cf91bd36d800f6c2
MD5 | 1425f75a03fd1cd59d0f6b4d3eeda138
BLAKE2b-256 | 1abf45619b1a7711af5379ddd2fe1e29a6918529bfb2784c45907c06c68a6e59
