Unified MCP server for OpenAI multimodal APIs (Sora, Whisper, GPT Vision)

sanzaru

A stateless, lightweight MCP server that wraps OpenAI's Sora video, image generation, Whisper, GPT-4o Audio, and TTS APIs via the OpenAI Python SDK.

Features

Video Generation (Sora)

  • Create videos with sora-2 or sora-2-pro models
  • Use reference images to guide generation
  • Remix and refine existing videos
  • Download variants (video, thumbnail, spritesheet)

Image Generation

  • Generate images with gpt-image-2 (recommended), gpt-image-1.5, or GPT-5
  • Edit and compose images with up to 16 inputs
  • Iterative refinement via Responses API
  • Automatic resizing for Sora compatibility

Audio Processing

  • Transcription: Whisper and GPT-4o models
  • Audio Chat: Interactive analysis with GPT-4o
  • Text-to-Speech: Multi-voice TTS generation
  • Processing: Format conversion, compression, file management

Podcast Generation

  • Multi-voice podcasts with up to 4 speakers and 10 TTS voices
  • Parallel segment generation with configurable pacing
  • MP3/WAV output with loudness normalization
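Parallel segment generation follows a common asyncio pattern: one TTS request per segment runs concurrently, a semaphore caps how many are in flight, and results come back in script order so they can be stitched sequentially. The sketch below illustrates that pattern under assumptions and is not sanzaru's implementation. `generate_segments` and `fake_tts` are hypothetical (the latter stands in for a real TTS call), the concurrency cap of 4 is arbitrary, and the script dict uses the same `speakers`/`segments` shape as the generate_podcast example.

```python
import asyncio
from typing import Awaitable, Callable


async def generate_segments(
    script: dict,
    tts: Callable[[str, str], Awaitable[bytes]],
    max_concurrency: int = 4,  # assumed cap, not a sanzaru default
) -> list[bytes]:
    """Synthesize every segment concurrently and return audio in script order."""
    voices = {s["id"]: s["voice"] for s in script["speakers"]}
    sem = asyncio.Semaphore(max_concurrency)

    async def one(seg: dict) -> bytes:
        async with sem:  # at most max_concurrency TTS requests in flight
            return await tts(voices[seg["speaker"]], seg["text"])

    # gather() preserves input order even when later segments finish first.
    return list(await asyncio.gather(*(one(s) for s in script["segments"])))


async def fake_tts(voice: str, text: str) -> bytes:
    """Hypothetical stand-in for a real TTS request; 'nova' is artificially slower."""
    await asyncio.sleep(0.05 if voice == "nova" else 0.01)
    return f"{voice}:{text}".encode()


script = {
    "speakers": [{"id": "host", "voice": "nova"}, {"id": "guest", "voice": "echo"}],
    "segments": [
        {"speaker": "host", "text": "Welcome!"},
        {"speaker": "guest", "text": "Thanks."},
    ],
}
audio = asyncio.run(generate_segments(script, fake_tts))
print(audio)  # [b'nova:Welcome!', b'echo:Thanks.']
```

Even though the first segment finishes last here, the output order matches the script, which is what stitching needs.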

Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.

Requirements

  • Python 3.10+
  • OPENAI_API_KEY environment variable

Media storage (choose one):

# Recommended: unified path (auto-creates videos/, images/, audio/ subdirs)
SANZARU_MEDIA_PATH="/path/to/media"

# Or individual paths (legacy, still supported)
VIDEO_PATH="/path/to/videos"
IMAGE_PATH="/path/to/images"
AUDIO_PATH="/path/to/audio"

Features are auto-detected based on configured paths. Set only what you need.
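One way to read these rules is as a mapping from environment variables to enabled features. The sketch below is a hypothetical illustration, not sanzaru's code. The `resolve_media_paths` helper and its tables are invented for this example, it assumes `SANZARU_MEDIA_PATH` takes precedence when both styles are set, and it omits the subdirectory creation the real server performs.

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical feature -> variable / subdirectory tables (illustrative only).
LEGACY_VARS = {"video": "VIDEO_PATH", "image": "IMAGE_PATH", "audio": "AUDIO_PATH"}
SUBDIRS = {"video": "videos", "image": "images", "audio": "audio"}


def resolve_media_paths(env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Return {feature: directory} for every feature that has storage configured."""
    unified = env.get("SANZARU_MEDIA_PATH")
    if unified:
        # Assumption: the unified path wins and enables every feature via subdirs.
        return {feature: Path(unified) / sub for feature, sub in SUBDIRS.items()}
    # Legacy mode: a feature is enabled only if its own variable is set.
    return {feature: Path(env[var]) for feature, var in LEGACY_VARS.items() if env.get(var)}


print(sorted(resolve_media_paths({"VIDEO_PATH": "/data/videos"})))  # ['video']
```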

Quick Start

  1. Clone the repository:

    git clone https://github.com/TJC-LP/sanzaru.git
    cd sanzaru
    
  2. Run the setup script:

    ./setup.sh
    

    The script will:

    • Prompt for your OpenAI API key
    • Create directories and .env configuration
    • Install dependencies with uv sync --all-extras --dev
  3. Launch Claude Code from the repository directory:

    claude
    

That's it! Claude Code connects to the sanzaru MCP server automatically, so you can start generating videos and images or processing audio right away.

Installation

Claude Code Plugin (Recommended)

Install as a plugin. It auto-configures the MCP server and includes prompting guidance:

/plugin marketplace add TJC-LP/sanzaru

Requires OPENAI_API_KEY and SANZARU_MEDIA_PATH environment variables to be set.

Quick Install

# All features
uv add "sanzaru[all]"

# Specific features
uv add "sanzaru[audio]"  # With audio support
uv add sanzaru           # Base (video + image only)

Alternative Installation Methods

From Source

git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uvx",
      "args": ["sanzaru[all]"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here",
        "SANZARU_MEDIA_PATH": "/absolute/path/to/media"
      }
    }
  }
}

Or from source:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"]
    }
  }
}

Codex MCP

# Using uvx (from PyPI)
codex mcp add sanzaru \
  --env OPENAI_API_KEY="sk-..." \
  --env SANZARU_MEDIA_PATH="$HOME/sanzaru-media" \
  -- uvx "sanzaru[all]"

Manual Setup

uv venv
# Add --all-extras to include audio support
uv sync

# Set required environment variables
export OPENAI_API_KEY=sk-...
export SANZARU_MEDIA_PATH=~/sanzaru-media

# Run server (stdio for MCP clients)
uv run sanzaru

# Or HTTP mode (for remote access)
uv run sanzaru --transport http --port 8000

Available Tools

| Category | Tools | Description |
|---|---|---|
| Video | create_video, get_video_status, download_video, list_videos, list_local_videos, delete_video, remix_video | Generate and manage Sora videos with optional reference images |
| Image | generate_image, edit_image, create_image, get_image_status, download_image | Generate with gpt-image-2 (default, sync) or GPT-5 (polling) |
| Reference | list_reference_images, prepare_reference_image | Manage and resize images for Sora compatibility |
| Audio | transcribe_audio, chat_with_audio, create_audio, convert_audio, compress_audio, list_audio_files, get_latest_audio, transcribe_with_enhancement | Transcription, analysis, TTS, and file management |
| Podcast | generate_podcast | Multi-voice podcast generation with parallel TTS and audio stitching |
| Media | view_media | Interactive media player via MCP App protocol |

Full API documentation: See docs/api-reference.md

Basic Workflows

Generate a Video

# Create video from text
video = create_video(
    prompt="A serene mountain landscape at sunrise",
    model="sora-2",
    seconds="8",
    size="1280x720"
)

# Check status; call again until the video has finished rendering
status = get_video_status(video.id)

# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
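Because Sora renders asynchronously, a single get_video_status call is rarely enough. A client-side polling loop with capped exponential backoff might look like the sketch below. It assumes the status values of OpenAI's Video API (`queued`, `in_progress`, `completed`, `failed`). `wait_for_video` and the `fetch_status` callable are hypothetical stand-ins for however your client invokes get_video_status.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # assumed terminal states of a Sora job


def wait_for_video(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state, backing off between checks."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video still {status!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Simulated run: fake statuses, and record delays instead of actually sleeping.
statuses = iter(["queued", "in_progress", "in_progress", "completed"])
delays: list[float] = []
result = wait_for_video(lambda: next(statuses), sleep=delays.append)
print(result, delays)  # completed [5.0, 10.0, 20.0]
```

Injecting `sleep` keeps the loop testable; only download once the returned status is `completed`.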

Generate with Reference Image

# 1. Generate reference image (gpt-image-2, synchronous)
generate_image(
    prompt="futuristic pilot in mech cockpit",
    size="1536x1024",
    filename="pilot.png"
)

# 2. Prepare for video (resize to Sora dimensions)
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")

# 3. Animate
video = create_video(
    prompt="The pilot looks up and smiles",
    size="1280x720",
    input_reference_filename="pilot_1280x720.png"
)
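The `crop` resize mode can be understood as two steps: center-crop the source to the target aspect ratio, then scale it to the exact Sora dimensions. The arithmetic is sketched below as an illustration under assumptions, not sanzaru's implementation. `center_crop_box` and `prepared_name` are hypothetical, and the `<stem>_<W>x<H>` naming rule is inferred from the `pilot_1280x720.png` example.

```python
from pathlib import PurePath


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of the largest centered region with dst's aspect ratio."""
    if src_w * dst_h > dst_w * src_h:
        # Source is wider than the target: keep full height, trim the sides.
        new_w = round(src_h * dst_w / dst_h)
        left = (src_w - new_w) // 2
        return (left, 0, left + new_w, src_h)
    # Source is taller (or equal): keep full width, trim top and bottom.
    new_h = round(src_w * dst_h / dst_w)
    top = (src_h - new_h) // 2
    return (0, top, src_w, top + new_h)


def prepared_name(filename: str, dst_w: int, dst_h: int) -> str:
    """Hypothetical naming rule inferred from pilot.png -> pilot_1280x720.png."""
    p = PurePath(filename)
    return f"{p.stem}_{dst_w}x{dst_h}{p.suffix}"


print(center_crop_box(1536, 1024, 1280, 720))  # (0, 80, 1536, 944)
print(prepared_name("pilot.png", 1280, 720))   # pilot_1280x720.png
```

For the 1536x1024 image above, 80 px are trimmed from the top and bottom, leaving a 1536x864 region that scales down to 1280x720 exactly. With Pillow, `ImageOps.fit(image, (1280, 720))` performs both the crop and the resize in one call.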

Audio Transcription

# List available audio files
files = list_audio_files(format="mp3")

# Transcribe
result = transcribe_audio("interview.mp3")

# Or analyze with GPT-4o
analysis = chat_with_audio(
    "meeting.mp3",
    user_prompt="Summarize key decisions and action items"
)
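Compression matters for transcription because OpenAI's transcription endpoint rejects uploads larger than 25 MB. When re-encoding a long recording, a rough target bitrate can be estimated from its duration, as sketched below. `max_bitrate_kbps` is a hypothetical helper. The decimal-megabyte limit and the 5% headroom for container overhead are conservative assumptions, not sanzaru settings.

```python
# Upload cap of OpenAI's transcription endpoint; decimal MB keeps the estimate conservative.
LIMIT_BYTES = 25_000_000


def max_bitrate_kbps(duration_s: float, limit_bytes: int = LIMIT_BYTES, headroom: float = 0.05) -> int:
    """Highest constant bitrate (kbit/s) that keeps an encoded file under limit_bytes.

    headroom reserves a fraction of the budget for container/metadata overhead
    (an assumed 5%, not a measured value).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    usable_bits = limit_bytes * 8 * (1 - headroom)
    return int(usable_bits / duration_s / 1000)


print(max_bitrate_kbps(3600))  # 52  (one hour of audio)
print(max_bitrate_kbps(600))   # 316 (ten minutes)
```

A one-hour meeting therefore needs roughly 52 kbit/s or less to fit in a single request, which is adequate for speech.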

Generate a Podcast

generate_podcast(script={
    "title": "AI Weekly",
    "speakers": [
        {"id": "host", "name": "Alex", "voice": "nova"},
        {"id": "guest", "name": "Sam", "voice": "echo"}
    ],
    "segments": [
        {"speaker": "host", "text": "Welcome to AI Weekly!"},
        {"speaker": "guest", "text": "Thanks for having me."}
    ]
})
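Before calling generate_podcast, a script can be checked client-side against the speaker limit listed under Features (up to 4) and for consistent speaker ids. The validator below is a hypothetical sketch, not sanzaru's schema. It only checks the fields shown in the example and does not verify voice names against the TTS voice list.

```python
MAX_SPEAKERS = 4  # limit stated in the Features list


def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the script looks usable."""
    errors: list[str] = []
    speakers = script.get("speakers", [])
    if not 1 <= len(speakers) <= MAX_SPEAKERS:
        errors.append(f"expected 1-{MAX_SPEAKERS} speakers, got {len(speakers)}")
    ids = [s.get("id") for s in speakers]
    if len(ids) != len(set(ids)):
        errors.append("speaker ids must be unique")
    for s in speakers:
        if not s.get("voice"):
            errors.append(f"speaker {s.get('id')!r} has no voice")
    segments = script.get("segments", [])
    if not segments:
        errors.append("script has no segments")
    for i, seg in enumerate(segments):
        # Every segment must reference a declared speaker id and carry some text.
        if seg.get("speaker") not in ids:
            errors.append(f"segment {i}: unknown speaker {seg.get('speaker')!r}")
        if not str(seg.get("text", "")).strip():
            errors.append(f"segment {i}: empty text")
    return errors


problems = validate_script({
    "speakers": [{"id": "host", "name": "Alex", "voice": "nova"}],
    "segments": [{"speaker": "guest", "text": "Hi!"}],
})
print(problems)  # ["segment 0: unknown speaker 'guest'"]
```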

Documentation

Transport Modes

| Mode | Command | Use case |
|---|---|---|
| stdio (default) | uv run sanzaru | Claude Desktop, Claude Code, local MCP clients |
| HTTP | uv run sanzaru --transport http | Remote access, Databricks Apps, web clients |

Storage Backends

| Backend | Config | Use case |
|---|---|---|
| Local (default) | SANZARU_MEDIA_PATH=/path/to/media | Development, local deployments |
| Databricks | STORAGE_BACKEND=databricks | Databricks Apps with Unity Catalog Volumes |

The Databricks backend supports per-user storage isolation via the user_context module, enabling multi-tenant deployments where each user's media is stored under their own volume prefix.
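Per-user isolation of this kind is commonly built from two pieces: a context variable that carries the current user through async request handling, and a path helper that scopes every file under that user's prefix. The sketch below is hypothetical. `current_user`, `user_scoped_path`, and the example base `/Volumes/main/media/sanzaru` (shaped like a Unity Catalog `/Volumes/<catalog>/<schema>/<volume>` path) are illustrative assumptions, not the contents of sanzaru's user_context module.

```python
import re
from contextvars import ContextVar
from pathlib import PurePosixPath

# Hypothetical: set once per request (e.g. from the authenticated identity);
# each asyncio task sees the value that was current when it was created.
current_user: ContextVar[str] = ContextVar("current_user")

_UNSAFE = re.compile(r"[^A-Za-z0-9._@-]")


def user_scoped_path(base: str, kind: str, filename: str) -> PurePosixPath:
    """Build <base>/<user>/<kind>/<filename>, rejecting names that escape the prefix."""
    user = _UNSAFE.sub("_", current_user.get())  # LookupError if no user is set
    if user in {"", ".", ".."}:
        raise ValueError(f"invalid user id: {user!r}")
    parts = PurePosixPath(filename).parts
    # Exactly one plain component: no directories, no absolute paths, no "..".
    if len(parts) != 1 or parts[0] == "..":
        raise ValueError(f"invalid filename: {filename!r}")
    return PurePosixPath(base) / user / kind / parts[0]


token = current_user.set("alice@example.com")
try:
    print(user_scoped_path("/Volumes/main/media/sanzaru", "videos", "clip.mp4"))
    # /Volumes/main/media/sanzaru/alice@example.com/videos/clip.mp4
finally:
    current_user.reset(token)
```

`kind` is assumed to be a fixed value chosen by the server (such as `videos`), so only the user id and filename are sanitized.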

See CLAUDE.md for full configuration details.

Performance

Fully asynchronous architecture, validated under concurrent load:

  • ✅ 32+ concurrent operations verified
  • ✅ 8-10x speedup for parallel tasks
  • ✅ Non-blocking I/O with aiofiles + anyio
  • ✅ Python 3.14 free-threading ready

See docs/async-optimizations.md for technical details.
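Non-blocking I/O means slow disk writes must not stall the event loop while other requests are in flight. sanzaru uses aiofiles and anyio for this. The standard-library sketch below shows the same idea with `asyncio.to_thread` (Python 3.9+), which runs each blocking write in a worker thread so several saves proceed concurrently. `save_many` is a hypothetical helper, not part of sanzaru.

```python
import asyncio
import tempfile
from pathlib import Path


async def save_many(files: dict[str, bytes], directory: Path) -> list[Path]:
    """Write several files concurrently without blocking the event loop."""

    async def save(name: str, data: bytes) -> Path:
        path = directory / name
        # Path.write_bytes blocks; to_thread moves it off the event loop.
        await asyncio.to_thread(path.write_bytes, data)
        return path

    # gather() runs all saves concurrently and keeps the input order.
    return list(await asyncio.gather(*(save(n, d) for n, d in files.items())))


with tempfile.TemporaryDirectory() as tmp:
    saved = asyncio.run(save_many({"a.bin": b"aa", "b.bin": b"bbb"}, Path(tmp)))
    sizes = [p.stat().st_size for p in saved]
print(sizes)  # [2, 3]
```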

License

MIT

Download files

Download the file for your platform.

Source Distribution

sanzaru-0.6.2.tar.gz (214.7 kB)

Uploaded Source

Built Distribution

sanzaru-0.6.2-py3-none-any.whl (228.2 kB)

Uploaded Python 3

File details

Details for the file sanzaru-0.6.2.tar.gz.

File metadata

  • Download URL: sanzaru-0.6.2.tar.gz
  • Upload date:
  • Size: 214.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sanzaru-0.6.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 944512da3e5e4b06ea2f4ad9eb772ec5fac8aa0d8e314774cf6911322969ef4a |
| MD5 | e67236973b3c1206028ddc5a79ec7946 |
| BLAKE2b-256 | 68c5fc3c97d94c7d40c7153bdc8c4227a7f3dfba2fe859f8c680405c358d9a49 |

Provenance

The following attestation bundles were made for sanzaru-0.6.2.tar.gz:

Publisher: publish-to-pypi.yml on TJC-LP/sanzaru

File details

Details for the file sanzaru-0.6.2-py3-none-any.whl.

File metadata

  • Download URL: sanzaru-0.6.2-py3-none-any.whl
  • Upload date:
  • Size: 228.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sanzaru-0.6.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 544d4928095b188813c81b56a15acdd142ffcdd901f3f82539bda634f56e966c |
| MD5 | 5b38e6d88df4baefbe5dbb9a76a04470 |
| BLAKE2b-256 | ed5a0cdb5a085a3dad928aa16545d909c0ecbd0899710ef33c23fcb7cad00047 |

Provenance

The following attestation bundles were made for sanzaru-0.6.2-py3-none-any.whl:

Publisher: publish-to-pypi.yml on TJC-LP/sanzaru
