Unified MCP server for OpenAI multimodal APIs (Sora, Whisper, GPT Vision)

sanzaru

A stateless, lightweight MCP server that wraps OpenAI's Sora Video API, Whisper, and GPT-4o Audio APIs via the OpenAI Python SDK.

Features

Video Generation (Sora)

  • Create videos with sora-2 or sora-2-pro models
  • Use reference images to guide generation
  • Remix and refine existing videos
  • Download variants (video, thumbnail, spritesheet)

Image Generation

  • Generate reference images with GPT-5/GPT-4.1
  • Iterative refinement and image editing
  • Automatic resizing for Sora compatibility

Audio Processing

  • Transcription: Whisper and GPT-4o models
  • Audio Chat: Interactive analysis with GPT-4o
  • Text-to-Speech: Multi-voice TTS generation
  • Processing: Format conversion, compression, file management

Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.

Requirements

  • Python 3.10+
  • OPENAI_API_KEY environment variable

Feature-specific paths (set only what you need):

  • VIDEO_PATH - Enables video generation features
  • IMAGE_PATH - Enables image generation features
  • AUDIO_PATH - Enables audio processing features

Quick Start

  1. Clone the repository:

    git clone https://github.com/TJC-LP/sanzaru.git
    cd sanzaru
    
  2. Run the setup script:

    ./setup.sh
    

    The script will:

    • Prompt for your OpenAI API key
    • Create directories and .env configuration
    • Install dependencies with uv sync --all-extras --dev
  3. Start using:

    claude
    

That's it! Claude Code connects to the server automatically, and you can start generating videos and images and processing audio.

Installation

Quick Install

# All features
uv add "sanzaru[all]"

# Specific features
uv add "sanzaru[audio]"  # With audio support
uv add sanzaru           # Base (video + image only)

Alternative Installation Methods

From Source

git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uvx",
      "args": ["sanzaru[all]"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here",
        "VIDEO_PATH": "/absolute/path/to/videos",
        "IMAGE_PATH": "/absolute/path/to/images",
        "AUDIO_PATH": "/absolute/path/to/audio"
      }
    }
  }
}

Or from source:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"]
    }
  }
}
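Since only the paths you set enable their features, the `env` block of the uvx configuration can be generated rather than hand-edited. The following is a minimal sketch, not part of sanzaru: `build_server_entry` is a hypothetical helper that emits the same JSON shape shown above and omits any path you leave unset.

```python
import json


def build_server_entry(api_key, video_path=None, image_path=None, audio_path=None):
    """Build a claude_desktop_config.json fragment for the uvx install.

    Hypothetical helper for illustration; sanzaru does not ship it.
    """
    env = {"OPENAI_API_KEY": api_key}
    optional_paths = (
        ("VIDEO_PATH", video_path),
        ("IMAGE_PATH", image_path),
        ("AUDIO_PATH", audio_path),
    )
    # Only include the feature paths that were actually provided.
    for name, value in optional_paths:
        if value:
            env[name] = value
    return {
        "mcpServers": {
            "sanzaru": {
                "command": "uvx",
                "args": ["sanzaru[all]"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    config = build_server_entry(
        "your-api-key-here",
        video_path="/absolute/path/to/videos",
    )
    print(json.dumps(config, indent=2))
```

Merge the printed fragment into your existing configuration file by hand; the sketch does not read or overwrite it.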

Codex MCP

# Using uvx (from PyPI)
codex mcp add sanzaru \
  --env OPENAI_API_KEY="sk-..." \
  --env VIDEO_PATH="$HOME/sanzaru-videos" \
  --env IMAGE_PATH="$HOME/sanzaru-images" \
  --env AUDIO_PATH="$HOME/sanzaru-audio" \
  -- uvx "sanzaru[all]"

# Or from source
cd /path/to/sanzaru
set -a; source .env; set +a
codex mcp add sanzaru \
  --env OPENAI_API_KEY="$OPENAI_API_KEY" \
  --env VIDEO_PATH="$VIDEO_PATH" \
  --env IMAGE_PATH="$IMAGE_PATH" \
  --env AUDIO_PATH="$AUDIO_PATH" \
  -- uv run --directory "$(pwd)" sanzaru

Manual Setup

uv venv
uv sync

# Set required environment variables
export OPENAI_API_KEY=sk-...
export VIDEO_PATH=~/videos
export IMAGE_PATH=~/images
export AUDIO_PATH=~/audio

# Run server
uv run sanzaru

Feature Auto-Detection: Features are automatically enabled based on configured paths. Set only the paths you need.
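To illustrate the idea, path-based detection could look roughly like the sketch below. This is an assumption about the general approach, not the server's actual code: `FEATURE_ENV_VARS` and `enabled_features` are hypothetical names, and the real implementation may apply further checks (for example, that the directory exists).

```python
import os

# Mapping from feature name to the environment variable that enables it.
# The variable names come from the Requirements section; the mapping
# itself is illustrative.
FEATURE_ENV_VARS = {
    "video": "VIDEO_PATH",
    "image": "IMAGE_PATH",
    "audio": "AUDIO_PATH",
}


def enabled_features(env=None):
    """Return the set of features whose path variable is set and non-empty."""
    env = os.environ if env is None else env
    return {
        feature
        for feature, var in FEATURE_ENV_VARS.items()
        if env.get(var)
    }


if __name__ == "__main__":
    print(sorted(enabled_features()))
```

With only `AUDIO_PATH` exported, this sketch would report audio as the single enabled feature.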

Available Tools

Category | Tools | Description
Video | create_video, get_video_status, download_video, list_videos, delete_video, remix_video | Generate and manage Sora videos with optional reference images
Image | create_image, get_image_status, download_image | Generate reference images with GPT-5/GPT-4.1
Reference | list_reference_images, prepare_reference_image | Manage and resize images for Sora compatibility
Audio | transcribe_audio, chat_with_audio, create_audio, convert_audio, compress_audio, list_audio_files, get_latest_audio, transcribe_with_enhancement | Transcription, analysis, TTS, and file management

Full API documentation: See docs/api-reference.md

Basic Workflows

Generate a Video

# Create video from text
video = create_video(
    prompt="A serene mountain landscape at sunrise",
    model="sora-2",
    seconds="8",
    size="1280x720"
)

# Check status; repeat until generation has finished
status = get_video_status(video.id)

# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
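Video generation is asynchronous, so the status check usually has to be repeated. The loop below is a generic sketch with stated assumptions: `wait_for_video` is a hypothetical helper, `get_status` is any callable you supply that returns a status string for a video id, and the terminal status names `"completed"` and `"failed"` are assumed rather than taken from this README.

```python
import time

# Assumed terminal states; adjust to whatever your status call reports.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_video(get_status, video_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status(video_id) until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(video_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} still {status!r} after {timeout}s")
        # Wait before asking again to avoid hammering the API.
        sleep(interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for real tool calls.
    fake_statuses = iter(["queued", "in_progress", "completed"])
    final = wait_for_video(lambda _id: next(fake_statuses), "video_123", interval=0.0)
    print(final)
```

Passing the status function in as an argument keeps the loop independent of how the tool is actually invoked from your MCP client.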

Generate with Reference Image

# 1. Generate reference image
resp = create_image(prompt="futuristic pilot in mech cockpit")
download_image(resp.id, filename="pilot.png")

# 2. Prepare for video
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")

# 3. Animate
video = create_video(
    prompt="The pilot looks up and smiles",
    size="1280x720",
    input_reference_filename="pilot_1280x720.png"
)
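The `resize_mode="crop"` step in this workflow has to reconcile the source image's aspect ratio with the target video size. How the server does this is not described here; the sketch below shows one common approach, a center crop, using a hypothetical `center_crop_box` function. It returns the region of the source to keep, which would then be scaled to the target size.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, right, bottom) of the centered region of a
    src_w x src_h image that matches the dst_w:dst_h aspect ratio.

    Illustrative only; not taken from the sanzaru source.
    """
    # Compare aspect ratios by cross-multiplying to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w = src_h * dst_w // dst_h
        crop_h = src_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # A square 1000x1000 image cropped for a 1280x720 target.
    print(center_crop_box(1000, 1000, 1280, 720))  # (0, 219, 1000, 781)
```

An image that already matches the target ratio, such as 1920x1080 for a 1280x720 target, is returned uncropped.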

Audio Transcription

# List available audio files
files = list_audio_files(format="mp3")

# Transcribe
result = transcribe_audio("interview.mp3")

# Or analyze with GPT-4o
analysis = chat_with_audio(
    "meeting.mp3",
    user_prompt="Summarize key decisions and action items"
)

Performance

Fully asynchronous architecture with proven scalability:

  • ✅ 32+ concurrent operations verified
  • ✅ 8-10x speedup for parallel tasks
  • ✅ Non-blocking I/O with aiofiles + anyio
  • ✅ Python 3.14 free-threading ready

See docs/async-optimizations.md for technical details.
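For readers unfamiliar with the pattern, the sketch below shows how many I/O-bound operations can run concurrently under a fixed cap using only the standard library. It is a generic illustration, not the server's implementation: `run_bounded` and `fake_operation` are hypothetical names, and the sleep stands in for a network or file call.

```python
import asyncio


async def run_bounded(coro_factories, limit=32):
    """Run zero-argument coroutine factories concurrently, at most `limit` at a time.

    Results are returned in the same order as the factories.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # The semaphore caps how many operations are in flight at once.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in coro_factories))


async def fake_operation(i):
    # Stand-in for a non-blocking API or file operation.
    await asyncio.sleep(0.01)
    return i * 2


if __name__ == "__main__":
    factories = [lambda i=i: fake_operation(i) for i in range(8)]
    print(asyncio.run(run_bounded(factories, limit=4)))
```

Because the waits overlap instead of running back to back, total time grows with the number of batches rather than the number of operations, which is where speedups for parallel tasks come from.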

License

MIT
