Unified MCP server for OpenAI multimodal APIs (Sora, Whisper, GPT Vision)
sanzaru
A stateless, lightweight MCP server that wraps OpenAI's Sora Video API, Whisper, and GPT-4o Audio APIs via the OpenAI Python SDK.
Features
Video Generation (Sora)
- Create videos with sora-2 or sora-2-pro models
- Use reference images to guide generation
- Remix and refine existing videos
- Download variants (video, thumbnail, spritesheet)
Image Generation
- Generate images with gpt-image-1.5 (recommended) or GPT-5
- Edit and compose images with up to 16 inputs
- Iterative refinement via Responses API
- Automatic resizing for Sora compatibility
Audio Processing
- Transcription: Whisper and GPT-4o models
- Audio Chat: Interactive analysis with GPT-4o
- Text-to-Speech: Multi-voice TTS generation
- Processing: Format conversion, compression, file management
Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.
Requirements
- Python 3.10+
- OPENAI_API_KEY environment variable
Feature-specific paths (set only what you need):
- VIDEO_PATH - Enables video generation features
- IMAGE_PATH - Enables image generation features
- AUDIO_PATH - Enables audio processing features
Quick Start
1. Clone the repository:
git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
2. Run the setup script:
./setup.sh
The script will:
- Prompt for your OpenAI API key
- Create directories and .env configuration
- Install dependencies with uv sync --all-extras --dev
3. Start using:
claude
That's it! Claude Code will connect automatically, and you can start generating videos and images and processing audio.
Installation
Quick Install
# All features
uv add "sanzaru[all]"
# Specific features
uv add "sanzaru[audio]" # With audio support
uv add sanzaru # Base (video + image only)
Alternative Installation Methods
From Source
git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"sanzaru": {
"command": "uvx",
"args": ["sanzaru[all]"],
"env": {
"OPENAI_API_KEY": "your-api-key-here",
"VIDEO_PATH": "/absolute/path/to/videos",
"IMAGE_PATH": "/absolute/path/to/images",
"AUDIO_PATH": "/absolute/path/to/audio"
}
}
}
}
Or from source:
{
"mcpServers": {
"sanzaru": {
"command": "uv",
"args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"]
}
}
}
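If you prefer to generate the first configuration rather than edit it by hand, the snippet below is a minimal sketch of one way to do so. The helper name `build_server_entry` and the `videos`/`images`/`audio` subdirectory layout are assumptions made for illustration; only the JSON shape is taken from the example above. It resolves the media root to an absolute path, since the sample configuration uses absolute paths throughout.

```python
import json
from pathlib import Path


def build_server_entry(api_key: str, media_root: str) -> dict:
    """Build an mcpServers entry for sanzaru with absolute media paths.

    Hypothetical helper: only the JSON shape comes from the README example.
    """
    root = Path(media_root).expanduser().resolve()  # resolve() yields an absolute path
    return {
        "mcpServers": {
            "sanzaru": {
                "command": "uvx",
                "args": ["sanzaru[all]"],
                "env": {
                    "OPENAI_API_KEY": api_key,
                    # Subdirectory names are an assumption; use any layout you like.
                    "VIDEO_PATH": str(root / "videos"),
                    "IMAGE_PATH": str(root / "images"),
                    "AUDIO_PATH": str(root / "audio"),
                },
            }
        }
    }


if __name__ == "__main__":
    entry = build_server_entry("your-api-key-here", "./sanzaru-media")
    print(json.dumps(entry, indent=2))
```

Paste the printed JSON into claude_desktop_config.json, merging it with any existing mcpServers entries.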
Codex MCP
# Using uvx (from PyPI)
codex mcp add sanzaru \
--env OPENAI_API_KEY="sk-..." \
--env VIDEO_PATH="$HOME/sanzaru-videos" \
--env IMAGE_PATH="$HOME/sanzaru-images" \
--env AUDIO_PATH="$HOME/sanzaru-audio" \
-- uvx "sanzaru[all]"
# Or from source
cd /path/to/sanzaru
set -a; source .env; set +a
codex mcp add sanzaru \
--env OPENAI_API_KEY="$OPENAI_API_KEY" \
--env VIDEO_PATH="$VIDEO_PATH" \
--env IMAGE_PATH="$IMAGE_PATH" \
--env AUDIO_PATH="$AUDIO_PATH" \
-- uv run --directory "$(pwd)" sanzaru
Manual Setup
uv venv
uv sync
# Set required environment variables
export OPENAI_API_KEY=sk-...
export VIDEO_PATH=~/videos
export IMAGE_PATH=~/images
export AUDIO_PATH=~/audio
# Run server
uv run sanzaru
Feature Auto-Detection: Features are automatically enabled based on configured paths. Set only the paths you need.
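To make the auto-detection rule concrete, here is a minimal sketch of how a feature set could be derived from the environment. This is not sanzaru's actual implementation; the function name `enabled_features` and the treatment of blank values as unset are assumptions.

```python
import os
from typing import Mapping

# Mapping taken from the Requirements list: path variable -> feature it enables.
FEATURE_PATHS = {
    "VIDEO_PATH": "video",
    "IMAGE_PATH": "image",
    "AUDIO_PATH": "audio",
}


def enabled_features(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the features whose path variable is set to a non-empty value."""
    return [
        feature
        for var, feature in FEATURE_PATHS.items()
        if env.get(var, "").strip()  # assumption: blank values count as unset
    ]


if __name__ == "__main__":
    print(enabled_features({"VIDEO_PATH": "/data/videos"}))
```

With only VIDEO_PATH set, the sketch reports just the video feature, which mirrors the "set only the paths you need" guidance.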
Available Tools
| Category | Tools | Description |
|---|---|---|
| Video | create_video, get_video_status, download_video, list_videos, delete_video, remix_video | Generate and manage Sora videos with optional reference images |
| Image | generate_image, edit_image, create_image, get_image_status, download_image | Generate with gpt-image-1.5 (sync) or GPT-5 (polling) |
| Reference | list_reference_images, prepare_reference_image | Manage and resize images for Sora compatibility |
| Audio | transcribe_audio, chat_with_audio, create_audio, convert_audio, compress_audio, list_audio_files, get_latest_audio, transcribe_with_enhancement | Transcription, analysis, TTS, and file management |
Full API documentation: See docs/api-reference.md
Basic Workflows
Generate a Video
# Create video from text
video = create_video(
prompt="A serene mountain landscape at sunrise",
model="sora-2",
seconds="8",
size="1280x720"
)
# Poll for completion (repeat this call until the video is ready)
status = get_video_status(video.id)
# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
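The workflow above calls get_video_status once; in practice the status check is repeated until generation finishes. The following is a rough sketch of such a polling loop. It assumes the status is available as a plain string and that "completed" and "failed" are the terminal values; the helper `wait_for_video` and the stand-in status function are hypothetical, so adapt them to whatever get_video_status actually returns.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[str], str],
    video_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_status(video_id) until a terminal status or the timeout.

    Assumption: "completed" and "failed" are the terminal status strings.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(video_id)
        if status in {"completed", "failed"}:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} still {status!r} after {timeout}s")
        time.sleep(interval)  # wait before asking again


if __name__ == "__main__":
    # Stand-in for the real tool call: reports progress twice, then completes.
    fake_statuses = iter(["queued", "in_progress", "completed"])
    print(wait_for_video(lambda _id: next(fake_statuses), "video_123", interval=0.01))
```

Passing the status function as a parameter keeps the loop independent of how the tool is invoked, and the timeout prevents a stuck job from blocking forever.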
Generate with Reference Image
# 1. Generate reference image (gpt-image-1.5, synchronous)
generate_image(
prompt="futuristic pilot in mech cockpit",
size="1536x1024",
filename="pilot.png"
)
# 2. Prepare for video (resize to Sora dimensions)
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")
# 3. Animate
video = create_video(
prompt="The pilot looks up and smiles",
size="1280x720",
input_reference_filename="pilot_1280x720.png"
)
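Step 2 resizes a 1536x1024 image to 1280x720 with resize_mode="crop". As an illustration of what a crop-style resize typically involves, the sketch below computes a scale-to-cover size and a centered crop box using integer arithmetic. This is an assumption about the general technique, not a description of how prepare_reference_image is implemented; the function name `cover_crop_box` is hypothetical.

```python
def cover_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Scale the source to cover the target, then center-crop the overflow.

    Returns ((scaled_w, scaled_h), (left, top, right, bottom)).
    Uses integer math only, so there is no floating-point rounding drift.
    """
    if src_w * dst_h >= dst_w * src_h:
        # Source is relatively wider: match heights, crop the sides.
        scaled_h = dst_h
        scaled_w = -(-src_w * dst_h // src_h)  # ceiling division
    else:
        # Source is relatively taller: match widths, crop top and bottom.
        scaled_w = dst_w
        scaled_h = -(-src_h * dst_w // src_w)  # ceiling division
    left = (scaled_w - dst_w) // 2
    top = (scaled_h - dst_h) // 2
    return (scaled_w, scaled_h), (left, top, left + dst_w, top + dst_h)


if __name__ == "__main__":
    # The 1536x1024 reference image from step 1, targeted at 1280x720.
    print(cover_crop_box(1536, 1024, 1280, 720))
```

For the example image, the source (3:2) is taller than the target (16:9), so the width is matched and roughly 134 scaled pixels are trimmed from the top and bottom combined. Keep this in mind when composing the reference image: important content near the top or bottom edge may be cut off.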
Audio Transcription
# List available audio files
files = list_audio_files(format="mp3")
# Transcribe
result = transcribe_audio("interview.mp3")
# Or analyze with GPT-4o
analysis = chat_with_audio(
"meeting.mp3",
user_prompt="Summarize key decisions and action items"
)
Documentation
- API Reference - Complete tool documentation with parameters and examples
- Reference Images Guide - Working with reference images and resizing
- Image Generation Guide - Generating and editing reference images
- Sora Prompting Guide - Crafting effective video prompts
- Audio Features - Audio transcription, chat, and TTS
- Performance & Architecture - Technical details and benchmarks
Performance
Fully asynchronous architecture with proven scalability:
- ✅ 32+ concurrent operations verified
- ✅ 8-10x speedup for parallel tasks
- ✅ Non-blocking I/O with aiofiles + anyio
- ✅ Python 3.14 free-threading ready
See docs/async-optimizations.md for technical details.
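For readers unfamiliar with the pattern, the sketch below shows the general idea behind running many I/O-bound operations concurrently with a cap on how many are in flight. It uses only the standard library's asyncio rather than aiofiles or anyio, and it is an illustration of the technique, not sanzaru's code; `run_bounded` and the simulated job are hypothetical.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> tuple[list[int], int]:
    """Run n_tasks simulated I/O jobs with at most `limit` in flight.

    Returns the results (in task order) and the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def job(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a network or disk wait
            in_flight -= 1
            return i * i

    # gather() schedules every job at once and preserves argument order.
    results = await asyncio.gather(*(job(i) for i in range(n_tasks)))
    return list(results), peak


if __name__ == "__main__":
    print(asyncio.run(run_bounded(n_tasks=8, limit=4)))
```

Because each job yields control while it waits, the event loop overlaps the waits instead of running them back to back, which is where the speedup for parallel tasks comes from.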
License