
MiniMax-M2.5 AI terminal agent — chat, code, and create


MiniMax-M2.5

Self-hosted MiniMax-M2.5 inference platform running on 8x NVIDIA H100 80GB GPUs.

Website: minimax.villamarket.ai
Chat: app.minimax.villamarket.ai

Component            Description
vLLM (port 8080)     Model inference server (TP8 + expert parallel)
LiteLLM (port 4000)  API proxy with key management and cost tracking
Website              Landing page, API docs, dashboard, auth (Next.js + S3 + CloudFront)
DeerFlow             AI agent chat UI at app.minimax.villamarket.ai (Next.js + LangGraph)
CLI                  Ollama-style CLI for managing the server
TUI                  Terminal UI for API key management
iOS App              Native Swift app (in development)

Project Structure

.
├── scripts/                  # Server management scripts
│   ├── start.sh              # Start vLLM server
│   ├── start-all.sh          # Start vLLM + LiteLLM
│   ├── stop.sh               # Stop vLLM
│   ├── stop-all.sh           # Stop everything
│   ├── health.sh             # Health check
│   ├── test.sh               # Inference test
│   ├── test-tools.sh         # Tool calling test
│   └── download-model.sh     # Download model from HuggingFace
├── src/minimax_cli/          # CLI source code
│   ├── main.py               # Entry point
│   ├── api.py                # API client
│   ├── config.py             # Configuration
│   ├── constants.py          # Constants
│   └── commands/             # CLI subcommands
├── tui/                      # Admin TUI (Textual)
│   └── app.py                # Key management interface
├── website/                  # minimax.villamarket.ai
│   ├── src/                  # Next.js source
│   │   ├── app/              # App Router pages
│   │   ├── components/       # React components
│   │   └── lib/              # Utilities + auth
│   ├── lambda/               # AWS Lambda functions
│   │   ├── keys.py           # API key generation
│   │   ├── checkout.py       # Stripe checkout
│   │   ├── stripe_webhook.py # Stripe webhooks
│   │   ├── promo.py          # Promo codes
│   │   └── referral.py       # Referral system
│   ├── cf-function.js        # CloudFront Function
│   └── deploy.sh             # Build + deploy to S3/CloudFront
├── ios/                      # iOS app (Swift)
│   ├── MiniMaxApp/           # App source
│   │   ├── App/              # Entry point + state
│   │   ├── Core/API/         # SSE streaming + LangGraph client
│   │   ├── Core/Models/      # Data models
│   │   └── Features/         # Chat, Threads, Settings views
│   └── Package.swift         # Swift Package manifest
├── litellm-config.example.yaml
├── admin                     # Symlink to TUI launcher
├── pyproject.toml            # Python package config
├── CLAUDE.md                 # AI agent instructions
└── README.md                 # This file

CLI

Ollama-style CLI for managing the server and chatting with the model.

Install

pip install -e .

Commands

minimax run                 Interactive chat REPL with streaming + think blocks
minimax serve               Start full stack (vLLM + LiteLLM)
minimax serve --vllm-only   Start vLLM only
minimax stop                Stop all servers
minimax ps                  Show running processes, GPU usage, uptime
minimax list                List available models
minimax logs                Tail vLLM logs (--litellm for LiteLLM)
minimax test                Run inference health checks
minimax tui                 Launch admin TUI (key management)
minimax auth login          Store API key
minimax auth status         Check auth status
minimax auth logout         Remove stored key
minimax setup claude        Configure Claude Code
minimax setup codex         Configure Codex CLI
minimax setup aider         Configure Aider
minimax setup continue      Configure Continue (VS Code/JetBrains)
minimax setup cline         Print Cline setup instructions

Quick Start

# Start the server
minimax serve

# Check status
minimax ps

# Start chatting
minimax run

# Configure Claude Code to use this server
minimax auth login
minimax setup claude

Benchmarks

Benchmark           Score
SWE-Bench Verified  80.2%
Multi-SWE-Bench     51.3%

API Endpoint

https://gpu-workspace.taile8dc37.ts.net/minimax/v1

All requests require an API key:

Authorization: Bearer YOUR_API_KEY

Models

Model ID                Context  Description
minimax-m2.5            128K     Recommended
MiniMaxAI/MiniMax-M2.5  128K     Full name alias

Pricing

Type    Price
Input   $0.30 / 1M tokens
Output  $1.20 / 1M tokens
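The table above translates into a simple per-request cost formula. A minimal sketch (the helper name is illustrative, not part of the package):

```python
# Pricing from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 1.20

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

For example, a 10K-token prompt with a 2K-token reply costs `estimate_cost(10_000, 2_000)`, about half a cent.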

Quick Start

curl https://gpu-workspace.taile8dc37.ts.net/minimax/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Integrations

Claude Code

{
  "apiProvider": "custom",
  "customApiBaseUrl": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  "customApiKey": "YOUR_API_KEY",
  "customModelId": "minimax-m2.5"
}

Codex (OpenAI CLI)

export OPENAI_BASE_URL="https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
export OPENAI_API_KEY="YOUR_API_KEY"
codex --model minimax-m2.5 "Write a Python function"

Aider

aider --openai-api-base https://gpu-workspace.taile8dc37.ts.net/minimax/v1 \
      --openai-api-key YOUR_API_KEY \
      --model openai/minimax-m2.5

Continue (VS Code / JetBrains)

Add to ~/.continue/config.json:

{
  "models": [{
    "title": "MiniMax-M2.5",
    "provider": "openai",
    "model": "minimax-m2.5",
    "apiBase": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    "apiKey": "YOUR_API_KEY"
  }]
}

Cline (VS Code)

  1. API Provider: "OpenAI Compatible"
  2. Base URL: https://gpu-workspace.taile8dc37.ts.net/minimax/v1
  3. API Key: YOUR_API_KEY
  4. Model ID: minimax-m2.5

Any OpenAI-compatible client

Setting   Value
Base URL  https://gpu-workspace.taile8dc37.ts.net/minimax/v1
API Key   Your API key
Model     minimax-m2.5

Code Examples

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Python (streaming)

stream = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Write a Redis cache decorator."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "minimax-m2.5",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

API Reference

POST /v1/chat/completions

Standard OpenAI chat completions endpoint. Supports streaming, function calling, temperature, top_p, max_tokens, stop sequences.

GET /v1/models

List available models.

GET /health/liveliness

Health check — returns 200 when ready.
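Both read-only endpoints can be probed with the standard library alone. A sketch that assumes the health route is exposed under the same /minimax prefix as /v1 (adjust the base URL if your deployment differs):

```python
import json
import urllib.request

BASE = "https://gpu-workspace.taile8dc37.ts.net/minimax"

def is_alive(timeout: float = 5.0) -> bool:
    """True when GET /health/liveliness answers 200."""
    try:
        with urllib.request.urlopen(f"{BASE}/health/liveliness",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def list_models(api_key: str, timeout: float = 5.0) -> list:
    """Return the model IDs reported by GET /v1/models."""
    req = urllib.request.Request(
        f"{BASE}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return [m["id"] for m in json.load(resp)["data"]]
```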


Self-Hosting

Requirements

  • 8x NVIDIA H100 80GB (or equivalent ~640 GB VRAM)
  • vLLM v0.15+
  • CUDA 12.8+

Download Model

pip install huggingface_hub[hf_transfer]
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MiniMaxAI/MiniMax-M2.5 \
    --local-dir /path/to/MiniMax-M2.5-HF

Start Server

vllm serve /path/to/MiniMax-M2.5-HF \
    --tensor-parallel-size 8 \
    --enable-expert-parallel \
    --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --max-num-seqs 16 \
    --max-model-len 131072 \
    --enable-prefix-caching \
    --enable-chunked-prefill \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --served-model-name minimax-m2.5 \
    --compilation-config '{"cudagraph_mode": "PIECEWISE"}'

API Key Management

minimax tui   # or ./admin

Keys: g generate | v view | e email key | b set budget | d delete | r refresh | q quit


Infrastructure

Service   URL                                         Hosting
Website   minimax.villamarket.ai                      S3 + CloudFront
Chat UI   app.minimax.villamarket.ai                  CloudFront -> Tailscale Funnel -> DeerFlow
API       gpu-workspace.taile8dc37.ts.net/minimax/v1  Tailscale Funnel -> LiteLLM

Rate Limits

  • Max concurrent requests: 16
  • Max context length: 131,072 tokens (128K)
  • Request timeout: 600 seconds
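With concurrency capped at 16, bursts of requests can be rejected. A generic client-side retry helper with jittered exponential backoff (a sketch, not part of the API) keeps batch jobs inside the limits:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Run call(), retrying failures with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... scaled by base_delay, plus jitter.
            time.sleep(base_delay * (2 ** attempt + random.random()))

# Usage with the OpenAI client from the examples above:
# response = with_backoff(lambda: client.chat.completions.create(...))
```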

Support

Contact: support@villamarket.ai
