# MiniMax-M2.5

AI terminal agent — chat, code, and create.

Self-hosted MiniMax-M2.5 inference platform running on 8x NVIDIA H100 80GB GPUs.

Website: minimax.villamarket.ai · Chat: app.minimax.villamarket.ai
| Component | Description |
|---|---|
| vLLM (port 8080) | Model inference server (TP8 + expert parallel) |
| LiteLLM (port 4000) | API proxy with key management and cost tracking |
| Website | Landing page, API docs, dashboard, auth (Next.js + S3 + CloudFront) |
| DeerFlow | AI agent chat UI at app.minimax.villamarket.ai (Next.js + LangGraph) |
| CLI | Ollama-style CLI for managing the server |
| TUI | Terminal UI for API key management |
| iOS App | Native Swift app (in development) |
## Project Structure

```
.
├── scripts/                      # Server management scripts
│   ├── start.sh                  # Start vLLM server
│   ├── start-all.sh              # Start vLLM + LiteLLM
│   ├── stop.sh                   # Stop vLLM
│   ├── stop-all.sh               # Stop everything
│   ├── health.sh                 # Health check
│   ├── test.sh                   # Inference test
│   ├── test-tools.sh             # Tool calling test
│   └── download-model.sh         # Download model from HuggingFace
├── src/minimax_cli/              # CLI source code
│   ├── main.py                   # Entry point
│   ├── api.py                    # API client
│   ├── config.py                 # Configuration
│   ├── constants.py              # Constants
│   └── commands/                 # CLI subcommands
├── tui/                          # Admin TUI (Textual)
│   └── app.py                    # Key management interface
├── website/                      # minimax.villamarket.ai
│   ├── src/                      # Next.js source
│   │   ├── app/                  # App Router pages
│   │   ├── components/           # React components
│   │   └── lib/                  # Utilities + auth
│   ├── lambda/                   # AWS Lambda functions
│   │   ├── keys.py               # API key generation
│   │   ├── checkout.py           # Stripe checkout
│   │   ├── stripe_webhook.py     # Stripe webhooks
│   │   ├── promo.py              # Promo codes
│   │   └── referral.py           # Referral system
│   ├── cf-function.js            # CloudFront Function
│   └── deploy.sh                 # Build + deploy to S3/CloudFront
├── ios/                          # iOS app (Swift)
│   ├── MiniMaxApp/               # App source
│   │   ├── App/                  # Entry point + state
│   │   ├── Core/API/             # SSE streaming + LangGraph client
│   │   ├── Core/Models/          # Data models
│   │   └── Features/             # Chat, Threads, Settings views
│   └── Package.swift             # Swift Package manifest
├── litellm-config.example.yaml
├── admin                         # Symlink to TUI launcher
├── pyproject.toml                # Python package config
├── CLAUDE.md                     # AI agent instructions
└── README.md                     # This file
```
## CLI

Ollama-style CLI for managing the server and chatting with the model.

### Install

```shell
pip install -e .
```
### Commands

| Command | Description |
|---|---|
| `minimax run` | Interactive chat REPL with streaming + think blocks |
| `minimax serve` | Start full stack (vLLM + LiteLLM) |
| `minimax serve --vllm-only` | Start vLLM only |
| `minimax stop` | Stop all servers |
| `minimax ps` | Show running processes, GPU usage, uptime |
| `minimax list` | List available models |
| `minimax logs` | Tail vLLM logs (`--litellm` for LiteLLM) |
| `minimax test` | Run inference health checks |
| `minimax tui` | Launch admin TUI (key management) |
| `minimax auth login` | Store API key |
| `minimax auth status` | Check auth status |
| `minimax auth logout` | Remove stored key |
| `minimax setup claude` | Configure Claude Code |
| `minimax setup codex` | Configure Codex CLI |
| `minimax setup aider` | Configure Aider |
| `minimax setup continue` | Configure Continue (VS Code/JetBrains) |
| `minimax setup cline` | Print Cline setup instructions |
### Quick Start

```shell
# Start the server
minimax serve

# Check status
minimax ps

# Start chatting
minimax run

# Configure Claude Code to use this server
minimax auth login
minimax setup claude
```
## Benchmarks
| Benchmark | Score |
|---|---|
| SWE-Bench Verified | 80.2% |
| Multi-SWE-Bench | 51.3% |
## API Endpoint

```
https://gpu-workspace.taile8dc37.ts.net/minimax/v1
```

All requests require an API key:

```
Authorization: Bearer YOUR_API_KEY
```
## Models

| Model ID | Context | Description |
|---|---|---|
| `minimax-m2.5` | 128K | Recommended |
| `MiniMaxAI/MiniMax-M2.5` | 128K | Full name alias |
## Pricing

| | Price |
|---|---|
| Input | $0.30 / 1M tokens |
| Output | $1.20 / 1M tokens |
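As a worked example of the rates above: a request that consumes 12,000 input tokens and produces 2,000 output tokens costs 12,000/1M × $0.30 + 2,000/1M × $1.20 = $0.0036 + $0.0024 = $0.006. A minimal cost helper, using the table's rates (the function name is illustrative, not part of this project's code):

```python
# Per-million-token rates from the pricing table above.
INPUT_RATE_PER_M = 0.30
OUTPUT_RATE_PER_M = 1.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

print(f"${request_cost(12_000, 2_000):.4f}")  # → $0.0060
```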
## Quick Start

```shell
curl https://gpu-workspace.taile8dc37.ts.net/minimax/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## Integrations

### Claude Code

```json
{
  "apiProvider": "custom",
  "customApiBaseUrl": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  "customApiKey": "YOUR_API_KEY",
  "customModelId": "minimax-m2.5"
}
```
### Codex (OpenAI CLI)

```shell
export OPENAI_BASE_URL="https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
export OPENAI_API_KEY="YOUR_API_KEY"
codex --model minimax-m2.5 "Write a Python function"
```
### Aider

```shell
aider --openai-api-base https://gpu-workspace.taile8dc37.ts.net/minimax/v1 \
      --openai-api-key YOUR_API_KEY \
      --model openai/minimax-m2.5
```
### Continue (VS Code / JetBrains)

Add to `~/.continue/config.json`:

```json
{
  "models": [{
    "title": "MiniMax-M2.5",
    "provider": "openai",
    "model": "minimax-m2.5",
    "apiBase": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    "apiKey": "YOUR_API_KEY"
  }]
}
```
### Cline (VS Code)

- API Provider: "OpenAI Compatible"
- Base URL: `https://gpu-workspace.taile8dc37.ts.net/minimax/v1`
- API Key: `YOUR_API_KEY`
- Model ID: `minimax-m2.5`
### Any OpenAI-compatible client
| Setting | Value |
|---|---|
| Base URL | https://gpu-workspace.taile8dc37.ts.net/minimax/v1 |
| API Key | Your API key |
| Model | minimax-m2.5 |
## Code Examples

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
### Python (streaming)

```python
stream = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Write a Redis cache decorator."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
### Node.js / TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "minimax-m2.5",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
## API Reference

### POST /v1/chat/completions

Standard OpenAI chat completions endpoint. Supports streaming, function calling, `temperature`, `top_p`, `max_tokens`, and stop sequences.
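Function calling follows the standard OpenAI `tools` schema. A sketch of a request body (the `get_weather` tool is a made-up example for illustration, not something this server provides):

```python
import json

# Request body in the standard OpenAI function-calling shape.
payload = {
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "What's the weather in Bangkok?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response's `message.tool_calls` carries the function name and JSON-encoded arguments; your client executes the function and sends the result back in a `role: "tool"` message.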
### GET /v1/models

List available models.
### GET /health/liveliness

Health check — returns 200 when ready.
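A startup script might poll this endpoint until the stack is ready. A minimal sketch of the retry loop; the probe is injected as a callable so the logic can be exercised without a live server (the exact health URL under the Funnel path prefix is an assumption):

```python
import time
from typing import Callable

def wait_until_ready(probe: Callable[[], int],
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False

# Real usage would probe the health endpoint, e.g.:
#   import urllib.request
#   probe = lambda: urllib.request.urlopen(HEALTH_URL).status
# Here, a simulated probe that reports ready on the third check:
codes = iter([503, 503, 200])
print(wait_until_ready(lambda: next(codes), delay=0.0))  # → True
```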
## Self-Hosting

### Requirements

- 8x NVIDIA H100 80GB (or equivalent ~640 GB VRAM)
- vLLM v0.15+
- CUDA 12.8+

### Download Model

```shell
pip install "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MiniMaxAI/MiniMax-M2.5 \
  --local-dir /path/to/MiniMax-M2.5-HF
```
### Start Server

```shell
vllm serve /path/to/MiniMax-M2.5-HF \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --trust-remote-code \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 16 \
  --max-model-len 131072 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --served-model-name minimax-m2.5 \
  --compilation-config '{"cudagraph_mode": "PIECEWISE"}'
```
## API Key Management

```shell
minimax tui   # or ./admin
```

Keys: `g` generate | `v` view | `e` email key | `b` set budget | `d` delete | `r` refresh | `q` quit
## Infrastructure
| Service | URL | Hosting |
|---|---|---|
| Website | minimax.villamarket.ai | S3 + CloudFront |
| Chat UI | app.minimax.villamarket.ai | CloudFront -> Tailscale Funnel -> DeerFlow |
| API | gpu-workspace.taile8dc37.ts.net/minimax/v1 | Tailscale Funnel -> LiteLLM |
## Rate Limits
- Max concurrent requests: 16
- Max context length: 131,072 tokens (128K)
- Request timeout: 600 seconds
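With a 16-request concurrency cap, clients may see HTTP 429 responses under load. A retry-with-exponential-backoff sketch; the exception class and delay schedule are illustrative assumptions, not specifics of this server:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error your HTTP client raises."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate limits, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Simulated request that is rate-limited twice, then succeeds.
state = {"n": 0}
def fake_request():
    state["n"] += 1
    if state["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(fake_request, base_delay=0.0))  # → ok
```

With the OpenAI SDKs, `call` would wrap `client.chat.completions.create(...)` and catch the SDK's rate-limit exception instead.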
## Support

Contact: support@villamarket.ai
## Download files
### minimax_agent-0.2.0.tar.gz (source)

- Size: 22.4 kB
- Uploaded via: twine/6.2.0, CPython/3.12.3
- Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1795dba8aecac0463a04cd7ae5c784425850b7fd44967229d80bfefc5d0b1255` |
| MD5 | `333bba6d53ae76266e07bd58c6abd9b4` |
| BLAKE2b-256 | `019ec70280cc9c330f59841bb0db941bcfb5285996914d27a9645a4bcaab4058` |
### minimax_agent-0.2.0-py3-none-any.whl (Python 3 wheel)

- Size: 30.1 kB
- Uploaded via: twine/6.2.0, CPython/3.12.3
- Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `042c8575402342eebc1b58d991b4cd6244319efffb279f2494f814a9ce38e2aa` |
| MD5 | `2c005ab7b382f9a01a5a9fb28f66c4af` |
| BLAKE2b-256 | `f59fa9b5c6b3aa3621e6e97cb780e0d913f41d4d819fe0b112c6668f6e40873c` |