# MiniMax-M2.5

AI terminal agent — chat, code, and create.

Self-hosted MiniMax-M2.5 inference platform running on 8x NVIDIA H100 80GB GPUs.

Website: minimax.villamarket.ai · Chat: app.minimax.villamarket.ai
| Component | Description |
|---|---|
| vLLM (port 8080) | Model inference server (TP8 + expert parallel) |
| LiteLLM (port 4000) | API proxy with key management and cost tracking |
| Website | Landing page, API docs, dashboard, auth (Next.js + S3 + CloudFront) |
| DeerFlow | AI agent chat UI at app.minimax.villamarket.ai (Next.js + LangGraph) |
| CLI | Ollama-style CLI for managing the server |
| TUI | Terminal UI for API key management |
| iOS App | Native Swift app (in development) |
## Project Structure

```
.
├── scripts/                      # Server management scripts
│   ├── start.sh                  # Start vLLM server
│   ├── start-all.sh              # Start vLLM + LiteLLM
│   ├── stop.sh                   # Stop vLLM
│   ├── stop-all.sh               # Stop everything
│   ├── health.sh                 # Health check
│   ├── test.sh                   # Inference test
│   ├── test-tools.sh             # Tool calling test
│   └── download-model.sh         # Download model from HuggingFace
├── src/minimax_cli/              # CLI source code
│   ├── main.py                   # Entry point
│   ├── api.py                    # API client
│   ├── config.py                 # Configuration
│   ├── constants.py              # Constants
│   └── commands/                 # CLI subcommands
├── tui/                          # Admin TUI (Textual)
│   └── app.py                    # Key management interface
├── website/                      # minimax.villamarket.ai
│   ├── src/                      # Next.js source
│   │   ├── app/                  # App Router pages
│   │   ├── components/           # React components
│   │   └── lib/                  # Utilities + auth
│   ├── lambda/                   # AWS Lambda functions
│   │   ├── keys.py               # API key generation
│   │   ├── checkout.py           # Stripe checkout
│   │   ├── stripe_webhook.py     # Stripe webhooks
│   │   ├── promo.py              # Promo codes
│   │   └── referral.py           # Referral system
│   ├── cf-function.js            # CloudFront Function
│   └── deploy.sh                 # Build + deploy to S3/CloudFront
├── ios/                          # iOS app (Swift)
│   ├── MiniMaxApp/               # App source
│   │   ├── App/                  # Entry point + state
│   │   ├── Core/API/             # SSE streaming + LangGraph client
│   │   ├── Core/Models/          # Data models
│   │   └── Features/             # Chat, Threads, Settings views
│   └── Package.swift             # Swift Package manifest
├── litellm-config.example.yaml
├── admin                         # Symlink to TUI launcher
├── pyproject.toml                # Python package config
├── CLAUDE.md                     # AI agent instructions
└── README.md                     # This file
```
## CLI

Ollama-style CLI for managing the server and chatting with the model.

### Install

```shell
pip install -e .
```
### Commands

| Command | Description |
|---|---|
| `minimax run` | Interactive chat REPL with streaming + think blocks |
| `minimax serve` | Start full stack (vLLM + LiteLLM) |
| `minimax serve --vllm-only` | Start vLLM only |
| `minimax stop` | Stop all servers |
| `minimax ps` | Show running processes, GPU usage, uptime |
| `minimax list` | List available models |
| `minimax logs` | Tail vLLM logs (`--litellm` for LiteLLM) |
| `minimax test` | Run inference health checks |
| `minimax tui` | Launch admin TUI (key management) |
| `minimax auth login` | Store API key |
| `minimax auth status` | Check auth status |
| `minimax auth logout` | Remove stored key |
| `minimax setup claude` | Configure Claude Code |
| `minimax setup codex` | Configure Codex CLI |
| `minimax setup aider` | Configure Aider |
| `minimax setup continue` | Configure Continue (VS Code/JetBrains) |
| `minimax setup cline` | Print Cline setup instructions |
### Quick Start

```shell
# Start the server
minimax serve

# Check status
minimax ps

# Start chatting
minimax run

# Configure Claude Code to use this server
minimax auth login
minimax setup claude
```
## Benchmarks
| Benchmark | Score |
|---|---|
| SWE-Bench Verified | 80.2% |
| Multi-SWE-Bench | 51.3% |
## API Endpoint

```
https://gpu-workspace.taile8dc37.ts.net/minimax/v1
```

All requests require an API key:

```
Authorization: Bearer YOUR_API_KEY
```
## Models

| Model ID | Context | Description |
|---|---|---|
| `minimax-m2.5` | 128K | Recommended |
| `MiniMaxAI/MiniMax-M2.5` | 128K | Full name alias |
## Pricing

| | Price |
|---|---|
| Input | $0.30 / 1M tokens |
| Output | $1.20 / 1M tokens |
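As a worked example of the rates above: a request that consumes 12,000 input tokens and produces 2,000 output tokens costs 12,000/1M × $0.30 + 2,000/1M × $1.20 = $0.0036 + $0.0024 = $0.006. A minimal cost helper, using the table's rates (the function name is illustrative, not part of this project's code):

```python
# Per-million-token rates from the pricing table above.
INPUT_RATE_PER_M = 0.30
OUTPUT_RATE_PER_M = 1.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

print(f"${request_cost(12_000, 2_000):.4f}")  # → $0.0060
```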
## Quick Start

```shell
curl https://gpu-workspace.taile8dc37.ts.net/minimax/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## Integrations

### Claude Code

```json
{
  "apiProvider": "custom",
  "customApiBaseUrl": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  "customApiKey": "YOUR_API_KEY",
  "customModelId": "minimax-m2.5"
}
```
### Codex (OpenAI CLI)

```shell
export OPENAI_BASE_URL="https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
export OPENAI_API_KEY="YOUR_API_KEY"
codex --model minimax-m2.5 "Write a Python function"
```
### Aider

```shell
aider --openai-api-base https://gpu-workspace.taile8dc37.ts.net/minimax/v1 \
      --openai-api-key YOUR_API_KEY \
      --model openai/minimax-m2.5
```
### Continue (VS Code / JetBrains)

Add to `~/.continue/config.json`:

```json
{
  "models": [{
    "title": "MiniMax-M2.5",
    "provider": "openai",
    "model": "minimax-m2.5",
    "apiBase": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    "apiKey": "YOUR_API_KEY"
  }]
}
```
### Cline (VS Code)

- API Provider: "OpenAI Compatible"
- Base URL: `https://gpu-workspace.taile8dc37.ts.net/minimax/v1`
- API Key: `YOUR_API_KEY`
- Model ID: `minimax-m2.5`
### Any OpenAI-compatible client
| Setting | Value |
|---|---|
| Base URL | https://gpu-workspace.taile8dc37.ts.net/minimax/v1 |
| API Key | Your API key |
| Model | minimax-m2.5 |
## Code Examples

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
### Python (streaming)

```python
stream = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Write a Redis cache decorator."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
### Node.js / TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "minimax-m2.5",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
## API Reference

### POST /v1/chat/completions

Standard OpenAI chat completions endpoint. Supports streaming, function calling, `temperature`, `top_p`, `max_tokens`, and stop sequences.
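Function calling follows the standard OpenAI `tools` schema. A sketch of a request body (the `get_weather` tool is a made-up example for illustration, not something this server provides):

```python
import json

# Request body in the standard OpenAI function-calling shape.
payload = {
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "What's the weather in Bangkok?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response's `message.tool_calls` carries the function name and JSON-encoded arguments; your client executes the function and sends the result back in a `role: "tool"` message.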
### GET /v1/models

List available models.
### GET /health/liveliness

Health check — returns 200 when ready.
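A startup script might poll this endpoint until the stack is ready. A minimal sketch of the retry loop; the probe is injected as a callable so the logic can be exercised without a live server (the exact health URL under the Funnel path prefix is an assumption):

```python
import time
from typing import Callable

def wait_until_ready(probe: Callable[[], int],
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False

# Real usage would probe the health endpoint, e.g.:
#   import urllib.request
#   probe = lambda: urllib.request.urlopen(HEALTH_URL).status
# Here, a simulated probe that reports ready on the third check:
codes = iter([503, 503, 200])
print(wait_until_ready(lambda: next(codes), delay=0.0))  # → True
```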
## Self-Hosting

### Requirements

- 8x NVIDIA H100 80GB (or equivalent ~640 GB VRAM)
- vLLM v0.15+
- CUDA 12.8+

### Download Model

```shell
pip install "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MiniMaxAI/MiniMax-M2.5 \
  --local-dir /path/to/MiniMax-M2.5-HF
```
### Start Server

```shell
vllm serve /path/to/MiniMax-M2.5-HF \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --trust-remote-code \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 16 \
  --max-model-len 131072 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --served-model-name minimax-m2.5 \
  --compilation-config '{"cudagraph_mode": "PIECEWISE"}'
```
## API Key Management

```shell
minimax tui   # or ./admin
```

Keys: `g` generate | `v` view | `e` email key | `b` set budget | `d` delete | `r` refresh | `q` quit
## Infrastructure
| Service | URL | Hosting |
|---|---|---|
| Website | minimax.villamarket.ai | S3 + CloudFront |
| Chat UI | app.minimax.villamarket.ai | CloudFront -> Tailscale Funnel -> DeerFlow |
| API | gpu-workspace.taile8dc37.ts.net/minimax/v1 | Tailscale Funnel -> LiteLLM |
## Rate Limits
- Max concurrent requests: 16
- Max context length: 131,072 tokens (128K)
- Request timeout: 600 seconds
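With a 16-request concurrency cap, clients may see HTTP 429 responses under load. A retry-with-exponential-backoff sketch; the exception class and delay schedule are illustrative assumptions, not specifics of this server:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error your HTTP client raises."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate limits, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Simulated request that is rate-limited twice, then succeeds.
state = {"n": 0}
def fake_request():
    state["n"] += 1
    if state["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(fake_request, base_delay=0.0))  # → ok
```

With the OpenAI SDKs, `call` would wrap `client.chat.completions.create(...)` and catch the SDK's rate-limit exception instead.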
## Support

Contact: support@villamarket.ai
## Download files
### minimax_agent-0.2.0.tar.gz (source)

- Size: 22.4 kB
- Uploaded via: twine/6.2.0, CPython/3.12.3
- Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1795dba8aecac0463a04cd7ae5c784425850b7fd44967229d80bfefc5d0b1255` |
| MD5 | `333bba6d53ae76266e07bd58c6abd9b4` |
| BLAKE2b-256 | `019ec70280cc9c330f59841bb0db941bcfb5285996914d27a9645a4bcaab4058` |
### minimax_agent-0.2.0-py3-none-any.whl (Python 3 wheel)

- Size: 30.1 kB
- Uploaded via: twine/6.2.0, CPython/3.12.3
- Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `042c8575402342eebc1b58d991b4cd6244319efffb279f2494f814a9ce38e2aa` |
| MD5 | `2c005ab7b382f9a01a5a9fb28f66c4af` |
| BLAKE2b-256 | `f59fa9b5c6b3aa3621e6e97cb780e0d913f41d4d819fe0b112c6668f6e40873c` |