High-performance MiniMax M2.7 inference library optimized for GMKtech M7

Miniforge

High-performance Python library for MiniMax M2.7 inference, optimized for GMKtech M7 hardware (AMD Ryzen 7 PRO 6850H, 28GB RAM).

Features

  • GGUF Quantization: Q4_K_M recommended for best quality/size tradeoff
  • TurboQuant KV Cache: 3-bit compression (turbo3) for 5x smaller memory footprint
  • Tool Calling: Native support for function calling
  • Vision/Multimodal: Image understanding capabilities
  • Streaming: Real-time token streaming for responsive UIs
  • Memory Management: Hard 28GB limit with automatic optimization
  • Async Support: Full asyncio support throughout
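The ~5x figure for the turbo3 KV cache follows directly from the bit-widths: fp16 entries take 16 bits, turbo3 takes roughly 3. A minimal sketch (the formula and the model dimensions are illustrative assumptions, not Miniforge's exact accounting):

```python
def kv_cache_gb(n_ctx: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bits_per_value: float) -> float:
    """Rough KV cache size: one key and one value per layer, head, and position."""
    n_values = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # K and V
    return n_values * bits_per_value / 8 / 1024**3

# Hypothetical model shape, not MiniMax M2.7's real config
fp16 = kv_cache_gb(200_000, 32, 8, 128, 16)
turbo3 = kv_cache_gb(200_000, 32, 8, 128, 3)
print(f"{fp16:.1f} GB -> {turbo3:.1f} GB ({fp16 / turbo3:.1f}x smaller)")
```

The ratio is 16/3 ≈ 5.3x regardless of model shape, which is where the "5x smaller" claim comes from.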

Quick Start

# Install with uv
uv pip install -e .

# Or install from source
git clone https://github.com/Zapdev-labs/miniforge.git
cd miniforge
uv pip install -e ".[all]"

The default install omits llama-cpp-python so editable installs work on Windows without the Visual Studio C++ toolchain. For the llama_cpp GGUF backend, run uv pip install -e ".[llama-cpp]" (on Windows you need Build Tools for Visual Studio with the C++ workload, or a matching prebuilt wheel). The [all] extra is server plus dev tools only and does not pull in llama-cpp-python.

import asyncio
from miniforge import Miniforge

async def main():
    # Load model (auto-downloads GGUF if available)
    model = await Miniforge.from_pretrained(
        "MiniMaxAI/MiniMax-M2.7",
        quantization="Q4_K_M",
    )
    
    # Simple chat
    response = await model.chat(
        "Explain quantum computing",
        system_prompt="You are a helpful assistant.",
    )
    print(response)
    
    # Streaming
    stream = await model.chat("Tell me a story", stream=True)
    async for token in stream:
        print(token, end="", flush=True)

asyncio.run(main())

Hardware Requirements

GMKtech M7 Specs:

  • CPU: AMD Ryzen 7 PRO 6850H (8 cores)
  • RAM: 28GB available to the OS (4GB reserved as iGPU VRAM)
  • OS: Windows 11 + WSL2

Expected Performance:

  • Prompt processing: 50-100 tok/s
  • Generation: 15-25 tok/s (exceeds the 10 tok/s target)
  • Memory usage: ~4-5GB with Q4_K_M + turbo3
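The ~4-5GB figure is consistent with back-of-envelope arithmetic: quantized weights plus a compressed KV cache and runtime buffers. A rough sketch (the ~4.85 bits/weight average for Q4_K_M and the overhead guess are assumptions):

```python
def model_gb(params_b: float, bits_per_weight: float = 4.85) -> float:
    """Approximate on-disk/in-memory size of quantized weights.

    params_b is the parameter count in billions; Q4_K_M averages
    roughly 4.85 bits/weight (assumed, varies by tensor mix).
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1024**3

weights = model_gb(2.7)  # ~1.5 GB of quantized weights
print(f"weights ~{weights:.1f} GB + KV cache + runtime overhead")
```

The remaining 2-3GB of the reported footprint is the turbo3 KV cache at a large context plus runtime buffers.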

Configuration

Create ~/.config/miniforge/config.yaml:

max_memory_gb: 24.0
n_ctx: 200000
quantization: Q4_K_M
cache_type_k: turbo3
cache_type_v: turbo3
n_threads: 8
flash_attn: true

Or use the provided optimized config:

from miniforge.utils.config import M7Config

config = M7Config.from_yaml("configs/m7-optimized.yaml")
model = await Miniforge.from_pretrained(config=config)

Examples

See examples/ directory:

  • basic_chat.py - Simple chat interface
  • streaming_chat.py - Real-time streaming
  • tool_agent.py - Tool calling with custom functions
  • vision_chat.py - Image understanding

Backends

llama.cpp (Recommended)

Fastest CPU inference with GGUF support:

model = await Miniforge.from_pretrained(
    "MiniMaxAI/MiniMax-M2.7",
    backend="llama_cpp",
    quantization="Q4_K_M",
)

If no prebuilt GGUF is on the Hub, Miniforge can convert SafeTensors weights automatically using a local llama.cpp checkout:

  • Install llama.cpp's Python requirements and build llama-quantize (needed for Q4_K_M and similar quants).
  • Set MINIFORGE_LLAMA_CPP to the repo root (or llama_cpp_path in M7Config / YAML).

The converter runs convert_hf_to_gguf.py and caches the result under your miniforge GGUF cache. If conversion is not configured or fails, the library falls back to the Transformers backend as before.
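The one-time llama.cpp setup described above might look like this (paths and build options are illustrative, not the only valid layout):

```shell
# Clone llama.cpp and install the deps used by convert_hf_to_gguf.py
git clone https://github.com/ggerganov/llama.cpp ~/llama.cpp
pip install -r ~/llama.cpp/requirements.txt

# Build the llama-quantize tool (needed for Q4_K_M and similar quants)
cmake -B ~/llama.cpp/build -S ~/llama.cpp
cmake --build ~/llama.cpp/build --target llama-quantize

# Point Miniforge at the checkout so it can convert and quantize
export MINIFORGE_LLAMA_CPP=~/llama.cpp
```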

Transformers (Fallback)

Native HF support with bitsandbytes:

model = await Miniforge.from_pretrained(
    "MiniMaxAI/MiniMax-M2.7",
    backend="transformers",
)

Memory Optimization

The library automatically manages memory within the 28GB constraint:

from miniforge.core.memory import MemoryManager

# Auto-select best quantization
mem = MemoryManager()
quant = mem.select_quantization(model_params=2.7)  # params in billions
# Returns: Q4_K_M (or Q3_K_M if memory constrained)

# Calculate safe context window
max_ctx = mem.calculate_max_context(
    model_quantized_gb=3.1,
    kv_cache_type="turbo3",
)
# Returns: up to 200000 (or less if memory-constrained)
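The safe-context calculation amounts to dividing the memory left after loading the weights by the KV cache cost per token. A hedged sketch of that logic (the formula and model dimensions are assumptions, not necessarily what MemoryManager does internally):

```python
def max_context(budget_gb: float, model_gb: float, n_layers: int,
                n_kv_heads: int, head_dim: int, kv_bits: float,
                cap: int = 200_000) -> int:
    """Largest n_ctx whose KV cache fits in the budget left after the weights."""
    # One key and one value per layer/head/position, at kv_bits each
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bits / 8
    free_bytes = (budget_gb - model_gb) * 1024**3
    return min(cap, int(free_bytes // bytes_per_token))

# Hypothetical dimensions: plenty of headroom, so the cap applies
print(max_context(24.0, 3.1, 32, 8, 128, 3))   # -> 200000
```

With a tighter budget or an uncompressed (16-bit) KV cache, the same formula returns a smaller window, which matches the "or less if memory-constrained" behavior.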

Tool Calling

from miniforge.generation.tools import Tool

# The handler can be any callable matching the declared parameters.
# This stub is illustrative; supply your own implementation.
def get_weather_func(location: str) -> str:
    return f"Sunny, 22°C in {location}"

weather_tool = Tool(
    name="get_weather",
    description="Get weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"],
    },
    handler=get_weather_func,
)

response = await model.chat(
    "What's the weather in Paris?",
    tools=[weather_tool],
)

Vision

response = await model.chat_vision(
    message="Describe this image",
    image="path/to/image.jpg",
)

License

MIT
