
mlx-vlm-batch-outlines

mlx-vlm-batch-outlines is a small Qwen-focused MLX vision-language package for:

  • multimodal chat with images
  • batched multimodal chat
  • constrained decoding with llguidance
  • structured outputs from Pydantic, regex, CFG, or JSON Schema
  • simple video inference helpers

It is intentionally narrow. It is built for local MLX/Qwen workflows and does not try to support every mlx-vlm backend or the full Outlines API surface.

Attribution

This project heavily reuses and adapts ideas and code paths from the mlx-vlm and outlines projects.

In particular:

  • the MLX/Qwen multimodal runtime and model code are derived from mlx-vlm
  • the constrained decoding architecture and structured-output interface are derived from outlines

Scope

This package currently targets Qwen vision models exposed through MLX, such as:

  • mlx-community/Qwen3.5-4B-MLX-4bit
  • mlx-community/Qwen2-VL-2B-Instruct-4bit

The public API is:

  • load(...)
  • chat(...)
  • chat_stream(...)
  • batch_chat(...)
  • video_chat(...)
  • video_chunk_process(...)

Structured output helpers:

  • Regex
  • CFG
  • JsonSchema
  • regex(...)
  • cfg(...)
  • json_schema(...)
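
To illustrate what a regex constraint guarantees, the sketch below uses only the standard library. The pattern and helper names are hypothetical, but constrained decoding with regex(...) enforces the same full-match property on the generated text:

```python
import re

# Hypothetical pattern you might pass to regex(...) to constrain output
# to an ISO-style date. The constrained decoder guarantees the generated
# text matches; stdlib re illustrates what "matches" means here.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

def is_valid_constrained_output(text: str, pattern: str) -> bool:
    # Constrained decoding enforces a full match, not a substring match.
    return re.fullmatch(pattern, text) is not None

print(is_valid_constrained_output("2024-07-01", DATE_PATTERN))     # True
print(is_valid_constrained_output("on 2024-07-01", DATE_PATTERN))  # False
```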

Install

uv sync

Or with plain pip:

pip install -e .

Quick Start

from PIL import Image
from mlx_vlm_batch_outlines import chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": ["Describe this image.", image]},
    ],
    max_tokens=80,
)

print(result.text)

Batched Image Chat

from PIL import Image
from mlx_vlm_batch_outlines import batch_chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")

results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    max_tokens=80,
)

for text in results.texts:
    print(text)

Structured Outputs

Pydantic

from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import chat, load


class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": ["Describe this image as JSON using the requested schema.", image],
        }
    ],
    output_type=VisualSummary,
    max_tokens=140,
)

print(result.model_dump())

CFG

from PIL import Image
from mlx_vlm_batch_outlines import CFG, chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": [
                "What animal is most prominent in this image? Choose either cat or dog.",
                image,
            ],
        }
    ],
    output_type=CFG('start: "cat" | "dog"'),
    max_tokens=12,
)

print(result.text)

Batch + Pydantic

from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import batch_chat, load


class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")

results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    output_type=VisualSummary,
    max_tokens=180,
)

for item in results:
    print(item.model_dump())

Video Chat

video_chat(...) is the thin, direct video path: it samples frames across the whole video and runs a single multimodal generation call.

from mlx_vlm_batch_outlines import load, video_chat

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

result = video_chat(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe this video.",
    fps=1.0,
    max_pixels=(224, 224),
    max_tokens=100,
)

print(result.text)

Chunked Video Processing

video_chunk_process(...) treats a single video as many independent image-chat chunks.

For each chunk it:

  1. slices the video by time
  2. samples frames inside that chunk
  3. converts those frames into a normal multi-image chat
  4. runs chunk chats through batch_chat(...) in mini-batches

This is useful when you want independent chunk summaries instead of one global video answer.
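The chunk-slicing steps above can be sketched with plain arithmetic. This is an illustrative reimplementation with hypothetical names (plan_chunks and its fields), not the package's actual internals:

```python
def plan_chunks(duration_sec: float, chunk_length_seconds: float,
                fps: float, max_frames_per_chunk: int) -> list[dict]:
    """Slice a video timeline into chunks and pick frame timestamps per chunk."""
    chunks = []
    start = 0.0
    index = 0
    while start < duration_sec:
        end = min(start + chunk_length_seconds, duration_sec)
        # Sample at `fps` inside the chunk, capped at max_frames_per_chunk,
        # always keeping at least one frame.
        n_frames = min(max(int((end - start) * fps), 1), max_frames_per_chunk)
        step = (end - start) / n_frames
        frame_times = [start + i * step for i in range(n_frames)]
        chunks.append({"chunk_index": index, "start_sec": start,
                       "end_sec": end, "frame_times": frame_times})
        start = end
        index += 1
    return chunks

# A 12-second video in 5-second chunks yields chunks of 5s, 5s, and 2s.
for c in plan_chunks(12.0, chunk_length_seconds=5.0, fps=1.0, max_frames_per_chunk=8):
    print(c["chunk_index"], c["start_sec"], c["end_sec"], len(c["frame_times"]))
```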

from pydantic import BaseModel
from mlx_vlm_batch_outlines import load, video_chunk_process


class ChunkSummary(BaseModel):
    actions: list[str]
    scene: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")

results = video_chunk_process(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe what is happening in this chunk.",
    chunk_length_seconds=5.0,
    batch_size=4,
    fps=1.0,
    max_frames_per_chunk=8,
    output_type=ChunkSummary,
    max_tokens=120,
)

for item in results:
    print(item["chunk_index"], item["start_sec"], item["end_sec"], item["output"])

Each returned item looks like:

{
    "chunk_index": 0,
    "start_sec": 0.0,
    "end_sec": 5.0,
    "output": ...,
}

If output_type is structured, output is the parsed structured object. Otherwise it is the raw text string for that chunk.
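
A hypothetical post-processing helper shows how a caller might branch on the two output kinds; the render_chunk name and formatting are illustrative, not part of the API:

```python
def render_chunk(item: dict) -> str:
    """Render one chunk result, handling both raw-text and structured outputs."""
    output = item["output"]
    label = f"[{item['start_sec']:.1f}s-{item['end_sec']:.1f}s]"
    if isinstance(output, str):
        # Unconstrained run: output is the raw text for the chunk.
        return f"{label} {output}"
    # Structured run: output is a parsed Pydantic object.
    return f"{label} {output.model_dump()}"

print(render_chunk({"chunk_index": 0, "start_sec": 0.0, "end_sec": 5.0,
                    "output": "A cat walks across the room."}))
```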

Notes

  • Default image resize is currently 224x224.
  • Smaller images can improve batching throughput significantly, because image token count drops quickly as resolution shrinks.
  • Homogeneous batches (same number and size of images per request) usually perform better than mixed multimodal shapes.
  • batch_chat_stream(...) is not implemented.
  • This package is Qwen-focused and is not intended as a generic VLM abstraction layer.

Benchmark Notes

These are local measurements from the development machine, not a formal benchmark suite.

  • 4B Qwen, homogeneous 3 x 1-image batch, 768x768:
    • sequential: about 19.25s
    • batch: about 15.63s
    • speedup: about 1.23x
  • 4B Qwen, homogeneous 3 x 1-image batch, 384x384, short CFG output:
    • sequential: about 6.25s
    • batch: about 2.94s
    • speedup: about 2.13x

The practical takeaway is simple:

  • batching helps more when image sizes are smaller
  • batching helps more when the workload is homogeneous
  • image resolution matters a lot because image token count grows quickly with width and height
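
For intuition, assuming a Qwen2-VL-style vision encoder with 14-pixel patches merged 2x2, so roughly one token per 28x28 pixel region (an assumption for back-of-the-envelope purposes, not this package's exact accounting), token count grows quadratically with image side length:

```python
# Rough image token count under the 28x28-pixels-per-token assumption above.
PIXELS_PER_TOKEN_SIDE = 28

def approx_image_tokens(width: int, height: int) -> int:
    return (width // PIXELS_PER_TOKEN_SIDE) * (height // PIXELS_PER_TOKEN_SIDE)

for side in (224, 384, 768):
    print(side, approx_image_tokens(side, side))
# 224 -> 64 tokens, 384 -> 169, 768 -> 729: quadratic growth in side length.
```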

Local Verification

There is a simple verifier script, verify_mlx_vlm_batch_outlines.py. Run it with:

uv run python verify_mlx_vlm_batch_outlines.py
