
mlx-vlm-batch-outlines

mlx-vlm-batch-outlines is a small Qwen-focused MLX vision-language package for:

  • multimodal chat with images
  • batched multimodal chat
  • constrained decoding with llguidance
  • structured outputs from Pydantic, regex, CFG, or JSON Schema
  • simple video inference helpers

It is intentionally narrow. It is built for local MLX/Qwen workflows and does not try to support every mlx-vlm backend or the full Outlines API surface.

Attribution

This project heavily reuses and adapts ideas and code paths from the mlx-vlm and outlines projects.

In particular:

  • the MLX/Qwen multimodal runtime and model code are derived from mlx-vlm
  • the constrained decoding architecture and structured-output interface are derived from outlines

Scope

This package currently targets Qwen vision models exposed through MLX, such as:

  • mlx-community/Qwen3.5-4B-MLX-4bit
  • mlx-community/Qwen2-VL-2B-Instruct-4bit

The public API is:

  • load(...)
  • chat(...)
  • chat_stream(...)
  • batch_chat(...)
  • video_chat(...)
  • video_chunk_process(...)

Structured output helpers:

  • Regex
  • CFG
  • JsonSchema
  • regex(...)
  • cfg(...)
  • json_schema(...)
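
To illustrate what a regex constraint guarantees, the sketch below uses only the standard library. The pattern and helper names are hypothetical, but constrained decoding with regex(...) enforces the same full-match property on the generated text:

```python
import re

# Hypothetical pattern you might pass to regex(...) to constrain output
# to an ISO-style date. The constrained decoder guarantees the generated
# text matches; stdlib re illustrates what "matches" means here.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

def is_valid_constrained_output(text: str, pattern: str) -> bool:
    # Constrained decoding enforces a full match, not a substring match.
    return re.fullmatch(pattern, text) is not None

print(is_valid_constrained_output("2024-07-01", DATE_PATTERN))     # True
print(is_valid_constrained_output("on 2024-07-01", DATE_PATTERN))  # False
```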

Install

uv sync

Or with plain pip:

pip install -e .

Quick Start

from PIL import Image
from mlx_vlm_batch_outlines import chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": ["Describe this image.", image]},
    ],
    max_tokens=80,
)

print(result.text)

Batched Image Chat

from PIL import Image
from mlx_vlm_batch_outlines import batch_chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")

results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    max_tokens=80,
)

for text in results.texts:
    print(text)

Structured Outputs

Pydantic

from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import chat, load


class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": ["Describe this image as JSON using the requested schema.", image],
        }
    ],
    output_type=VisualSummary,
    max_tokens=140,
)

print(result.model_dump())

CFG

from PIL import Image
from mlx_vlm_batch_outlines import CFG, chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": [
                "What animal is most prominent in this image? Choose either cat or dog.",
                image,
            ],
        }
    ],
    output_type=CFG('start: "cat" | "dog"'),
    max_tokens=12,
)

print(result.text)

Batch + Pydantic

from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import batch_chat, load


class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")

results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    output_type=VisualSummary,
    max_tokens=180,
)

for item in results:
    print(item.model_dump())

Video Chat

video_chat(...) is the thin, direct video path: it samples frames across the whole video and runs a single multimodal generation call.

from mlx_vlm_batch_outlines import load, video_chat

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

result = video_chat(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe this video.",
    fps=1.0,
    max_pixels=(224, 224),
    max_tokens=100,
)

print(result.text)

Chunked Video Processing

video_chunk_process(...) treats a single video as many independent image-chat chunks.

For each chunk it:

  1. slices the video by time
  2. samples frames inside that chunk
  3. converts those frames into a normal multi-image chat
  4. runs chunk chats through batch_chat(...) in mini-batches

This is useful when you want independent chunk summaries instead of one global video answer.
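The chunk-slicing steps above can be sketched with plain arithmetic. This is an illustrative reimplementation with hypothetical names (plan_chunks and its fields), not the package's actual internals:

```python
def plan_chunks(duration_sec: float, chunk_length_seconds: float,
                fps: float, max_frames_per_chunk: int) -> list[dict]:
    """Slice a video timeline into chunks and pick frame timestamps per chunk."""
    chunks = []
    start = 0.0
    index = 0
    while start < duration_sec:
        end = min(start + chunk_length_seconds, duration_sec)
        # Sample at `fps` inside the chunk, capped at max_frames_per_chunk,
        # always keeping at least one frame.
        n_frames = min(max(int((end - start) * fps), 1), max_frames_per_chunk)
        step = (end - start) / n_frames
        frame_times = [start + i * step for i in range(n_frames)]
        chunks.append({"chunk_index": index, "start_sec": start,
                       "end_sec": end, "frame_times": frame_times})
        start = end
        index += 1
    return chunks

# A 12-second video in 5-second chunks yields chunks of 5s, 5s, and 2s.
for c in plan_chunks(12.0, chunk_length_seconds=5.0, fps=1.0, max_frames_per_chunk=8):
    print(c["chunk_index"], c["start_sec"], c["end_sec"], len(c["frame_times"]))
```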

from pydantic import BaseModel
from mlx_vlm_batch_outlines import load, video_chunk_process


class ChunkSummary(BaseModel):
    actions: list[str]
    scene: str


model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")

results = video_chunk_process(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe what is happening in this chunk.",
    chunk_length_seconds=5.0,
    batch_size=4,
    fps=1.0,
    max_frames_per_chunk=8,
    output_type=ChunkSummary,
    max_tokens=120,
)

for item in results:
    print(item["chunk_index"], item["start_sec"], item["end_sec"], item["output"])

Each returned item looks like:

{
    "chunk_index": 0,
    "start_sec": 0.0,
    "end_sec": 5.0,
    "output": ...,
}

If output_type is structured, output is the parsed structured object. Otherwise it is the raw text string for that chunk.
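
A hypothetical post-processing helper shows how a caller might branch on the two output kinds; the render_chunk name and formatting are illustrative, not part of the API:

```python
def render_chunk(item: dict) -> str:
    """Render one chunk result, handling both raw-text and structured outputs."""
    output = item["output"]
    label = f"[{item['start_sec']:.1f}s-{item['end_sec']:.1f}s]"
    if isinstance(output, str):
        # Unconstrained run: output is the raw text for the chunk.
        return f"{label} {output}"
    # Structured run: output is a parsed Pydantic object.
    return f"{label} {output.model_dump()}"

print(render_chunk({"chunk_index": 0, "start_sec": 0.0, "end_sec": 5.0,
                    "output": "A cat walks across the room."}))
```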

Notes

  • Default image resize is currently 224x224.
  • Smaller images can improve batching throughput significantly, because image token count drops quickly as resolution shrinks.
  • Homogeneous batches (same number and size of images per request) usually perform better than mixed multimodal shapes.
  • batch_chat_stream(...) is not implemented.
  • This package is Qwen-focused and is not intended as a generic VLM abstraction layer.

Benchmark Notes

These are local measurements from the development machine, not a formal benchmark suite.

  • 4B Qwen, homogeneous 3 x 1-image batch, 768x768:
    • sequential: about 19.25s
    • batch: about 15.63s
    • speedup: about 1.23x
  • 4B Qwen, homogeneous 3 x 1-image batch, 384x384, short CFG output:
    • sequential: about 6.25s
    • batch: about 2.94s
    • speedup: about 2.13x

The practical takeaway is simple:

  • batching helps more when image sizes are smaller
  • batching helps more when the workload is homogeneous
  • image resolution matters a lot because image token count grows quickly with width and height
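
For intuition, assuming a Qwen2-VL-style vision encoder with 14-pixel patches merged 2x2, so roughly one token per 28x28 pixel region (an assumption for back-of-the-envelope purposes, not this package's exact accounting), token count grows quadratically with image side length:

```python
# Rough image token count under the 28x28-pixels-per-token assumption above.
PIXELS_PER_TOKEN_SIDE = 28

def approx_image_tokens(width: int, height: int) -> int:
    return (width // PIXELS_PER_TOKEN_SIDE) * (height // PIXELS_PER_TOKEN_SIDE)

for side in (224, 384, 768):
    print(side, approx_image_tokens(side, side))
# 224 -> 64 tokens, 384 -> 169, 768 -> 729: quadratic growth in side length.
```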

Local Verification

There is a simple verifier script, verify_mlx_vlm_batch_outlines.py. Run it with:

uv run python verify_mlx_vlm_batch_outlines.py
