mlx-vlm-batch-outlines

Qwen-focused MLX vision-language chat library with batched multimodal chat.
mlx-vlm-batch-outlines is a small Qwen-focused MLX vision-language package for:
- multimodal chat with images
- batched multimodal chat
- constrained decoding with llguidance
- structured outputs from Pydantic, regex, CFG, or JSON Schema
- simple video inference helpers
It is intentionally narrow. It is built for local MLX/Qwen workflows and does not try to support every mlx-vlm backend or the full Outlines API surface.
Attribution
This project heavily reuses and adapts ideas and code paths from the following projects. In particular:

- the MLX/Qwen multimodal runtime and model code are derived from mlx-vlm
- the constrained decoding architecture and structured-output interface are derived from outlines
Scope
This package currently targets Qwen vision models exposed through MLX, such as:
- mlx-community/Qwen3.5-4B-MLX-4bit
- mlx-community/Qwen2-VL-2B-Instruct-4bit
The public API is:
- load(...)
- chat(...)
- chat_stream(...)
- batch_chat(...)
- video_chat(...)
- video_chunk_process(...)
Structured output helpers:
- Regex, regex(...)
- CFG, cfg(...)
- JsonSchema, json_schema(...)
Install
uv sync
Or with plain pip:
pip install -e .
Quick Start
from PIL import Image
from mlx_vlm_batch_outlines import chat, load
model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")
result = chat(
    model,
    processor,
    [
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": ["Describe this image.", image]},
    ],
    max_tokens=80,
)
print(result.text)
Batched Image Chat
from PIL import Image
from mlx_vlm_batch_outlines import batch_chat, load
model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")
results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    max_tokens=80,
)

for text in results.texts:
    print(text)
Structured Outputs
Pydantic
from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import chat, load
class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": ["Describe this image as JSON using the requested schema.", image],
        }
    ],
    output_type=VisualSummary,
    max_tokens=140,
)
print(result.model_dump())
CFG
from PIL import Image
from mlx_vlm_batch_outlines import CFG, chat, load
model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")
result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": [
                "What animal is most prominent in this image? Choose either cat or dog.",
                image,
            ],
        }
    ],
    output_type=CFG('start: "cat" | "dog"'),
    max_tokens=12,
)
print(result.text)
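Regex

The Regex / regex(...) helpers listed in the API can constrain output the same way as CFG. A minimal sketch, assuming Regex accepts a standard regular-expression pattern string (mirroring the CFG example above):

from PIL import Image
from mlx_vlm_batch_outlines import Regex, chat, load

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
image = Image.open("cat.jpeg")

# Constrain the decoded answer to match exactly "cat" or "dog".
result = chat(
    model,
    processor,
    [
        {
            "role": "user",
            "content": ["What animal is most prominent in this image?", image],
        }
    ],
    output_type=Regex(r"cat|dog"),
    max_tokens=12,
)
print(result.text)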
Batch + Pydantic
from PIL import Image
from pydantic import BaseModel
from mlx_vlm_batch_outlines import batch_chat, load
class VisualSummary(BaseModel):
    primary_subject: list[str]
    subject_count: int
    setting: str
    short_description: str

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")
cat = Image.open("cat.jpeg")
dog = Image.open("dog.jpeg")

results = batch_chat(
    model,
    processor,
    [
        [{"role": "user", "content": ["Describe this image.", cat]}],
        [{"role": "user", "content": ["Describe this image.", dog]}],
    ],
    output_type=VisualSummary,
    max_tokens=180,
)

for item in results:
    print(item.model_dump())
Video Chat
video_chat(...) is the thin direct video path. It samples frames across the whole video and runs one multimodal generation call.
from mlx_vlm_batch_outlines import load, video_chat
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")
result = video_chat(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe this video.",
    fps=1.0,
    max_pixels=(224, 224),
    max_tokens=100,
)
print(result.text)
Chunked Video Processing
video_chunk_process(...) treats a single video as many independent image-chat chunks.
For each chunk it:

- slices the video by time
- samples frames inside that chunk
- converts those frames into a normal multi-image chat
- runs chunk chats through batch_chat(...) in mini-batches
This is useful when you want independent chunk summaries instead of one global video answer.
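Conceptually, the chunk-planning step works roughly like this. This is a simplified sketch of the idea, not the package's actual implementation; the plan_chunks helper and its exact rounding behavior are illustrative:

# Hypothetical sketch of time-based chunking and per-chunk frame sampling.
def plan_chunks(duration_sec: float, chunk_length_sec: float,
                fps: float, max_frames_per_chunk: int):
    """Return (start_sec, end_sec, frame_times) for each chunk."""
    chunks = []
    start = 0.0
    while start < duration_sec:
        end = min(start + chunk_length_sec, duration_sec)
        # Sample frames at the requested fps, capped per chunk.
        n_frames = min(int((end - start) * fps) or 1, max_frames_per_chunk)
        step = (end - start) / n_frames
        frame_times = [start + step * i for i in range(n_frames)]
        chunks.append((start, end, frame_times))
        start = end
    return chunks

# A 12 s video in 5 s chunks at 1 fps -> chunks covering [0,5), [5,10), [10,12)
print(plan_chunks(12.0, 5.0, fps=1.0, max_frames_per_chunk=8))

Each chunk's frames then become the images of one multi-image chat, and those chats are batched.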
from pydantic import BaseModel
from mlx_vlm_batch_outlines import load, video_chunk_process
class ChunkSummary(BaseModel):
    actions: list[str]
    scene: str

model, processor = load("mlx-community/Qwen3.5-4B-MLX-4bit")

results = video_chunk_process(
    model,
    processor,
    video="path/to/video.mp4",
    prompt="Describe what is happening in this chunk.",
    chunk_length_seconds=5.0,
    batch_size=4,
    fps=1.0,
    max_frames_per_chunk=8,
    output_type=ChunkSummary,
    max_tokens=120,
)

for item in results:
    print(item["chunk_index"], item["start_sec"], item["end_sec"], item["output"])
Each returned item looks like:
{
    "chunk_index": 0,
    "start_sec": 0.0,
    "end_sec": 5.0,
    "output": ...,
}
If output_type is structured, output is the parsed structured object. Otherwise it is the raw text string for that chunk.
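Downstream code may therefore need to branch on the output kind. A minimal sketch, assuming a Pydantic output_type as in the example above (the summarize helper is hypothetical):

from pydantic import BaseModel

class ChunkSummary(BaseModel):
    actions: list[str]
    scene: str

def summarize(item: dict) -> str:
    """Render one chunk result, whether structured or plain text."""
    out = item["output"]
    if isinstance(out, BaseModel):
        # Structured result: fields are directly accessible.
        return f"[{item['start_sec']:.0f}-{item['end_sec']:.0f}s] {out.scene}: {', '.join(out.actions)}"
    # Plain-text result: out is the raw string for the chunk.
    return f"[{item['start_sec']:.0f}-{item['end_sec']:.0f}s] {out}"

item = {
    "chunk_index": 0,
    "start_sec": 0.0,
    "end_sec": 5.0,
    "output": ChunkSummary(actions=["walking"], scene="a park"),
}
print(summarize(item))  # [0-5s] a park: walking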
Notes
- Default image resize is currently 224x224.
- Smaller image sizes can improve batching throughput significantly, because image token count drops quickly with resolution.
- Homogeneous batches usually perform better than mixed multimodal shapes.
- batch_chat_stream(...) is not implemented.
- This package is Qwen-focused and not intended as a generic VLM abstraction layer.
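To see why resolution matters so much: Qwen2-VL-style vision encoders produce roughly one token per 28x28-pixel region (14-pixel patches merged 2x2), so image token count grows quadratically with side length. A rough back-of-envelope sketch; the exact constants depend on the model's processor config:

def approx_image_tokens(width: int, height: int, pixels_per_token: int = 28) -> int:
    """Rough image-token estimate for a Qwen2-VL-style encoder:
    one token per pixels_per_token x pixels_per_token region."""
    return (width // pixels_per_token) * (height // pixels_per_token)

for side in (224, 384, 768):
    print(side, approx_image_tokens(side, side))
# 224 -> 64, 384 -> 169, 768 -> 729: roughly 11x more image tokens at 768 than at 224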
Benchmark Notes
These are local measurements from the development machine, not a formal benchmark suite.
- 4B Qwen, homogeneous 3 x 1-image batch, 768x768:
  - sequential: about 19.25s
  - batch: about 15.63s
  - speedup: about 1.23x
- 4B Qwen, homogeneous 3 x 1-image batch, 384x384, short CFG output:
  - sequential: about 6.25s
  - batch: about 2.94s
  - speedup: about 2.13x
The practical takeaway is simple:
- batching helps more when image sizes are smaller
- batching helps more when the workload is homogeneous
- image resolution matters a lot because image token count grows quickly with width and height
Local Verification
There is a simple verifier script, verify_mlx_vlm_batch_outlines.py. Run it with:
uv run python verify_mlx_vlm_batch_outlines.py