Local VLM inference engine for video — Apple Silicon, NVIDIA, and CPU

TrioCore

Real-time Vision Intelligence Engine for Apple Silicon

YOLO object detection + VLM scene understanding. One pip install, zero Docker.

What is TrioCore?

Point it at any image, video, or camera and it will detect objects, count people, and describe scenes, all running locally on your own machine with no cloud APIs needed.

Core capabilities:

Detect — Find and count objects (people, cars, etc.) in images
Describe — Get natural language descriptions of what's happening in a scene
Crop-Describe — Detect objects, then describe each one individually
REST API — Built-in web server on port 8100 with interactive docs
CLI — Simple commands: trio serve, trio analyze, trio webcam

New to computer vision? Key terms explained

Term	What it means
YOLO	"You Only Look Once" — a fast object detection model that finds and labels objects in images
VLM	Vision Language Model — an AI model that can look at an image and describe it in natural language
MLX	Apple's machine learning framework, optimized for M1/M2/M3/M4 chips
ONNX	An open, standard format for ML models that many runtimes can execute across different hardware
ToMe	Token Merging, a technique that makes VLM inference faster by merging similar visual tokens
KV cache	Stored attention keys and values from tokens already processed; reusing them across sequential video frames avoids recomputing them

Quick Start

# 1. Install (Apple Silicon Mac recommended)
pip install 'trio-core[mlx]'

# 2. Check your setup
trio doctor

# 3. Start the server
trio serve

First run note: The first time you run trio serve or trio analyze, the model is downloaded automatically (~2 GB for the default 3B model). This takes 5-20 minutes depending on your connection. Later runs reuse the cached download and skip this step.

Once the server is running, open http://localhost:8100/docs in your browser to explore the API interactively, or try it from the terminal:

# In another terminal — grab any image and detect objects in it
# macOS:
curl -X POST http://localhost:8100/api/inference/detect \
  -H "Content-Type: application/json" \
  -d '{"image_b64": "'$(base64 -i your-photo.jpg)'"}'

# Linux:
curl -X POST http://localhost:8100/api/inference/detect \
  -H "Content-Type: application/json" \
  -d '{"image_b64": "'$(base64 -w0 your-photo.jpg)'"}'

Response:

{
  "people_count": 3,
  "vehicle_count": 1,
  "by_class": {"person": 3, "car": 1},
  "crops_b64": [{"class": "person", "bbox": [100, 50, 200, 300], "confidence": 0.92}],
  "elapsed_ms": 45
}
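The same request can be made from Python with only the standard library. This is a minimal sketch, not part of the package: `build_detect_body` and `detect` are hypothetical helper names, and the endpoint path and `image_b64` field are taken from the curl example above.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_detect_body(image_bytes: bytes) -> bytes:
    """Return the JSON request body for POST /api/inference/detect."""
    # b64encode never inserts line breaks, so the macOS/Linux flag
    # difference from the shell examples does not apply here.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_b64": image_b64}).encode("utf-8")


def detect(image_path: str, base_url: str = "http://localhost:8100") -> dict:
    """Send one image to a running server and return the parsed JSON reply."""
    request = urllib.request.Request(
        base_url + "/api/inference/detect",
        data=build_detect_body(Path(image_path).read_bytes()),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the server running, `detect("your-photo.jpg")["people_count"]` would read the count from a reply shaped like the one above.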

Or analyze an image directly from the CLI (no server needed):

trio analyze your-photo.jpg -q "How many people are in this image?"

See more in examples/ — quickstart.py (5 lines) and api_client.py (full API usage).

Install

Requires Python 3.10+.

# Apple Silicon Mac (M1/M2/M3/M4) — recommended, uses Apple's MLX framework
pip install 'trio-core[mlx]'

# Apple Silicon + webcam monitoring
pip install 'trio-core[mlx,webcam]'

# NVIDIA GPU or CPU-only (uses PyTorch/Transformers instead of MLX)
pip install 'trio-core[transformers]'

# For IP/RTSP camera support (macOS)
brew install ffmpeg

Which install do I pick? If you have a Mac with Apple Silicon (2020 or later), use [mlx]. If you have an NVIDIA GPU or are on Linux, use [transformers]. Not sure? Run trio device after install to see what hardware was detected.
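The rule of thumb above can be written as a small check. This is an illustrative sketch only; `pick_extra` is a hypothetical helper, not something trio-core ships, and `trio device` remains the authoritative check after install.

```python
import platform


def pick_extra(system=None, machine=None):
    """Return the pip extra suggested by the rule above."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon reports Darwin/arm64. A Python running under Rosetta
    # reports x86_64 and would fall through to the generic extra.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "transformers"


print(f"pip install 'trio-core[{pick_extra()}]'")
```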

API Reference

Tip: Once the server is running, visit http://localhost:8100/docs for interactive API documentation where you can try every endpoint from your browser.

Start the server:

trio serve                          # default: 0.0.0.0:8100
trio serve --port 9000              # custom port
TRIO_API_KEY=secret trio serve      # enable Bearer token auth
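When `TRIO_API_KEY` is set, clients need to send the token with each request. The sketch below assumes the conventional `Authorization: Bearer <token>` header, which is what "Bearer token auth" usually means; the exact header TrioCore expects is not spelled out here, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(path, payload, base_url="http://localhost:8100", api_key=None):
    """Build a JSON POST request, adding a Bearer token when one is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed header format for Bearer token auth.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

For a server started with `--port 9000`, pass `base_url="http://localhost:9000"`. The returned object can be handed to `urllib.request.urlopen`.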

`POST /api/inference/detect`

Run YOLO object detection. Returns counts and bounding boxes.

curl -X POST http://localhost:8100/api/inference/detect \
  -H "Content-Type: application/json" \
  -d '{"image_b64": "<base64 jpeg>", "pad_ratio": 0.15}'

Response:

{
  "people_count": 2,
  "vehicle_count": 1,
  "by_class": {"person": 2, "car": 1},
  "crops_b64": [
    {"class": "person", "bbox": [100, 50, 200, 300], "confidence": 0.92},
    {"class": "car", "bbox": [400, 200, 600, 350], "confidence": 0.87}
  ],
  "elapsed_ms": 42
}
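A short sketch of consuming this response in Python. It assumes `bbox` is `[x1, y1, x2, y2]` in pixels, which is consistent with the sample values but not stated explicitly; `confident_detections` is a hypothetical helper.

```python
SAMPLE_RESPONSE = {
    "people_count": 2,
    "vehicle_count": 1,
    "by_class": {"person": 2, "car": 1},
    "crops_b64": [
        {"class": "person", "bbox": [100, 50, 200, 300], "confidence": 0.92},
        {"class": "car", "bbox": [400, 200, 600, 350], "confidence": 0.87},
    ],
    "elapsed_ms": 42,
}


def confident_detections(response, min_confidence=0.9):
    """Keep detections at or above a confidence threshold and add box sizes."""
    kept = []
    for det in response.get("crops_b64", []):
        if det["confidence"] < min_confidence:
            continue
        x1, y1, x2, y2 = det["bbox"]  # assumed corner format
        kept.append({
            "class": det["class"],
            "confidence": det["confidence"],
            "width": x2 - x1,
            "height": y2 - y1,
        })
    return kept


print(confident_detections(SAMPLE_RESPONSE))
```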

`POST /api/inference/describe`

Run VLM on an image. Returns a natural language description.

curl -X POST http://localhost:8100/api/inference/describe \
  -H "Content-Type: application/json" \
  -d '{"image_b64": "<base64 jpeg>", "prompt": "Describe what you see."}'

Response:

{
  "description": "A woman in a red jacket is walking a golden retriever along a tree-lined sidewalk.",
  "elapsed_ms": 380
}

`POST /api/inference/crop-describe`

Combined pipeline: YOLO detects objects, crops them, then VLM describes each entity individually before generating a full scene description.

curl -X POST http://localhost:8100/api/inference/crop-describe \
  -H "Content-Type: application/json" \
  -d '{
    "image_b64": "<base64 jpeg>",
    "crops": [
      {"class": "person", "bbox": [100, 50, 200, 300], "confidence": 0.92}
    ],
    "max_crops": 3
  }'

Response:

{
  "description": "1 person: male 30s, blue polo, carrying laptop bag",
  "entities": {"persons": [...], "vehicles": [...]},
  "crop_descriptions": ["person: male 30s, blue polo, carrying laptop bag"],
  "elapsed_ms": 520
}
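Since the `crops` entries in this request have the same shape as the `crops_b64` entries returned by `/api/inference/detect`, the two calls can plausibly be chained. The sketch below builds the crop-describe payload from a detect response; passing detect output straight through is an assumption based on the matching examples, and `build_crop_describe_payload` is a hypothetical helper.

```python
def build_crop_describe_payload(image_b64, detect_response, max_crops=3):
    """Turn a detect response into a crop-describe request body."""
    detections = sorted(
        detect_response.get("crops_b64", []),
        key=lambda det: det["confidence"],
        reverse=True,  # most confident detections first
    )
    crops = [
        {"class": det["class"], "bbox": det["bbox"], "confidence": det["confidence"]}
        for det in detections[:max_crops]
    ]
    return {"image_b64": image_b64, "crops": crops, "max_crops": max_crops}
```

Trimming to `max_crops` on the client is a design choice to keep the request small; the server-side `max_crops` field is still sent unchanged.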

`GET /api/inference/status`

Check which models are loaded.

`GET /health`

Health check with uptime.
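A script that launches the server may want to wait until `/health` answers before sending work. The following is a minimal polling sketch; it assumes a healthy server replies with HTTP 200 and does not inspect the response body, whose fields are not documented here. `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if a GET to url succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url="http://localhost:8100/health", timeout_s=30.0,
                       interval_s=0.5, probe=None, sleep=time.sleep):
    """Poll the health endpoint until it responds or the timeout expires."""
    probe = probe or _http_ok
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a live server.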

CLI

trio doctor                             # Check setup — run this first!
trio device                             # Show your hardware + recommended model
trio serve                              # Start inference API server (port 8100)
trio analyze photo.jpg -q "What's here?" # Analyze an image (no server needed)
trio analyze video.mp4 -q "Describe"    # Video analysis
trio webcam -w "a person is waving"     # Live webcam monitor with alerts
trio cam --host 192.168.1.100 -p pass   # IP camera monitor
trio bench video.mp4 -n 5              # Benchmark inference speed

`trio analyze`

trio analyze photo.jpg -q "How many people are in this image?"
trio analyze video.mp4 -q "Describe the scene" --json    # JSON output with metrics
trio analyze photo.jpg -m mlx-community/Qwen2.5-VL-7B-Instruct-4bit  # specific model

`trio webcam`

Live camera monitor with VLM-based alerting. Green = clear, red = alert with audio.

trio webcam -w "someone at the door"         # Built-in webcam
trio webcam -s 1 -w "package on doorstep"    # iPhone Continuity Camera
trio webcam --count                          # Count objects (cumulative)

Python SDK

from trio_core import TrioCore, EngineConfig

# Load with defaults (auto-selects best model for your hardware)
engine = TrioCore()
engine.load()

# Analyze an image or video
result = engine.analyze_video("photo.jpg", "What do you see?")
print(result.text)
print(f"{result.metrics.latency_ms:.0f}ms | {result.metrics.tokens_per_sec:.0f} tok/s")

Configuration

from trio_core import TrioCore, EngineConfig

config = EngineConfig(
    model="mlx-community/Qwen2.5-VL-3B-Instruct-4bit",
    tome_enabled=True,       # Token Merging: up to 73% fewer visual tokens
    tome_r=4,                # tokens merged per ViT block
)
engine = TrioCore(config)

Or via environment variables:

TRIO_MODEL=mlx-community/Qwen2.5-VL-3B-Instruct-4bit
TRIO_TOME_ENABLED=true
TRIO_TOME_R=4

Supported Models

Tier 1 — Full optimization (native loading + visual token compression + KV reuse)

Model	Params	4-bit VRAM	ToMe	Compressed	KV Reuse
Qwen2.5-VL	3B, 7B	1.8-4.5 GB	yes	yes	yes
Qwen3-VL	2B, 4B, 8B	1.5-5.0 GB	--	yes	yes
Qwen3.5	0.8-9B	0.5-5.0 GB	yes	yes	yes
InternVL3	1B, 2B	1.0-1.6 GB	--	yes	yes

Tier 2 — Inference only (via mlx-vlm)

Gemma 3n, SmolVLM2, Phi-4, FastVLM, and any model supported by mlx-vlm.

Benchmarks

All benchmarks were run on an Apple M3 Ultra with 4-bit quantized models. Accuracy is hardware-independent.

Inference Latency (POPE benchmark, ms/sample)

Model	Params	Baseline	Compressed 50%	Speedup
Qwen3.5-0.8B	0.8B	148ms	135ms	1.09x
Qwen3.5-2B	2B	251ms	221ms	1.14x
Qwen3-VL-2B	2B	275ms	223ms	1.23x
Qwen2.5-VL-3B	3B	354ms	279ms	1.27x
Qwen2.5-VL-7B	7B	522ms	384ms	1.36x
Qwen3-VL-8B	8B	633ms	503ms	1.26x
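The Speedup column is the baseline latency divided by the compressed latency. The sketch below recomputes it from the table and also expresses it as a percentage of time saved. Because the published latencies are rounded to whole milliseconds, a recomputed value can differ from the table in the last digit.

```python
# (baseline_ms, compressed_ms) copied from the table above.
LATENCY_MS = {
    "Qwen3.5-0.8B": (148, 135),
    "Qwen3.5-2B": (251, 221),
    "Qwen3-VL-2B": (275, 223),
    "Qwen2.5-VL-3B": (354, 279),
    "Qwen2.5-VL-7B": (522, 384),
    "Qwen3-VL-8B": (633, 503),
}


def speedup(baseline_ms, compressed_ms):
    """How many times faster the compressed run is."""
    return baseline_ms / compressed_ms


def time_saved_pct(baseline_ms, compressed_ms):
    """Share of the baseline latency removed by compression, in percent."""
    return 100.0 * (baseline_ms - compressed_ms) / baseline_ms


for model, (base, comp) in LATENCY_MS.items():
    print(f"{model}: {speedup(base, comp):.2f}x, {time_saved_pct(base, comp):.1f}% less time")
```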

Frame-to-Frame KV Cache Reuse

Model	Speedup	Method
Qwen3-VL-4B	1.71x	KV cache reuse
Qwen2.5-VL-3B	1.57x	KV cache reuse
Qwen3.5-0.8B	1.35x	DeltaNet state snapshot
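As a rough intuition for why reuse helps, the toy below caches per-position state and recomputes only from the first position where a new input differs from the previous one. It illustrates generic prefix reuse under a causal model, where each state depends on everything before it. It is not TrioCore's implementation, whose frame-matching logic is not described here, and `PrefixCache` is a hypothetical name.

```python
class PrefixCache:
    """Reuse cached states for the longest prefix shared with the last input."""

    def __init__(self, step, initial):
        self._step = step          # step(prev_state, token) -> new state
        self._initial = initial
        self._tokens = []
        self._states = []
        self.steps = 0             # counts how often step() actually ran

    def process(self, tokens):
        # Find how many leading tokens match the previous input.
        shared = 0
        for old, new in zip(self._tokens, tokens):
            if old != new:
                break
            shared += 1
        # Keep states for the shared prefix, recompute the rest.
        states = self._states[:shared]
        prev = states[-1] if states else self._initial
        for token in tokens[shared:]:
            prev = self._step(prev, token)
            states.append(prev)
            self.steps += 1
        self._tokens = list(tokens)
        self._states = states
        return list(states)


cache = PrefixCache(step=lambda prev, tok: prev + tok, initial=0)
cache.process([1, 2, 3, 4])   # four steps computed
cache.process([1, 2, 3, 9])   # only the last position is recomputed
print(cache.steps)
```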

Overhead vs raw mlx-vlm

Metric	mlx-vlm	trio-core	Delta
Prefill	1018ms	1016ms	-0.2%
Decode	524ms	513ms	-2.1%
Output	--	bit-identical	--

Full accuracy benchmarks (11 models x 6 benchmarks)

POPE — Object Hallucination (100 samples)

Model	Baseline	Compressed 50%
InternVL3-2B	95%	94%
Qwen2.5-VL-3B	94%	75%
Qwen3.5-2B	94%	93%
Qwen3-VL-8B	91%	93%

TextVQA — OCR Reading (50 samples)

Model	Baseline	Compressed 50%
Qwen3.5-2B	80%	74%
InternVL3-2B	78%	72%
Qwen3-VL-2B	76%	76%

GQA — Visual Reasoning (50 samples)

Model	Baseline	Compressed 50%
Qwen3.5-2B	68%	68%
InternVL3-2B	66%	66%
Qwen3.5-4B	58%	64%

MMBench — Multi-ability (50 samples)

Model	Baseline	Compressed 50%
InternVL3-2B	98%	96%
Qwen2.5-VL-7B	96%	94%
Qwen3.5-9B	96%	96%

SurveillanceVQA — Anomaly Detection (1,827 samples)

Model	Accuracy	F1	Recall
Qwen2.5-VL-7B	70.1%	0.362	25.3%
Qwen3-VL-8B	69.0%	0.395	30.2%
Qwen3.5-4B	65.2%	0.556	65.1%
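The table reports F1 and recall but not precision. Assuming F1 here is the standard harmonic mean of precision and recall for the anomaly class, precision can be recovered as P = F1 * R / (2R - F1). The sketch below applies that to the three rows; results are approximate because the inputs are rounded.

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent")
    return f1 * recall / denominator


# (F1, recall) copied from the table above, recall as a fraction.
ROWS = {
    "Qwen2.5-VL-7B": (0.362, 0.253),
    "Qwen3-VL-8B": (0.395, 0.302),
    "Qwen3.5-4B": (0.556, 0.651),
}

for model, (f1, recall) in ROWS.items():
    print(f"{model}: precision ~ {precision_from_f1(f1, recall):.2f}")
```

Under that assumption the implied precisions are roughly 0.64, 0.57, and 0.49. The two larger models flag fewer anomalies but are right more often when they do, while Qwen3.5-4B trades precision for much higher recall.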

Architecture

                           TrioCore
                              |
              +---------------+---------------+
              |                               |
         YOLO Pipeline                   VLM Pipeline
              |                               |
    YOLOv10n ONNX model              Qwen/InternVL (MLX)
    tiled 2x2 detection              native model loading
    ByteTrack tracking               ToMe token compression
              |                       KV cache reuse
              |                               |
              +---------------+---------------+
                              |
                    FastAPI Server (:8100)
                              |
              +-------+-------+-------+
              |       |       |       |
          /detect  /describe  /crop   /status
                              -describe

Key design decisions

No ultralytics — YOLOv10 loaded via ONNX Runtime (MIT license)
Native VLM loading — Vendored model code (~3600 lines), bit-identical with mlx-vlm, zero overhead
Visual token compression — ToMe merges similar visual tokens in the ViT, reducing prefill by up to 73%
KV cache reuse — For sequential frames, reuse KV cache from previous frame (1.7x speedup)
Lazy loading — Models loaded on first request, not at server start
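To make the token-merging idea concrete, here is a simplified, pure-Python sketch of one bipartite merge step in the spirit of the ToMe paper: tokens are split into two alternating sets, each token in the first set is matched to its most similar token in the second, and the `r` best matches are averaged together. It is not TrioCore's vendored code. The published method matches on attention keys, weights averages by how many tokens were already merged, and protects the class token, all of which this sketch omits.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Merge the r most similar cross-set pairs, removing r tokens."""
    a, b = tokens[0::2], tokens[1::2]      # alternating partition
    r = max(0, min(r, len(a)))
    if r == 0 or not b:
        return [list(t) for t in tokens]

    # For each token in A, find its best match in B.
    edges = []
    for i, token in enumerate(a):
        sims = [cosine(token, other) for other in b]
        j = max(range(len(b)), key=sims.__getitem__)
        edges.append((sims[j], i, j))
    edges.sort(reverse=True)
    assigned = {i: j for _, i, j in edges[:r]}  # A index -> B index

    # Average each merged A token into its B partner.
    sums = [list(t) for t in b]
    counts = [1] * len(b)
    for i, j in assigned.items():
        sums[j] = [s + x for s, x in zip(sums[j], a[i])]
        counts[j] += 1
    merged_b = [[s / c for s in vec] for vec, c in zip(sums, counts)]
    kept_a = [list(t) for i, t in enumerate(a) if i not in assigned]
    return kept_a + merged_b
```

Each call removes `r` tokens, so applying it once per ViT block shrinks the sequence steadily with depth.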

Configuration

All settings via environment variables or EngineConfig:

Variable	Default	Description
`TRIO_MODEL`	Auto-detected	HuggingFace model ID
`TRIO_TOME_ENABLED`	`false`	Enable Token Merging
`TRIO_TOME_R`	`4`	Tokens merged per ViT block
`TRIO_COMPRESS_ENABLED`	`false`	Enable visual token compression
`TRIO_COMPRESS_RATIO`	`0.5`	Compression ratio
`TRIO_API_KEY`	(none)	Bearer token for API auth
`TRIO_YOLO_MODEL`	(bundled)	Path to YOLO ONNX model

See src/trio_core/config.py for all options.
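For scripts that need to preview which settings a given environment would produce, the table can be mirrored in a few lines. This is a standalone sketch, not the parser in `config.py`: the defaults are copied from the table, while the accepted boolean spellings and the `read_settings` name are assumptions.

```python
import os

# Defaults copied from the table above; None means "no explicit value".
DEFAULTS = {
    "TRIO_MODEL": None,              # auto-detected
    "TRIO_TOME_ENABLED": "false",
    "TRIO_TOME_R": "4",
    "TRIO_COMPRESS_ENABLED": "false",
    "TRIO_COMPRESS_RATIO": "0.5",
    "TRIO_API_KEY": None,
    "TRIO_YOLO_MODEL": None,         # bundled model
}


def _as_bool(value):
    # Assumed spellings; the real parser may accept a different set.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def read_settings(env=None):
    """Read TRIO_* variables from a mapping, falling back to the defaults."""
    env = os.environ if env is None else env

    def get(name):
        return env.get(name, DEFAULTS[name])

    return {
        "model": get("TRIO_MODEL"),
        "tome_enabled": _as_bool(get("TRIO_TOME_ENABLED")),
        "tome_r": int(get("TRIO_TOME_R")),
        "compress_enabled": _as_bool(get("TRIO_COMPRESS_ENABLED")),
        "compress_ratio": float(get("TRIO_COMPRESS_RATIO")),
        "api_key": get("TRIO_API_KEY"),
        "yolo_model": get("TRIO_YOLO_MODEL"),
    }
```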

OpenClaw Integration

TrioCore can connect to an OpenClaw Gateway as a node for remote camera monitoring via WebSocket.

pip install 'trio-core[claw]'
trio claw --pair -g ws://gateway:18789 --token <secret>
trio claw -g ws://gateway:18789 -c "rtsp://admin:pass@camera/stream"

Troubleshooting

Problem	Solution
`trio serve` hangs on first run	It's downloading the model (~2 GB). Wait for it to finish. Check progress with `ls -la ~/.cache/huggingface/`
`ModuleNotFoundError: mlx`	You installed without the `[mlx]` extra. Run `pip install 'trio-core[mlx]'`
Server starts but curl returns errors	Make sure you're using port 8100 (not 8000). Check with `curl http://localhost:8100/health`
`trio analyze` says "no model found"	Run `trio doctor` to check your setup and see which models are available
Out of memory on large images	Use a smaller model. `trio serve` defaults to a 3B model (~2 GB RAM); the 7B model needs ~5 GB. Set `TRIO_MODEL` to choose a different one
Webcam not detected	On macOS, grant Terminal camera access in System Settings > Privacy & Security > Camera

Run trio doctor to diagnose most issues — it checks Python version, dependencies, hardware, and available models.

References

ToMe — Bolya et al., "Token Merging: Your ViT But Faster", ICLR 2023. arXiv:2210.09461
StreamMem — Du et al., "Streaming KV Cache Management for Video Understanding", 2025. arXiv:2504.08498
SurveillanceVQA-589K — Zheng et al., 2025. arXiv:2505.12589

License

Apache 2.0 — see LICENSE.
