mlx-Chronos ⏱️

Benchmark suite and community leaderboard for local LLM inference on Apple Silicon.
Run it. Share your results. Compare across hardware.

What is mlx-Chronos?

mlx-Chronos is a standardized benchmarking tool for local LLM inference engines on Apple Silicon. It automatically detects your hardware, runs a consistent set of tests across installed engines, and produces a structured JSON result you can contribute to the community leaderboard.

Supported engines: oMLX, Rapid-MLX, mlx-lm, and Ollama.

Metrics measured:

TTFT — Time to First Token (cold and cached, with statistics)
tok/s — Generation throughput (mean, stddev, min, max across trials)
Engine RSS — Peak RSS of the engine server process during the benchmark when available
System RAM peak — Peak total Mac RAM in use during the benchmark
Tool calling — Success rate (coming in v0.2)

How It Works

When you run mlx-Chronos, it executes a fixed benchmark protocol against the running engine:

Cold TTFT — sends a prompt to the model and measures the time from request to first non-empty streamed token, including whitespace-only text tokens. Each trial uses a unique prompt to avoid cache hits.
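A minimal sketch of the cold TTFT timing described above, assuming the streamed response can be treated as an iterable of text chunks. The function name, the fake stream, and the injectable clock are hypothetical illustrations, not part of mlx-chronos.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def time_to_first_token(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds from the call until the first non-empty chunk, or None."""
    start = clock()
    for text in chunks:
        # An empty string is skipped; a whitespace-only chunk such as " " counts.
        if text != "":
            return clock() - start
    return None


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for an engine's streamed response."""
    yield ""          # e.g. a delta that carries no text
    time.sleep(0.05)  # simulated prompt processing
    yield " "         # whitespace-only, still counts as the first token
    yield "Hello"


if __name__ == "__main__":
    ttft = time_to_first_token(fake_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms")
```

Because the generator is lazy, the simulated request only starts once iteration begins, so the timer covers the whole wait for the first chunk.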

Cached TTFT — sends the same fixed prompt on every cached trial. A priming call loads it into cache first, then cached trials run consecutively. This measures cache performance without interleaving unrelated prompts between cached measurements.
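The difference between the cold and cached phases comes down to which prompts are sent and in what order. The sketch below illustrates that ordering only. The prompt texts and the function name are made up, and the idea that the trial count is capped by the pool of 8 unique cold prompts is an assumption drawn from the defaults stated in this README.

```python
from typing import Dict, List

COLD_POOL_SIZE = 8  # this README mentions a maximum of 8 unique cold prompts


def build_prompt_plan(trials: int = 5) -> Dict[str, List[str]]:
    """Return the prompts for each phase, in the order they would be sent."""
    if not 1 <= trials <= COLD_POOL_SIZE:
        raise ValueError(f"trials must be between 1 and {COLD_POOL_SIZE}")
    # Cold phase: every trial gets a different prompt so no trial hits the cache.
    cold = [f"[cold-{i}] Explain topic number {i}." for i in range(trials)]
    # Cached phase: one fixed prompt, primed once, then repeated back to back.
    fixed = "[cached] Summarize the benefits of unified memory."
    return {"cold": cold, "prime": [fixed], "cached": [fixed] * trials}


if __name__ == "__main__":
    for phase, prompts in build_prompt_plan().items():
        print(phase, len(prompts), "request(s)")
```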

Throughput (tok/s) — measures tokens generated per second using a standard fixed prompt, identical across all engines and versions.

Peak engine RSS — measures the resident memory of the engine server process after warmup, through the recorded benchmark phases. This is intentionally not the total memory occupied by the loaded model or by macOS/Metal unified memory. It is meant to compare how light or heavy each engine process is while serving the same model. The default RSS sampling interval is 50ms and can be changed with --ram-sample-interval.
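Peak sampling of this kind can be sketched as a background thread that polls a reader function at a fixed interval and keeps the maximum. This is an illustration under assumptions, not the tool's implementation: the class name and the reader callable are hypothetical. In a real run the reader might be something like `psutil.Process(pid).memory_info().rss`, but this README does not say which library mlx-chronos uses.

```python
import threading
import time
from typing import Callable, Optional


class PeakSampler:
    """Poll read_value() on a background thread and remember the maximum."""

    def __init__(self, read_value: Callable[[], float], interval_s: float = 0.05):
        self._read = read_value
        self._interval = interval_s  # 0.05 s mirrors the 50ms default above
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak: Optional[float] = None

    def _sample(self) -> None:
        value = self._read()
        if self.peak is None or value > self.peak:
            self.peak = value

    def _run(self) -> None:
        while not self._stop.is_set():
            self._sample()
            self._stop.wait(self._interval)

    def __enter__(self) -> "PeakSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()
        self._sample()  # one last reading so a very short phase is not missed


if __name__ == "__main__":
    readings = iter([100.0, 350.0, 200.0] + [150.0] * 1000)  # fake RSS in MB
    with PeakSampler(lambda: next(readings), interval_s=0.01) as sampler:
        time.sleep(0.1)  # stands in for the benchmark phases
    print("peak:", sampler.peak)
```

A fixed sampling interval means that spikes shorter than the interval can be missed, which is one reason a tool might expose the interval as an option.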

System RAM peak — continuously samples total Mac RAM usage from before warmup through the recorded benchmark phases and reports the observed peak in GB and percent. This is the metric to use when checking whether a run pushed the machine into memory pressure or swap while the model was actually loading or serving requests.

All metrics are run over multiple trials and reported with mean, stddev, min, and max. The default is 5 trials, with a maximum of 8 unique cold prompts. Results are saved as structured JSON in results/local/ by default. Copy a reviewed JSON into results/submitted/ only when you want to publish it to the community leaderboard.
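The per-metric summary can be reproduced from raw trial values with the standard library. This is a sketch only: the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since this README does not state which variant is reported.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Return mean, stddev, min, and max for one metric's trial values."""
    if not samples:
        raise ValueError("at least one trial is required")
    return {
        "mean": statistics.fmean(samples),
        # stdev() needs two or more points; a single trial has no spread.
        "stddev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    tok_per_s = [41.2, 40.8, 42.0, 41.5, 40.9]  # made-up values for 5 trials
    print(summarize(tok_per_s))
```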

Community Leaderboard

View the full leaderboard with all submitted results:

→ igurss.github.io/mlx-chronos

Quick Start

# Install
pip install mlx-chronos

# Check available engines
mlx-chronos engines

# Validate setup before a run
mlx-chronos validate --engine omlx --model "Qwen3.5-4B-OptiQ-4bit"

# Run benchmark (JSON by default)
mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit"

# Optional: write both JSON and Markdown outputs
mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --format all

# Optional: choose a custom output directory
mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --output-dir ~/Desktop/benchmarks

Note: the engine server must be running before you launch mlx-chronos. See CONTRIBUTING.md for setup instructions.

Contributing Your Results

1. Run mlx-chronos run on your Mac.
2. A JSON file is generated in results/local/ (use --format all for a Markdown summary too).
3. Fork this repo, copy the JSON you want to publish into results/submitted/, and open a pull request.
4. GitHub Actions validates your result automatically.
5. Once merged, the leaderboard updates.

Leaderboard submissions must report throughput using the engine response's usage.completion_tokens. Local runs can still be saved with a fallback token estimate, but those results are not accepted for the public leaderboard.
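The rule above can be pictured as follows, assuming an OpenAI-style response dictionary with a `usage.completion_tokens` field. The function name, the returned keys, and the whitespace-based fallback estimate are hypothetical. The README does not describe how mlx-chronos estimates tokens, nor exactly which time window the throughput figure covers.

```python
from typing import Any, Dict, Mapping


def throughput(
    response: Mapping[str, Any], generated_text: str, elapsed_s: float
) -> Dict[str, Any]:
    """Compute tok/s and record where the token count came from."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    usage = response.get("usage") or {}
    tokens = usage.get("completion_tokens")
    if isinstance(tokens, int) and tokens > 0:
        source = "usage"
    else:
        # Crude illustrative fallback: count whitespace-separated words.
        tokens = len(generated_text.split())
        source = "estimate"
    return {
        "tok_per_s": tokens / elapsed_s,
        "token_source": source,
        # Only engine-reported counts would qualify for the public leaderboard.
        "leaderboard_eligible": source == "usage",
    }


if __name__ == "__main__":
    print(throughput({"usage": {"completion_tokens": 120}}, "", 2.0))
    print(throughput({}, "four words of text", 2.0))
```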

See CONTRIBUTING.md for detailed instructions.

Benchmark Methodology

See docs/methodology.md for a full explanation of what is measured, how, and why.

Roadmap

Completed

Core benchmark runner with repeated trials, warmup, cache priming, and phase-separated metrics
Engine support for oMLX, Rapid-MLX, mlx-lm, and Ollama
Hardware detection for chip, machine model, memory, macOS, Python, architecture, and thermal state
Strict JSON schema validation with raw-trial consistency checks
Continuous engine RSS and system RAM peak sampling
Preflight validation for engine, server, and model access
GitHub Actions validation for submitted results
GitHub Pages leaderboard with engine/chip filters
JSON and Markdown result export
Published Apple M2 sample results refreshed with the current benchmark protocol

Next

Add mlx-chronos submit to help prepare leaderboard submissions
Add warnings for battery mode, low power mode, and non-nominal thermal state
Improve leaderboard filtering by machine model and add broader column tooltips
Add integration tests against mock OpenAI-compatible servers

Future

Support larger trial counts with a bigger cold-prompt pool
Add p95 reporting for larger sample sizes
Evaluate a clearer TTFT naming model without breaking the v0.1 JSON contract
Add tool-calling success-rate benchmarks
Explore anti-spoofing checks for community submissions
Document external contributor branch workflow when community PRs start arriving
Collect more results from M3, M4, and M5 systems

License

Apache 2.0 — see LICENSE

Project details

Maintainers

igurss

Release history

This version

0.1.0

May 29, 2026

Download files

Source Distribution

mlx_chronos-0.1.0.tar.gz (33.4 kB)

Uploaded May 29, 2026 (Source)

Built Distribution

mlx_chronos-0.1.0-py3-none-any.whl (27.3 kB)

Uploaded May 29, 2026 (Python 3)

File details

Details for the file mlx_chronos-0.1.0.tar.gz.

File metadata

Download URL: mlx_chronos-0.1.0.tar.gz
Upload date: May 29, 2026
Size: 33.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlx_chronos-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`502645697d567c396bc22677fec78649f3dd461dd9145dbdea8a5f60a13f13b4`
MD5	`6590cdd530c3468e47d09046ab726d6a`
BLAKE2b-256	`9ea24546d883c997fa4eaa7fb2ba1a5de2176c4f8af2e19de82e8f05ffa92957`
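A downloaded file can be checked against the SHA256 digest listed above with Python's standard library. This is a generic sketch: the function name and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest for mlx_chronos-0.1.0.tar.gz as listed in the table above.
EXPECTED_SHA256 = "502645697d567c396bc22677fec78649f3dd461dd9145dbdea8a5f60a13f13b4"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__":
    sdist = Path("mlx_chronos-0.1.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("match" if sha256_of(sdist) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{sdist} not found")
```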

Provenance

The following attestation bundles were made for mlx_chronos-0.1.0.tar.gz:

Publisher: release.yml on igurss/mlx-chronos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_chronos-0.1.0.tar.gz
- Subject digest: 502645697d567c396bc22677fec78649f3dd461dd9145dbdea8a5f60a13f13b4
- Sigstore transparency entry: 1671385948
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: igurss/mlx-chronos@97ee058f2558f5e37d12d833d01addd23892b59a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/igurss
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@97ee058f2558f5e37d12d833d01addd23892b59a
- Trigger Event: push

File details

Details for the file mlx_chronos-0.1.0-py3-none-any.whl.

File metadata

Download URL: mlx_chronos-0.1.0-py3-none-any.whl
Upload date: May 29, 2026
Size: 27.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlx_chronos-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`561f72d49a642f843f79aac99792009eb746a780af99297e107b717f531a4d40`
MD5	`067a2c2b5b2affd9f92649f0f8e54e7d`
BLAKE2b-256	`e5ae6a18637061b48f5100cd454235d7d7d0e0e80b574b1c967199ef010a40a3`

Provenance

The following attestation bundles were made for mlx_chronos-0.1.0-py3-none-any.whl:

Publisher: release.yml on igurss/mlx-chronos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_chronos-0.1.0-py3-none-any.whl
- Subject digest: 561f72d49a642f843f79aac99792009eb746a780af99297e107b717f531a4d40
- Sigstore transparency entry: 1671386014
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: igurss/mlx-chronos@97ee058f2558f5e37d12d833d01addd23892b59a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/igurss
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@97ee058f2558f5e37d12d833d01addd23892b59a
- Trigger Event: push
