# MLX GPT-OSS Server

Minimal OpenAI-compatible server for GPT-OSS/Harmony models on Apple Silicon. Built with mlx-lm (inference), openai-harmony (prompt formatting), and FastAPI (HTTP API).
## Feature List

- OpenAI-style `/v1/chat/completions` endpoint
- OpenAI-style `/v1/responses` endpoint
- Streaming (SSE) and non-streaming responses
- Harmony `reasoning_effort` support (`low`, `medium`, `high`)
- OpenAI tool-calling response format
- Responses API function calling and `previous_response_id` support
- Robust Harmony tool-calling parser and stream recovery paths
- Usage token counts in responses
- `/health` queue stats and `/v1/models` compatibility endpoint
- Single-model runtime with FIFO request queueing
## Requirements

- macOS on Apple Silicon
- Python >= 3.11
## Quick Start

```bash
pip install mlx-gpt-oss
mlx-gpt-oss --model mlx-community/gpt-oss-20b-MXFP4-Q8
```

The server binds to `http://0.0.0.0:8000` by default.
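Once the server is up, you can verify it from Python. This is a minimal sketch using only the standard library; it assumes the default bind of port 8000 reachable at `127.0.0.1`:

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server's /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, non-2xx status, etc.
        return False


if __name__ == "__main__":
    print("healthy:", is_healthy("http://127.0.0.1:8000"))
```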
Install From Source
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
mlx-gpt-oss --model mlx-community/gpt-oss-20b-MXFP4-Q8
## API Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| `/health` | GET | Server health + active/queued request counts |
| `/v1/models` | GET | Loaded model metadata |
| `/v1/chat/completions` | POST | OpenAI-compatible chat completion |
| `/v1/responses` | POST | OpenAI-compatible Responses API create |
| `/v1/responses/{response_id}` | GET | Retrieve stored response |
| `/v1/responses/{response_id}` | DELETE | Delete stored response |
| `/v1/responses/{response_id}/input_items` | GET | Retrieve stored request input items |
Chat Completions Notes
modelis required for compatibility, but the server always uses the single model loaded at startup.- Supports OpenAI-style
messages,stream,tools,tool_choice,stop, and common sampling params. top_kis accepted but generation remains pinned totop_k=0for GPT-OSS behavior.reasoning_effortcan be set directly, or viachat_template_kwargs.reasoning_effort.- Streaming returns
chat.completion.chunkevents and ends with[DONE].
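With `stream: true`, the response body is a standard SSE stream of `chat.completion.chunk` events terminated by `data: [DONE]`. A minimal parser for those lines might look like this (a sketch; it only handles the `data:` lines and the delta shape used by OpenAI-style chunks):

```python
import json


def parse_sse_data(line: str):
    """Parse one SSE line: return a dict for a JSON chunk, the string
    "[DONE]" for the stream terminator, or None for any other line."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return "[DONE]"
    return json.loads(data)


def delta_text(chunk: dict) -> str:
    """Extract the incremental assistant text from a chat.completion.chunk."""
    return chunk["choices"][0]["delta"].get("content") or ""
```

Feed each decoded line of the HTTP response through `parse_sse_data`, appending `delta_text(chunk)` for every dict until `"[DONE]"` is returned.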
## Responses API Notes

- Supported input types are text message items, replayed `function_call` items, and `function_call_output` items.
- Supported tools are custom `function` tools only.
- Stored responses are process-local, in-memory, and bounded by LRU eviction.
- `previous_response_id` reuses the stored conversation transcript, but does not carry forward prior `instructions`.
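A tool-call round trip over `/v1/responses` therefore replays the model's `function_call` together with a `function_call_output` item, continuing from the stored response. A sketch of the follow-up request body (all field values here are illustrative):

```python
import json


def followup_payload(previous_response_id: str, call_id: str,
                     name: str, arguments: str, output: str) -> dict:
    """Build a /v1/responses request that replays a function_call and
    supplies its output, continuing from a stored response."""
    return {
        "model": "mlx-community/gpt-oss-20b-MXFP4-Q8",  # required but ignored
        "previous_response_id": previous_response_id,
        "input": [
            {  # replayed function_call item from the previous response
                "type": "function_call",
                "call_id": call_id,
                "name": name,
                "arguments": arguments,
            },
            {  # the tool result computed client-side
                "type": "function_call_output",
                "call_id": call_id,
                "output": output,
            },
        ],
    }


print(json.dumps(followup_payload(
    "resp_123", "call_abc", "get_weather",
    '{"city": "Paris"}', '{"temp_c": 18}'), indent=2))
```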
## Responses API Limits

- No multimodal inputs (`image`, `audio`, `file`, etc.)
- No hosted OpenAI tools such as `web_search`, `file_search`, or `code_interpreter`
- No structured output / non-plain-text `text.format`
- No `parallel_tool_calls=false`
- No named/required tool forcing; `tool_choice` supports `auto` and `none`
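A client can pre-check a request against these limits before sending it. The sketch below mirrors the restrictions above as a client-side convenience; it is not the server's own validation, and the multimodal item type names (`input_image`, etc.) are illustrative:

```python
def check_responses_request(payload: dict) -> list[str]:
    """Return a list of problems that would violate the server's
    Responses API limits; an empty list means the request looks acceptable."""
    problems = []
    # No multimodal inputs
    for item in payload.get("input", []):
        if isinstance(item, dict) and item.get("type") in (
            "input_image", "input_audio", "input_file",
        ):
            problems.append(f"multimodal input not supported: {item['type']}")
    # Custom function tools only; no hosted tools
    for tool in payload.get("tools", []):
        if tool.get("type") != "function":
            problems.append(f"hosted tool not supported: {tool.get('type')}")
    # No parallel_tool_calls=false
    if payload.get("parallel_tool_calls") is False:
        problems.append("parallel_tool_calls=false is not supported")
    # tool_choice supports only auto / none
    tc = payload.get("tool_choice")
    if tc is not None and tc not in ("auto", "none"):
        problems.append("tool_choice supports only 'auto' and 'none'")
    return problems
```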
## Tool Calling Reliability

- Uses official Harmony assistant-action stop tokens from `openai-harmony` (no hardcoded token IDs).
- Handles streaming edge cases: unfinished tool-call endings, buffered fallback dedupe, and repeated identical tool calls.
- Addresses a class of tool-calling failures seen in other MLX servers.
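One of these edge cases, repeated identical tool calls, comes down to deduplicating calls by identity. The following is an illustration of the idea only, not the server's actual implementation:

```python
def dedupe_tool_calls(calls: list[dict]) -> list[dict]:
    """Drop tool calls whose (name, arguments) pair was already seen,
    keeping the first occurrence and preserving order."""
    seen = set()
    unique = []
    for call in calls:
        key = (call.get("name"), call.get("arguments"))
        if key in seen:
            continue  # the model emitted the same call again; skip it
        seen.add(key)
        unique.append(call)
    return unique
```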
## CLI Options

| Flag | Default | Description |
|---|---|---|
| `--model` | required | Model path or Hugging Face ID |
| `--host` | `0.0.0.0` | Bind address |
| `--port` | `8000` | Bind port |
| `--context-length` | `8196` | Max KV cache context length |
| `--log-level` | `INFO` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `--log-file` | disabled | Optional rotating file log output |
| `--debug-raw-preview-chars` | `0` | In DEBUG, preview N chars of prompts/output |
| `--http-access-log` | `False` | Emit one access log line per HTTP request |
| `--responses-store-max-items` | `256` | Max stored `/v1/responses` records kept in memory |
| `--responses-store-max-bytes` | `67108864` | Approximate max in-memory bytes for stored responses |
## Security

- No built-in auth or API key checks; access control is your responsibility.
- Default host is `0.0.0.0` for local/LAN self-hosting.
- CORS is permissive (`*`, credentials disabled).
- Use `--host 127.0.0.1` for local-only access.
## File Details: mlx_gpt_oss-1.0.3.tar.gz (source distribution)

- Size: 32.3 kB
- Uploaded using Trusted Publishing via twine/6.1.0 (CPython/3.13.7)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `64891cc5ffc4bd6c5f202c9afedf39af9c467fa84ab2da96bfdd30b86ab017ff` |
| MD5 | `afad2939944c36bd0e136f02c3653b69` |
| BLAKE2b-256 | `b16ee35b22729d1e55b49a19a8d6511518221ac157e54ab0db3c03181c96a408` |

Provenance: attested by the `publish.yml` workflow on `icelaglace/mlx-gpt-oss` (statement type `https://in-toto.io/Statement/v1`, predicate type `https://docs.pypi.org/attestations/publish/v1`), built from tag `refs/tags/v1.0.3` at commit `21eff22f3579f78ca842978f74b04cd1ba31e4ec` on a GitHub-hosted runner, triggered by a `release` event; token issuer `https://token.actions.githubusercontent.com`. Sigstore transparency entry: 1101197470.
## File Details: mlx_gpt_oss-1.0.3-py3-none-any.whl (built distribution, Python 3)

- Size: 31.7 kB
- Uploaded using Trusted Publishing via twine/6.1.0 (CPython/3.13.7)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f669375c7b5383daa7c66891a4c3d6af061763babcaa7a65f738c5a78adb128c` |
| MD5 | `bdd645aea16b9c85c0540310571cf72b` |
| BLAKE2b-256 | `cdfb78fe9c04052ab55b9253c632d040cfd326e9d714fce984990a44d856fbc6` |

Provenance: attested by the `publish.yml` workflow on `icelaglace/mlx-gpt-oss` (statement type `https://in-toto.io/Statement/v1`, predicate type `https://docs.pypi.org/attestations/publish/v1`), built from tag `refs/tags/v1.0.3` at commit `21eff22f3579f78ca842978f74b04cd1ba31e4ec` on a GitHub-hosted runner, triggered by a `release` event; token issuer `https://token.actions.githubusercontent.com`. Sigstore transparency entry: 1101197514.