Whisper Turbo in MLX

Project description

WTM (Whisper Turbo MLX)

Fast, lightweight Whisper transcription using MLX, in a single file under 300 lines.

Installation

pip install whisper-turbo-mlx

FFmpeg is recommended for faster audio decoding but is optional: the library automatically falls back to librosa if FFmpeg is not installed.

brew install ffmpeg  # macOS, optional

For CUDA (Linux):

pip install "whisper-turbo-mlx[cuda]"

For CPU-only (Linux):

pip install "whisper-turbo-mlx[cpu]"
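
To see which decoding path to expect on a given machine, you can check whether an ffmpeg binary is on your PATH. The snippet below is a minimal sketch that only mirrors the documented fallback behavior; the helper name `expected_decoder` is hypothetical and this is not necessarily how the library performs its own check.

```python
import shutil


def expected_decoder() -> str:
    """Guess the decoding path: 'ffmpeg' if the binary is on PATH, else 'librosa'.

    Assumption: the library prefers ffmpeg when available, as described above.
    """
    return "ffmpeg" if shutil.which("ffmpeg") else "librosa"


print(f"Expected audio decoder: {expected_decoder()}")
```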

Usage

CLI

wtm audio.mp3
wtm audio.mp3 --multilingual
wtm audio.mp3 --timestamps
wtm audio.wav --quick

Python

from whisper_turbo import transcribe

txt, segs = transcribe('audio.wav')
txt, segs = transcribe('audio.mp3', multilingual=True, timestamps=True)

Parameters

Parameter	Default	Description
`timestamps`	`False`	Include timestamps
`quick`	`False`	Faster but choppier
`multilingual`	`False`	Multilingual transcription
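
The parameters above are passed as keyword arguments, so they are easy to forward from your own code. The following is a rough sketch of a batch helper, assuming `transcribe(path, **options)` returns a `(text, segments)` pair as in the Python usage shown earlier. The name `transcribe_many` is hypothetical and not part of the package; the transcription function is passed in so the helper can be tried with a stand-in.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def transcribe_many(
    paths: Iterable[str],
    transcribe: Callable[..., Tuple[str, List[tuple]]],
    **options,
) -> Dict[str, Tuple[str, List[tuple]]]:
    """Run `transcribe` on each path, forwarding the same keyword options.

    Returns a dict mapping each path to its (text, segments) result.
    """
    results = {}
    for path in paths:
        text, segments = transcribe(str(path), **options)
        results[str(path)] = (text, segments)
    return results


# Intended use (requires the package to be installed):
#   from whisper_turbo import transcribe
#   results = transcribe_many(["a.wav", "b.mp3"], transcribe, timestamps=True)
```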

Example Output

$ wtm test.mp3 -t

[0.00s -> 4.96s]  A coding agent is a language model placed inside a loop, with access to tools that let it interact
[4.96s -> 10.08s]  with a codebase. Instead of just generating text, it can take actions and iterate on them.
[10.72s -> 16.24s]  MLX code packages that into a small Python library, with support for both local inference
[16.24s -> 22.72s]  and external APIs. You start it, give it a task, and it runs a loop. It calls tools,
[22.72s -> 25.84s]  gets results back, and keeps going until it decides it's done.
[26.56s -> 31.92s]  One of those tools is the Agent tool, which lets it spawn a child agent and delegate a task.
[32.48s -> 37.84s]  This exists because of context decay. As sessions get longer, performance drops,
[37.84s -> 44.72s]  history grows, attention spreads, and outputs get worse. Delegating a heavy subtask to a sub-agent
[44.72s -> 50.16s]  keeps both contexts focused. You can customize the agent through command line arguments.
[50.16s]  You can set the system prompt for the agent with "double dash system" or point "double dash skill"
[55.60s -> 62.08s]  at a folder to load skills from. On the back-end side, it can connect to a local model or APIs like
[62.08s -> 68.46s]  Gemini or DeepSeq with DoubleDash API. And if you're running locally, you can also plug in
[68.46s -> 75.52s]  other harnesses like Codex, Gemini CLI, or Claude Code with DoubleDash Leash. You can also sandbox
[75.52s]  your agent however you want. For instance, you can run the harness inside a virtual machine
[80.56s -> 85.38s]  and connect it to an LLM server running on the host or outside APIs.
[86.10s -> 89.78s]  You can use it conversationally, but there's also a set of slash commands.
[90.36s -> 94.90s]  For example, slash branch forks the current conversation into a child agent,
[95.36s -> 99.78s]  runs your prompt there, and returns just the result, leaving the main session clean.
[100.04s -> 104.90s]  So you can ask a side question and get an answer without polluting your working context.
[105.70s -> 111.22s]  When a session starts, the working directory is snapshotted into a fresh Git work tree on a new branch.
[111.92s -> 116.28s]  After every tool round trip, every action and result, it creates a commit.
[116.98s -> 121.58s]  That commit includes both the file changes and the full conversation up to that point.
[122.12s -> 125.56s]  So your Git history becomes a step-by-step trace of the agent's behavior.
[125.96s -> 132.74s]  Each commit captures both the code and the conversation that produced it, so you can restore any point and resume from there.
[133.46s -> 140.46s]  When the agent goes off the rails, which it will, you're not stuck debugging the final state, you have a full timeline of how it got there.
[141.24s -> 145.80s]  While MLX code provides command line interfaces, it's really designed as a library.
[146.50s -> 151.88s]  For example, instead of giving the agent full file system access, you can define a custom toolset.
[152.20s -> 156.04s]  Read KB, Comment KB, and Submit KB.
[156.74s -> 160.24s]  Now the agent is restricted to operating on a structured knowledge base.
[160.84s]  You seed it with documents and start the agent.
[163.00s -> 168.60s]  The main agent reads the material and drafts a synthesis. Then it spawns a sub-agent.
[169.28s -> 173.60s]  That sub-agent acts as a reviewer. It reads the draft and posts critiques.
[174.18s -> 179.00s]  The main agent reads those critiques, revises the draft, and produces a final version.
[179.74s -> 183.02s]  Because everything is modular, you can wire it however you want.
[183.58s -> 186.26s]  An agent triggered by a scheduler instead of a REPL.
[186.70s -> 189.16s]  A tool that takes piped input as a prompt.
[189.36s]  Or multi-agent handoffs, one agent commits a state, another resumes from it. These aren't special modes, they fall out naturally from the components being composable. That's MLX code, composable pieces you can rearrange however you want, and a system you can shape to your own workflows.

Segments: [(0.0, 4.96, 'A coding agent is a language model placed inside a loop, with access to tools that let it interact'), (4.96, 10.08, 'with a codebase. Instead of just generating text, it can take actions and iterate on them.'), (10.72, 16.240000000000002, 'MLX code packages that into a small Python library, with support for both local inference'), (16.240000000000002, 22.72, 'and external APIs. You start it, give it a task, and it runs a loop. It calls tools,'), (22.72, 25.84, "gets results back, and keeps going until it decides it's done."), (26.56, 31.919999999999998, 'One of those tools is the Agent tool, which lets it spawn a child agent and delegate a task.'), (32.48, 37.839999999999996, 'This exists because of context decay. As sessions get longer, performance drops,'), (37.839999999999996, 44.72, 'history grows, attention spreads, and outputs get worse. Delegating a heavy subtask to a sub-agent'), (44.72, 50.16, 'keeps both contexts focused. You can customize the agent through command line arguments.'), (50.16, None, 'You can set the system prompt for the agent with "double dash system" or point "double dash skill"'), (55.6, 62.08, 'at a folder to load skills from. On the back-end side, it can connect to a local model or APIs like'), (62.08, 68.46000000000001, "Gemini or DeepSeq with DoubleDash API. And if you're running locally, you can also plug in"), (68.46000000000001, 75.52000000000001, 'other harnesses like Codex, Gemini CLI, or Claude Code with DoubleDash Leash. You can also sandbox'), (75.52000000000001, None, 'your agent however you want. For instance, you can run the harness inside a virtual machine'), (80.56, 85.38, 'and connect it to an LLM server running on the host or outside APIs.'), (86.10000000000001, 89.78, "You can use it conversationally, but there's also a set of slash commands."), (90.36, 94.9, 'For example, slash branch forks the current conversation into a child agent,'), (95.36, 99.78, 'runs your prompt there, and returns just the result, leaving the main session clean.'), (100.04, 104.9, 'So you can ask a side question and get an answer without polluting your working context.'), (105.7, 111.22, 'When a session starts, the working directory is snapshotted into a fresh Git work tree on a new branch.'), (111.92, 116.28, 'After every tool round trip, every action and result, it creates a commit.'), (116.98, 121.58, 'That commit includes both the file changes and the full conversation up to that point.'), (122.12, 125.56, "So your Git history becomes a step-by-step trace of the agent's behavior."), (125.96000000000001, 132.74, 'Each commit captures both the code and the conversation that produced it, so you can restore any point and resume from there.'), (133.46, 140.46, "When the agent goes off the rails, which it will, you're not stuck debugging the final state, you have a full timeline of how it got there."), (141.24, 145.8, "While MLX code provides command line interfaces, it's really designed as a library."), (146.5, 151.88, 'For example, instead of giving the agent full file system access, you can define a custom toolset.'), (152.20000000000002, 156.04000000000002, 'Read KB, Comment KB, and Submit KB.'), (156.74, 160.24, 'Now the agent is restricted to operating on a structured knowledge base.'), (160.84, None, 'You seed it with documents and start the agent.'), (163.0, 168.6, 'The main agent reads the material and drafts a synthesis. Then it spawns a sub-agent.'), (169.28, 173.6, 'That sub-agent acts as a reviewer. It reads the draft and posts critiques.'), (174.18, 179.0, 'The main agent reads those critiques, revises the draft, and produces a final version.'), (179.74, 183.02, 'Because everything is modular, you can wire it however you want.'), (183.58, 186.26, 'An agent triggered by a scheduler instead of a REPL.'), (186.7, 189.16, 'A tool that takes piped input as a prompt.'), (189.36, None, "Or multi-agent handoffs, one agent commits a state, another resumes from it. These aren't special modes, they fall out naturally from the components being composable. That's MLX code, composable pieces you can rearrange however you want, and a system you can shape to your own workflows.")]
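
Judging from the output above, each segment appears to be a `(start, end, text)` tuple in seconds, and `end` can be `None`. As a sketch of how such a list might be post-processed, the code below converts segments to SubRip (SRT) subtitle text. The helpers `srt_time` and `to_srt` are hypothetical and not part of the package. Filling a missing end time with the next segment's start, or with a fixed `fallback_duration` for the last segment, is an assumption made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed segment shape, inferred from the example output: (start, end or None, text)
Segment = Tuple[float, Optional[float], str]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment], fallback_duration: float = 5.0) -> str:
    """Render segments as SRT cues, filling in missing end times."""
    cues = []
    for i, (start, end, text) in enumerate(segments):
        if end is None:
            if i + 1 < len(segments):
                # Borrow the next segment's start as this cue's end.
                end = segments[i + 1][0]
            else:
                # Last segment: fall back to a fixed duration.
                end = start + fallback_duration
        cues.append(f"{i + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(cues)


sample = [
    (0.0, 4.96, "A coding agent is a language model placed inside a loop,"),
    (50.16, None, "You can set the system prompt for the agent"),
    (55.6, 62.08, "at a folder to load skills from."),
]
print(to_srt(sample))
```

With real output, `to_srt(segs)` would be called on the second value returned by `transcribe(..., timestamps=True)`.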

License

MIT

Project details

Release history

0.0.3

May 25, 2026

0.0.3a0 pre-release

Oct 20, 2024

0.0.2b0 pre-release

Oct 18, 2024

0.0.2a0 pre-release

Oct 18, 2024

0.0.1

Oct 17, 2024

0.0.1rc11 pre-release

Oct 17, 2024

0.0.1rc10 pre-release

Oct 17, 2024

0.0.1rc9 pre-release

Oct 17, 2024

0.0.1rc8 pre-release

Oct 17, 2024

0.0.1rc7 pre-release

Oct 17, 2024

0.0.1rc6 pre-release

Oct 17, 2024

0.0.1rc5 pre-release

Oct 17, 2024

0.0.1rc4 pre-release

Oct 17, 2024

0.0.1rc3 pre-release

Oct 17, 2024

0.0.1rc2 pre-release

Oct 17, 2024

0.0.1rc1 pre-release

Oct 17, 2024

0.0.1b0 pre-release

Oct 17, 2024

0.0.1a0 pre-release

Oct 17, 2024

Download files

Source Distribution

whisper_turbo_mlx-0.0.3.tar.gz (385.2 kB)

Uploaded May 25, 2026 Source

Built Distribution

whisper_turbo_mlx-0.0.3-py3-none-any.whl (382.5 kB)

Uploaded May 25, 2026 Python 3

File details

Details for the file whisper_turbo_mlx-0.0.3.tar.gz.

File metadata

Download URL: whisper_turbo_mlx-0.0.3.tar.gz
Upload date: May 25, 2026
Size: 385.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for whisper_turbo_mlx-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`0530a4385b77d06a5244439a7a2935cd96f76eddc0ebf879bbeab7d6c7adf944`
MD5	`384e912ce21cae0eac6fe49a0862ec47`
BLAKE2b-256	`8140b8b140a549eb1d5a1a34861e4f0951cebf851e4406037aefefd71b1d6000`

File details

Details for the file whisper_turbo_mlx-0.0.3-py3-none-any.whl.

File metadata

Download URL: whisper_turbo_mlx-0.0.3-py3-none-any.whl
Upload date: May 25, 2026
Size: 382.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for whisper_turbo_mlx-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4145dd13fbfae45dc5d1a60ecb210677eb018f009fe7b78a1603d6324c74300`
MD5	`259bd8da2affd2fc36c4dc6c129394e7`
BLAKE2b-256	`26d21bb1da10176487b023851b5f454bd4063fafb8f77d16234a97db5d4c4f12`
