Async Python client for OpenAI Realtime transcription with microphone input, streamed transcript deltas, speech boundary events, and WebSocket lifecycle management.
realtime-whisper
Small async Python client for OpenAI Realtime transcription with microphone input, streamed transcript deltas, completed transcript events, speech boundary events, and manual buffer flush support.
Requirements
- Python 3.14 or newer
- An OpenAI API key, or Azure OpenAI Realtime credentials
- A working audio input device when using the default microphone input
Installation
With uv (recommended):
uv add realtime-whisper
# Include microphone support
uv add "realtime-whisper[audio]"
With pip:
pip install realtime-whisper
# Include microphone support
pip install "realtime-whisper[audio]"
From source (this repository):
uv sync --extra audio
# or
pip install -e ".[audio]"
The audio extra installs sounddevice, which is required by the default
MicrophoneInput. If you provide your own audio input implementation, the base
dependencies are enough.
Set your OpenAI API key before running the examples:
export OPENAI_API_KEY="your-api-key"
On PowerShell:
$env:OPENAI_API_KEY = "your-api-key"
Quick Start
import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def main() -> None:
    transcriber = RealtimeTranscriber(language="en")
    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())
Run the included examples:
# Continuous transcription
uv run python -m examples.transcribe_console
# Push-to-talk (press Enter to flush the buffer)
uv run python -m examples.transcribe_push_to_talk
API Overview
RealtimeTranscriber
Basic streaming — reads from your default microphone and prints every delta and completed transcript segment:
import asyncio

from realtime_whisper import (
    NoiseReduction,
    RealtimeTranscriber,
    TranscriptionDelay,
    TranscriptCompleted,
    TranscriptDelta,
)


async def main() -> None:
    transcriber = RealtimeTranscriber(
        language="en",  # BCP-47 tag, or None for auto-detect
        delay=TranscriptionDelay.MEDIUM,  # latency vs. completeness trade-off
        noise_reduction=NoiseReduction.FAR_FIELD,  # FAR_FIELD or NEAR_FIELD
        include_logprobs=False,  # True → per-token log-probabilities
    )
    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())
See examples/transcribe_console.py for the full runnable version of this pattern.
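When `include_logprobs=True` is set, log-probabilities can be turned into ordinary probabilities with `exp`. The sketch below works on a plain list of floats. How the package attaches log-probabilities to its events is not shown in this README, so the input shape, the natural-log base, and the helper name `mean_token_probability` are all assumptions.

```python
import math


def mean_token_probability(logprobs: list[float]) -> float:
    """Average per-token probability for natural-log probabilities (assumed base e)."""
    if not logprobs:
        return 0.0
    # exp() maps each log-probability back into the range (0, 1].
    return sum(math.exp(lp) for lp in logprobs) / len(logprobs)
```

A value close to 1.0 suggests the model was confident about most tokens in the segment; any threshold for flagging low-confidence segments would need to be tuned on real audio.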
Push-to-talk — call flush() to commit the audio buffer and trigger
transcription on demand (e.g. when the user releases a key):
import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def read_enter_loop(transcriber: RealtimeTranscriber) -> None:
    loop = asyncio.get_running_loop()
    while True:
        await loop.run_in_executor(None, input)  # blocks until Enter is pressed
        await transcriber.flush()


async def main() -> None:
    transcriber = RealtimeTranscriber(language="en")
    # Keep a reference so the background task is not garbage-collected mid-run.
    enter_task = asyncio.create_task(read_enter_loop(transcriber))
    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())
See examples/transcribe_push_to_talk.py for the full runnable version of this pattern.
As an async context manager — stop() is called automatically on exit:
async with RealtimeTranscriber(language="en") as transcriber:
    async for event in transcriber.stream():
        ...
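To illustrate why the context-manager form is convenient, here is a sketch of the general `async with` protocol using a stand-in class. `StandInTranscriber` is hypothetical and only mimics the documented behaviour (calling `stop()` on exit); it is not the package's implementation.

```python
import asyncio


class StandInTranscriber:
    """Hypothetical stand-in; not the real RealtimeTranscriber."""

    def __init__(self) -> None:
        self.stopped = False

    async def stop(self) -> None:
        self.stopped = True

    async def __aenter__(self) -> "StandInTranscriber":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and also when the body raises.
        await self.stop()


async def demo() -> bool:
    transcriber = StandInTranscriber()
    try:
        async with transcriber:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass  # the error still propagates out of the async with block
    return transcriber.stopped


stopped_after_error = asyncio.run(demo())
```

Because `__aexit__` runs even when the body raises, cleanup does not depend on every code path remembering to call `stop()`.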
### Events
The public event types are exported from `realtime_whisper`:
- `SessionConnected`
- `TranscriptDelta`
- `TranscriptCompleted`
- `SpeechStarted`
- `SpeechStopped`
- `TranscriberError`
### Options
Use `TranscriptionDelay` to control latency versus completeness:
- `TranscriptionDelay.MINIMAL`
- `TranscriptionDelay.LOW`
- `TranscriptionDelay.MEDIUM`
- `TranscriptionDelay.HIGH`
- `TranscriptionDelay.XHIGH`
Use `NoiseReduction` for input noise reduction:
- `NoiseReduction.NEAR_FIELD`
- `NoiseReduction.FAR_FIELD`
Example:
```python
from realtime_whisper import NoiseReduction, RealtimeTranscriber, TranscriptionDelay

transcriber = RealtimeTranscriber(
    language="de",
    delay=TranscriptionDelay.LOW,
    noise_reduction=NoiseReduction.NEAR_FIELD,
)
```
Providers
By default, RealtimeTranscriber uses OpenAIProvider and reads
OPENAI_API_KEY from the environment. You can also pass api_key directly:
transcriber = RealtimeTranscriber(api_key="your-api-key")
For Azure OpenAI, pass an AzureOpenAIProvider:
from realtime_whisper import AzureOpenAIProvider, RealtimeTranscriber

provider = AzureOpenAIProvider(
    resource="my-resource",
    deployment="my-realtime-deployment",
    api_key="my-api-key",
)
transcriber = RealtimeTranscriber(provider=provider)
The Azure provider can also read these environment variables:
- AZURE_OPENAI_RESOURCE
- AZURE_OPENAI_DEPLOYMENT
- AZURE_OPENAI_API_KEY
Custom Audio Input
Pass an object implementing AudioInputDevice to use a custom audio source.
Audio chunks must be raw 24 kHz mono PCM bytes unless you also change the session
settings in the package internals.
from collections.abc import AsyncIterator

from realtime_whisper.audio import AudioInputDevice


class MyAudioInput(AudioInputDevice):
    async def start(self) -> None:
        ...

    async def stop(self) -> None:
        ...

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        ...

    @property
    def is_active(self) -> bool:
        ...
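As a rough guide to what a `stream_chunks` body might yield, the sketch below slices a raw PCM buffer into fixed-duration chunks. The 24 kHz mono format comes from the text above; the 16-bit sample width, the 100 ms chunk length, and the helper names are assumptions, and the code does not depend on the package.

```python
import asyncio
from collections.abc import AsyncIterator

SAMPLE_RATE_HZ = 24_000  # stated in this README
CHANNELS = 1             # mono, stated in this README
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM


def chunk_size_bytes(chunk_ms: int) -> int:
    """Number of bytes that hold chunk_ms milliseconds of audio."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * chunk_ms // 1000


async def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> AsyncIterator[bytes]:
    """Yield consecutive slices of pcm, each chunk_ms long (last may be shorter)."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
        await asyncio.sleep(0)  # let other tasks run between chunks


async def _collect(pcm: bytes) -> list[bytes]:
    return [chunk async for chunk in iter_pcm_chunks(pcm)]


one_second_of_silence = bytes(chunk_size_bytes(1000))
chunks = asyncio.run(_collect(one_second_of_silence))
```

A real input device would replace the in-memory buffer with reads from a file, socket, or capture API, while keeping the same chunk format.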
Development
uv sync --extra audio --group dev
uv run ruff check .
uv run pytest
File details
Details for the file realtime_whisper-0.1.0.tar.gz.
File metadata
- Download URL: realtime_whisper-0.1.0.tar.gz
- Upload date:
- Size: 52.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 412f793611c672e4eed73e600fa25fa0320590c5c0597235e354f1f945fb59b6 |
| MD5 | 91da3df13e1b34d9a3c1b2cbd651ddfd |
| BLAKE2b-256 | a2064be9454e88915ed3475dc0d04d59d8488ec4710fd64ab56d520da5f3d778 |
File details
Details for the file realtime_whisper-0.1.0-py3-none-any.whl.
File metadata
- Download URL: realtime_whisper-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 829fb3c3536cb10635e3f6f32f68dedcaf2a60701fc35a65dcff5c723e87f306 |
| MD5 | 8abc910c397cee84fb001707fc24aa65 |
| BLAKE2b-256 | 05ea1db573ec032e8c6acc1ffa5526068dc8ed80203686ccc69d4bc5b3109c56 |