
Async Python client for OpenAI Realtime transcription with microphone input, streamed transcript deltas, speech boundary events, and WebSocket lifecycle management.


realtime-whisper

Small async Python client for OpenAI Realtime transcription with microphone input, streamed transcript deltas, completed transcript events, speech boundary events, and manual buffer flush support.

Requirements

  • Python 3.14 or newer
  • An OpenAI API key, or Azure OpenAI Realtime credentials
  • A working audio input device when using the default microphone input

Installation

With uv (recommended):

uv add realtime-whisper

# Include microphone support
uv add "realtime-whisper[audio]"

With pip:

pip install realtime-whisper

# Include microphone support
pip install "realtime-whisper[audio]"

From source (this repository):

uv sync --extra audio
# or
pip install -e ".[audio]"

The audio extra installs sounddevice, which is required by the default MicrophoneInput. If you provide your own audio input implementation, the base dependencies are enough.
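A script may need to keep working when the audio extra is missing. One way to handle that is to check whether sounddevice can be found before relying on the default MicrophoneInput. Below is a minimal standard-library sketch. has_microphone_support is a hypothetical helper and is not part of realtime-whisper:

```python
import importlib.util


def has_microphone_support() -> bool:
    """Return True if the optional `sounddevice` dependency can be found.

    Hypothetical helper: realtime-whisper itself does not ship this function.
    """
    # find_spec returns None when the top-level module cannot be located,
    # and it does so without importing the module.
    return importlib.util.find_spec("sounddevice") is not None


if __name__ == "__main__":
    if has_microphone_support():
        print("sounddevice found: the default MicrophoneInput can be used.")
    else:
        print('sounddevice missing: install "realtime-whisper[audio]" '
              "or pass a custom AudioInputDevice.")
```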

Set your OpenAI API key before running the examples:

export OPENAI_API_KEY="your-api-key"

On PowerShell:

$env:OPENAI_API_KEY = "your-api-key"

Quick Start

import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def main() -> None:
	transcriber = RealtimeTranscriber(language="en")

	async for event in transcriber.stream():
		match event:
			case TranscriptDelta(delta=delta):
				print(delta, end="", flush=True)
			case TranscriptCompleted(transcript=transcript):
				print(f"\n>>> {transcript}\n")


asyncio.run(main())

Run the included examples:

# Continuous transcription
uv run python -m examples.transcribe_console

# Push-to-talk (press Enter to flush the buffer)
uv run python -m examples.transcribe_push_to_talk

API Overview

RealtimeTranscriber

Basic streaming. The following reads from your default microphone and prints every transcript delta and completed transcript segment:

import asyncio

from realtime_whisper import (
    NoiseReduction,
    RealtimeTranscriber,
    TranscriptionDelay,
    TranscriptCompleted,
    TranscriptDelta,
)


async def main() -> None:
    transcriber = RealtimeTranscriber(
        language="en",                             # BCP-47 tag, or None for auto-detect
        delay=TranscriptionDelay.MEDIUM,           # latency vs. completeness trade-off
        noise_reduction=NoiseReduction.FAR_FIELD,  # FAR_FIELD or NEAR_FIELD
        include_logprobs=False,                    # True → per-token log-probabilities
    )

    async for event in transcriber.stream():
        match event:
            case TranscriptDelta(delta=delta):
                print(delta, end="", flush=True)
            case TranscriptCompleted(transcript=transcript):
                print(f"\n>>> {transcript}\n")


asyncio.run(main())

See examples/transcribe_console.py for the full runnable version of this pattern.

Push-to-talk. Call flush() to commit the audio buffer and trigger transcription on demand (e.g. when the user releases a key):

import asyncio

from realtime_whisper import RealtimeTranscriber, TranscriptCompleted, TranscriptDelta


async def read_enter_loop(transcriber: RealtimeTranscriber) -> None:
    loop = asyncio.get_running_loop()
    while True:
        await loop.run_in_executor(None, input)  # blocks until Enter is pressed
        await transcriber.flush()


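async def main() -> None:
    transcriber = RealtimeTranscriber(language="en")
    # Keep a reference: the event loop holds only weak references to tasks.
    enter_task = asyncio.create_task(read_enter_loop(transcriber))

    try:
        async for event in transcriber.stream():
            match event:
                case TranscriptDelta(delta=delta):
                    print(delta, end="", flush=True)
                case TranscriptCompleted(transcript=transcript):
                    print(f"\n>>> {transcript}\n")
    finally:
        enter_task.cancel()  # cancel the background task once streaming ends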


asyncio.run(main())

See examples/transcribe_push_to_talk.py for the full runnable version of this pattern.

As an async context manager (stop() is called automatically on exit):

async def main() -> None:
    async with RealtimeTranscriber(language="en") as transcriber:
        async for event in transcriber.stream():
            ...  # handle events as in the examples above
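An async context manager performs its cleanup in __aexit__, and that method runs on a normal exit and also when the body raises. The sketch below shows this contract with a hypothetical DemoTranscriber, which is not the package's class. It assumes that realtime-whisper follows the usual pattern of calling stop() from __aexit__:

```python
import asyncio


class DemoTranscriber:
    """Hypothetical stand-in illustrating the async-context-manager contract."""

    def __init__(self) -> None:
        self.stopped = False

    async def stop(self) -> None:
        # A real implementation would release its resources here.
        self.stopped = True

    async def __aenter__(self) -> "DemoTranscriber":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises or is cancelled.
        await self.stop()
        # Returning None (falsy) lets any exception propagate to the caller.


async def main() -> DemoTranscriber:
    transcriber = DemoTranscriber()
    try:
        async with transcriber:
            raise RuntimeError("simulated failure inside the stream loop")
    except RuntimeError:
        pass  # the exception propagated, but stop() already ran
    return transcriber


result = asyncio.run(main())
print(result.stopped)
```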

Events

The public event types are exported from realtime_whisper:

  • SessionConnected
  • TranscriptDelta
  • TranscriptCompleted
  • SpeechStarted
  • SpeechStopped
  • TranscriberError

Options

Use TranscriptionDelay to trade latency against completeness:

  • TranscriptionDelay.MINIMAL
  • TranscriptionDelay.LOW
  • TranscriptionDelay.MEDIUM
  • TranscriptionDelay.HIGH
  • TranscriptionDelay.XHIGH

Use NoiseReduction to apply noise reduction to the input audio:

  • NoiseReduction.NEAR_FIELD
  • NoiseReduction.FAR_FIELD

Example:

from realtime_whisper import NoiseReduction, RealtimeTranscriber, TranscriptionDelay

transcriber = RealtimeTranscriber(
	language="de",
	delay=TranscriptionDelay.LOW,
	noise_reduction=NoiseReduction.NEAR_FIELD,
)

Providers

By default, RealtimeTranscriber uses OpenAIProvider and reads OPENAI_API_KEY from the environment. You can also pass api_key directly:

transcriber = RealtimeTranscriber(api_key="your-api-key")

For Azure OpenAI, pass an AzureOpenAIProvider:

from realtime_whisper import AzureOpenAIProvider, RealtimeTranscriber

provider = AzureOpenAIProvider(
	resource="my-resource",
	deployment="my-realtime-deployment",
	api_key="my-api-key",
)

transcriber = RealtimeTranscriber(provider=provider)

The Azure provider can also read these environment variables:

  • AZURE_OPENAI_RESOURCE
  • AZURE_OPENAI_DEPLOYMENT
  • AZURE_OPENAI_API_KEY

Custom Audio Input

Pass an object implementing AudioInputDevice to use a custom audio source. Audio chunks must be raw 24 kHz mono PCM bytes unless you also change the session settings in the package internals.

from collections.abc import AsyncIterator

from realtime_whisper.audio import AudioInputDevice


class MyAudioInput(AudioInputDevice):
	async def start(self) -> None:
		...

	async def stop(self) -> None:
		...

	async def stream_chunks(self) -> AsyncIterator[bytes]:
		...

	@property
	def is_active(self) -> bool:
		...
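As a concrete example, the hypothetical class below replays an in-memory PCM buffer in 100 ms chunks. It assumes 16-bit little-endian samples, which is the OpenAI Realtime API's pcm16 format. This README itself only says "24 kHz mono PCM", so confirm the sample width against the package's session settings. The class mirrors the AudioInputDevice methods but does not subclass it, so that the block runs without the package. In real code you would subclass AudioInputDevice as shown above. float_to_pcm16 is a hypothetical helper that converts floating-point samples in [-1.0, 1.0]:

```python
import asyncio
import struct
from collections.abc import AsyncIterator, Sequence

SAMPLE_RATE = 24_000  # Hz, per the documented input format
BYTES_PER_SAMPLE = 2  # assumption: 16-bit little-endian PCM, mono
CHUNK_MS = 100  # hypothetical chunk duration
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 4800 bytes


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


class BufferAudioInput:
    """Hypothetical input replaying a fixed PCM buffer (mirrors AudioInputDevice)."""

    def __init__(self, pcm: bytes, realtime: bool = True) -> None:
        self._pcm = pcm
        self._realtime = realtime  # sleep between chunks to mimic a live microphone
        self._active = False

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        for offset in range(0, len(self._pcm), CHUNK_BYTES):
            if not self._active:
                break  # stop() was called
            yield self._pcm[offset:offset + CHUNK_BYTES]
            if self._realtime:
                await asyncio.sleep(CHUNK_MS / 1000)

    @property
    def is_active(self) -> bool:
        return self._active


async def demo() -> list[int]:
    # One second of silence: 24,000 samples -> 48,000 bytes -> ten chunks.
    source = BufferAudioInput(float_to_pcm16([0.0] * SAMPLE_RATE), realtime=False)
    await source.start()
    sizes = [len(chunk) async for chunk in source.stream_chunks()]
    await source.stop()
    return sizes


print(asyncio.run(demo()))
```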

Development

uv sync --extra audio --group dev
uv run ruff check .
uv run pytest



Download files


Source Distribution

realtime_whisper-0.1.0.tar.gz (52.4 kB)

Built Distribution


realtime_whisper-0.1.0-py3-none-any.whl (12.5 kB, Python 3)

File details

Details for the file realtime_whisper-0.1.0.tar.gz.

File metadata

  • Download URL: realtime_whisper-0.1.0.tar.gz
  • Upload date:
  • Size: 52.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for realtime_whisper-0.1.0.tar.gz

  • SHA256: 412f793611c672e4eed73e600fa25fa0320590c5c0597235e354f1f945fb59b6
  • MD5: 91da3df13e1b34d9a3c1b2cbd651ddfd
  • BLAKE2b-256: a2064be9454e88915ed3475dc0d04d59d8488ec4710fd64ab56d520da5f3d778


File details

Details for the file realtime_whisper-0.1.0-py3-none-any.whl.


File hashes

Hashes for realtime_whisper-0.1.0-py3-none-any.whl

  • SHA256: 829fb3c3536cb10635e3f6f32f68dedcaf2a60701fc35a65dcff5c723e87f306
  • MD5: 8abc910c397cee84fb001707fc24aa65
  • BLAKE2b-256: 05ea1db573ec032e8c6acc1ffa5526068dc8ed80203686ccc69d4bc5b3109c56

