Realtime AI Voice Interface using OpenAI Realtime API

Project description

SpeakNow

SpeakNow is a high-performance, real-time AI voice interface that runs on the OpenAI Realtime API, the Google Gemini Flash Native Audio Live API, or Grok Voice. It provides a seamless, low-latency speech-to-speech conversational experience directly in your terminal.

In addition to the Textual TUI, parts of the code can be used as a library in other applications, particularly the code under the speaknow_ai_realtime_text_to_speech.py/ai_services path. Documentation for a stable API is planned for the near future. This project is based on and inspired by the push_to_talk_app.py example from the openai-python repository, with many added features and support for other AI services.

Features

Low-Latency Speech-to-Speech: Direct multimodal interaction using the gpt-realtime, gpt-realtime-mini, or gemini-2.5-flash-native-audio models for near-instant responses.
Real-time Transcription: View live streaming transcripts of your conversation as you speak.
Advanced Audio Handling: Save input speech to local WAV files for record-keeping or debugging.
Configurable Parameters: Easily adjust system prompts, model names, mode, and transcription options through a built-in TUI settings menu.
Professional TUI: A clean, "sticky" interface with persistent headers, footers, and scrollable settings panes, built with Textual.
Voice Amplitude Monitor: Monitor the volume of the voice input as you speak.
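To illustrate what an amplitude monitor computes, here is a minimal sketch that derives a volume level from raw audio. It assumes little-endian 16-bit mono PCM input, and the names `rms_level` and `level_bar` are hypothetical; this is not SpeakNow's own implementation.

```python
import math
import struct


def rms_level(pcm16: bytes) -> float:
    """Return the RMS amplitude of 16-bit little-endian PCM, scaled to 0.0-1.0."""
    n = len(pcm16) // 2  # two bytes per sample
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    mean_square = sum(s * s for s in samples) / n
    return math.sqrt(mean_square) / 32768.0


def level_bar(level: float, width: int = 20) -> str:
    """Render a level in the range 0.0-1.0 as a fixed-width text meter."""
    filled = min(width, round(level * width))
    return "#" * filled + "-" * (width - filled)
```

A TUI could call `rms_level` on each captured audio frame and redraw the meter with `level_bar`.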

Installation

SpeakNow requires Python 3.11 or greater.

To install the latest version from PyPI, run:

pip install speaknow-gui

On Windows, ffmpeg is required:

winget install --id=Gyan.FFmpeg

On Linux, portaudio19-dev and ffmpeg are required. For example, to install them on Ubuntu:

sudo apt install portaudio19-dev ffmpeg
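As a quick sanity check after installation, a short script can confirm that ffmpeg is reachable on the PATH. This is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of SpeakNow.

```python
import shutil


def missing_tools(tools=("ffmpeg",)) -> list[str]:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks executables; it does not detect whether the portaudio development headers are installed.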

Usage

Configuration

Before running, ensure that at least OPENAI_API_KEY is set in your environment variables. GEMINI_API_KEY and XAI_API_KEY can also be set if those services are needed.

On Windows, open the Edit Environment Variables GUI and add it there.

On Linux:

export OPENAI_API_KEY="your-api-key-here"
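To see which services are configured before launching, the environment variables named above can be checked from Python. This is a minimal sketch; the `SERVICE_KEYS` mapping and the `configured_services` helper are hypothetical names for illustration, not part of SpeakNow.

```python
import os

# Environment variable expected for each service (variable names from the text above).
SERVICE_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Grok": "XAI_API_KEY",
}


def configured_services(env=None) -> list[str]:
    """Return the services whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in SERVICE_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    print("Configured services:", configured_services() or "none")
```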

SpeakNow provides two main entry points for different use cases. If the commands below are not found, make sure Python's scripts directory is included in the PATH environment variable.

1. Standard Application

Launch the main TUI application to start a real-time session:

Windows:

speaknow.exe

Linux:

speaknow

[Screenshots: GUI preview, Config preview]

2. Web Service Mode

Run a server-side version optimized for shared or remote environments:

Windows:

speaknow-serve.exe

Linux:

speaknow-serve

Modes:

The mode can be changed in the configuration. In Manual mode, you hit "Start", speak, and then hit "Stop" to send the audio. Server VAD uses periods of silence to automatically chunk the audio. Semantic VAD uses a semantic classifier to detect when the user has finished speaking, based on the words they have uttered. This setting is ignored with Gemini, which uses only its default VAD mechanism.
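The idea of chunking audio on periods of silence can be illustrated with a simple energy threshold. The sketch below is not the detector used by any of the services, which run their own VAD server-side; the function name `chunk_on_silence`, the threshold, and the frame counts are all assumptions chosen for illustration.

```python
def chunk_on_silence(levels, threshold=0.02, min_silence_frames=3):
    """Split a sequence of per-frame amplitudes into utterances.

    levels: amplitudes in the range 0.0-1.0, one per audio frame.
    Returns (start, end) frame index pairs, with end exclusive.
    An utterance closes once min_silence_frames quiet frames follow it.
    """
    chunks = []
    start = None  # index where the current utterance began
    quiet = 0     # consecutive quiet frames seen inside an utterance
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                # End the utterance at the first quiet frame of this run.
                chunks.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Input ended mid-utterance; drop any trailing quiet frames.
        chunks.append((start, len(levels) - quiet))
    return chunks
```

For example, `chunk_on_silence([0, 0.5, 0.6, 0, 0, 0, 0.4, 0])` yields two utterances, one covering frames 1-2 and one covering frame 6.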

Application Data

Logs, token usage, and the config file (which can also be modified in the TUI) are stored here:

Windows: %APPDATA%\Speaknow
Windows, when Python is installed from the Microsoft Store (the path will be something like): %LOCALAPPDATA%\Packages\PythonSoftwareFoundation.Python.3.1<....>\LocalCache\Roaming\Speaknow
Linux: $HOME/.config/Speaknow
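For scripts that need to locate these files, the two documented locations can be resolved from the environment. This is a minimal sketch under stated assumptions: the helper name `app_data_dir` is hypothetical, every non-Windows platform is treated like Linux, and the Microsoft Store redirection is not handled explicitly.

```python
import os
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def app_data_dir(platform=None, env=None) -> PurePath:
    """Return the documented Speaknow data directory for the given platform."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Documented Windows location: %APPDATA%\Speaknow
        return PureWindowsPath(env["APPDATA"]) / "Speaknow"
    # Documented Linux location: $HOME/.config/Speaknow
    return PurePosixPath(env["HOME"]) / ".config" / "Speaknow"


if __name__ == "__main__":
    print(app_data_dir())
```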

Project details

Release history

0.2.1 (this version): Jan 7, 2026
0.2.0: Jan 7, 2026
0.1.4: Jan 4, 2026
0.1.3: Jan 4, 2026
0.1.2: Jan 3, 2026
0.1.1: Jan 3, 2026
0.1.0: Jan 3, 2026

Download files

Source Distribution

speaknow_gui-0.2.1.tar.gz (27.0 kB)

Uploaded Jan 7, 2026 Source

Built Distribution

speaknow_gui-0.2.1-py3-none-any.whl (31.3 kB)

Uploaded Jan 7, 2026 Python 3

File details

Details for the file speaknow_gui-0.2.1.tar.gz.

File metadata

Download URL: speaknow_gui-0.2.1.tar.gz
Upload date: Jan 7, 2026
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for speaknow_gui-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`5782b9bc03b9177f38473088cd6c5bb0de94f6019b5779d319053cff7d1a2f11`
MD5	`ba4383f9c2c13cd61c0cb39bc93d8b13`
BLAKE2b-256	`12b02b08d1cf9aa4582e9f32fb56ba750987b19d49bea9e96596b45cec9ec731`

File details

Details for the file speaknow_gui-0.2.1-py3-none-any.whl.

File metadata

Download URL: speaknow_gui-0.2.1-py3-none-any.whl
Upload date: Jan 7, 2026
Size: 31.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for speaknow_gui-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64b9c68ba614ae30f24f02b3175e22f784a8b354a233464f640ea416184dfc32`
MD5	`324a07a12ba327cfde2ccd5bb3073031`
BLAKE2b-256	`5a7777510b6bb08537edcb6cfb8e7f4d2ccc5455863b25c510cc4b762dc35543`
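The published SHA256 digests can be used to check a downloaded file before installing it. The sketch below uses the standard-library `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical, and the expected digests are copied from the tables above.

```python
import hashlib
import os

# SHA256 digests as listed for the 0.2.1 release files.
EXPECTED_SHA256 = {
    "speaknow_gui-0.2.1.tar.gz":
        "5782b9bc03b9177f38473088cd6c5bb0de94f6019b5779d319053cff7d1a2f11",
    "speaknow_gui-0.2.1-py3-none-any.whl":
        "64b9c68ba614ae30f24f02b3175e22f784a8b354a233464f640ea416184dfc32",
}


def sha256_of(path, chunk_size=65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True if the file's digest matches the one listed for its name."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected
```

Calling `verify("speaknow_gui-0.2.1.tar.gz")` on a downloaded copy returns `True` only when the file matches the listed digest.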
