lightweight command-line interface that makes it easy to test VoiceChatEngine

VoxTerm - Minimalist CLI for Voice Chat

VoxTerm is a lightweight command-line interface that makes it easy to use voice chat APIs (like OpenAI's Realtime API) from your terminal. No fancy UI, no complex frameworks - just simple keyboard controls for voice conversations.

📋 What is VoxTerm?

VoxTerm is a thin CLI wrapper that adds keyboard controls to voice engines. Think of it as the minimal glue between your keyboard and a voice API:

Your Keyboard → VoxTerm → VoiceEngine → AI Voice API

🎯 Philosophy

  • Minimalist: ~500 lines of code total
  • Simple: Just keyboard input and print statements
  • Focused: Does one thing - CLI controls for voice chat
  • Non-blocking: Never interferes with real-time audio
  • Flexible: Works with any voice engine that has the right methods

🚀 Quick Start

from voxterm import VoxTermCLI
from realtimevoiceapi import VoiceEngine

# Create your voice engine
engine = VoiceEngine(api_key="your-key")

# Wrap it with VoxTerm
cli = VoxTermCLI(engine, mode="push_to_talk")

# Run it
import asyncio
asyncio.run(cli.run())

That's it! Now you have:

  • Hold SPACE to talk
  • Press M to mute
  • Press Q to quit

📁 Project Structure

VoxTerm is intentionally tiny:

voxterm/
├── __init__.py      # Package exports
├── cli.py           # Main CLI class (100 lines)
├── modes.py         # Input modes (200 lines)
└── keyboard.py      # Keyboard handling (200 lines)

cli.py - The Main CLI

class VoxTermCLI:
    def __init__(self, voice_engine, mode="push_to_talk"):
        self.engine = voice_engine
        self.mode = mode
        
    async def run(self):
        # Connect engine
        # Setup keyboard
        # Print messages
        # That's all!
        ...  # body elided; the comments above outline the steps
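
To make that outline concrete, here is a minimal sketch of what such a run() lifecycle could look like. It is not VoxTerm's actual code: StubEngine, MiniCLI, request_quit and the lines list are hypothetical names used only for illustration, and keyboard handling is left out so the example runs anywhere.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for a voice engine; logs calls instead of doing I/O."""

    def __init__(self):
        self.calls = []
        self.on_text_response = None
        self.on_user_transcript = None

    async def connect(self):
        self.calls.append("connect")

    async def disconnect(self):
        self.calls.append("disconnect")


class MiniCLI:
    """Illustrative run() lifecycle; an assumption, not VoxTerm's actual code."""

    def __init__(self, voice_engine, mode="push_to_talk"):
        self.engine = voice_engine
        self.mode = mode
        self.lines = []
        self._quit = None  # created inside run() so it belongs to the running loop

    def request_quit(self):
        # Hypothetical hook that a "q" key binding would call.
        if self._quit is not None:
            self._quit.set()

    def _show(self, speaker, text):
        line = f"{speaker}: {text}"
        self.lines.append(line)
        print(line)

    async def run(self):
        self._quit = asyncio.Event()
        # Route engine callbacks to plain print statements.
        self.engine.on_user_transcript = lambda text: self._show("You", text)
        self.engine.on_text_response = lambda text: self._show("AI", text)
        await self.engine.connect()
        try:
            # Block until a quit is requested (keyboard handling omitted here).
            await self._quit.wait()
        finally:
            await self.engine.disconnect()  # always release the connection


async def demo():
    engine = StubEngine()
    cli = MiniCLI(engine)
    task = asyncio.create_task(cli.run())
    await asyncio.sleep(0)            # let run() reach its wait
    engine.on_text_response("hello")  # simulate a reply from the engine
    cli.request_quit()                # simulate pressing Q
    await task
    return engine, cli
```

Putting disconnect() in a finally block means the engine is released even if something raises while the CLI is waiting.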

modes.py - Input Modes

Simple classes that handle different interaction patterns:

  • PushToTalkMode: Hold key → record → release → send
  • AlwaysOnMode: Continuous listening with VAD
  • TextMode: Type messages instead of speaking
  • TurnBasedMode: Explicit turn-taking

Each mode is just a class with on_key_down() and on_key_up() methods.
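
As a rough illustration of that two-method shape, the sketch below shows a push-to-talk style mode driven by a stub engine. PushToTalkSketch, RecordingEngine and the talk_key parameter are hypothetical and are not taken from modes.py; only start_listening() and stop_listening() come from the engine interface this README describes.

```python
import asyncio


class RecordingEngine:
    """Hypothetical engine stub that only logs which methods were awaited."""

    def __init__(self):
        self.log = []

    async def start_listening(self):
        self.log.append("start_listening")

    async def stop_listening(self):
        self.log.append("stop_listening")


class PushToTalkSketch:
    """Illustrative mode: hold a key to record, release to send."""

    def __init__(self, engine, talk_key="space"):
        self.engine = engine
        self.talk_key = talk_key
        self.recording = False

    async def on_key_down(self, key: str):
        # Ignore other keys, and ignore key-repeat events while already recording.
        if key == self.talk_key and not self.recording:
            self.recording = True
            await self.engine.start_listening()

    async def on_key_up(self, key: str):
        # Only stop if this mode actually started a recording.
        if key == self.talk_key and self.recording:
            self.recording = False
            await self.engine.stop_listening()


async def demo():
    engine = RecordingEngine()
    mode = PushToTalkSketch(engine)
    await mode.on_key_down("space")
    await mode.on_key_down("space")  # auto-repeat while held: ignored
    await mode.on_key_up("space")
    return engine.log
```

The recording flag is a design choice for this sketch: terminals often repeat key-down events while a key is held, so the mode guards against starting twice.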

keyboard.py - Keyboard Input

Basic keyboard handling that works across platforms:

keyboard = SimpleKeyboard()
keyboard.on_space(on_press_func, on_release_func)
keyboard.on_key('m', mute_func)
keyboard.start()
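
The registration calls above suggest a simple callback table. The sketch below shows one way such a table could work; KeyRegistry and its dispatch() method are hypothetical, and the platform-specific part (actually reading keys from the terminal) is deliberately left out, so events are fed in by hand.

```python
class KeyRegistry:
    """Hypothetical callback table mirroring the on_space / on_key calls above."""

    def __init__(self):
        self._press = {}
        self._release = {}

    def on_key(self, key, on_press, on_release=None):
        self._press[key] = on_press
        if on_release is not None:
            self._release[key] = on_release

    def on_space(self, on_press, on_release):
        self.on_key(" ", on_press, on_release)

    def dispatch(self, key, pressed):
        """Run the callback for one key event; return True if one was registered."""
        table = self._press if pressed else self._release
        handler = table.get(key)
        if handler is None:
            return False
        handler()
        return True


events = []
keys = KeyRegistry()
keys.on_space(lambda: events.append("talk"), lambda: events.append("send"))
keys.on_key("m", lambda: events.append("mute"))

# Synthetic events: press space, release space, press m, press an unbound key.
for key, pressed in [(" ", True), (" ", False), ("m", True), ("x", True)]:
    keys.dispatch(key, pressed)

print(events)
```

Keeping dispatch separate from key reading makes the bindings testable without a real terminal.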

🎮 Usage Modes

Push-to-Talk (Default)

$ python -m voxterm --mode ptt

🎤 Voice Chat (push_to_talk mode)
Commands: [space] talk, [m] mute, [q] quit

[Hold SPACE to talk...]
🔴 Recording... (2.3s) Sending...
You: How's the weather today?
AI: I don't have access to real-time weather data...

Always-On (VAD)

$ python -m voxterm --mode always_on

🎤 Always listening (VAD active)
[Just speak naturally, AI will respond when you pause]

Text Mode

$ python -m voxterm --mode text

💬 Type your messages:
You: Hello!
AI: Hi there! How can I help you today?
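
A text-mode session like the one above boils down to a loop that forwards each typed line to send_text() and prints whatever comes back through on_text_response. The sketch below assumes exactly that and nothing more; CannedEngine and text_session are hypothetical names, and the input lines are passed in as a list so the example runs without a terminal.

```python
import asyncio


class CannedEngine:
    """Hypothetical engine stub that answers every message with a fixed reply."""

    def __init__(self, reply):
        self.reply = reply
        self.on_text_response = None

    async def send_text(self, text: str):
        if self.on_text_response is not None:
            self.on_text_response(self.reply)


async def text_session(engine, lines):
    """Send each non-empty line to the engine and return the printed transcript."""
    transcript = []

    def show_reply(text):
        transcript.append(f"AI: {text}")
        print(transcript[-1])

    engine.on_text_response = show_reply
    for raw in lines:
        message = raw.strip()
        if not message:
            continue  # skip blank input rather than sending it
        transcript.append(f"You: {message}")
        print(transcript[-1])
        await engine.send_text(message)
    return transcript
```

In a real terminal the lines would come from the user; since input() blocks, one option on Python 3.9+ is to read it with asyncio.to_thread(input, "You: ") so the event loop keeps running.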

🔧 Integration

VoxTerm expects a voice engine with these methods:

# Required methods
async engine.connect()
async engine.disconnect()
async engine.start_listening()
async engine.stop_listening()
async engine.send_text(text: str)

# Required callbacks
engine.on_text_response = func(text: str)
engine.on_user_transcript = func(text: str)

Works out of the box with:

  • realtimevoiceapi.VoiceEngine
  • Any engine with a similar interface
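
If you want to check your own engine against this interface, one option is to write it down as a typing.Protocol. The sketch below does that and adds a toy engine for offline testing. VoiceEngineLike and EchoEngine are hypothetical names, not part of VoxTerm or realtimevoiceapi, and the callback type hints are an assumption based on the signatures listed above.

```python
import asyncio
from typing import Callable, Optional, Protocol, runtime_checkable


@runtime_checkable
class VoiceEngineLike(Protocol):
    """Hypothetical name for the engine interface VoxTerm expects."""

    on_text_response: Optional[Callable[[str], None]]
    on_user_transcript: Optional[Callable[[str], None]]

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def start_listening(self) -> None: ...
    async def stop_listening(self) -> None: ...
    async def send_text(self, text: str) -> None: ...


class EchoEngine:
    """Toy engine for offline testing: echoes typed text back as the reply."""

    def __init__(self):
        self.on_text_response = None
        self.on_user_transcript = None
        self.connected = False

    async def connect(self):
        self.connected = True

    async def disconnect(self):
        self.connected = False

    async def start_listening(self):
        pass  # no microphone in this stub

    async def stop_listening(self):
        pass

    async def send_text(self, text: str):
        if self.on_text_response is not None:
            self.on_text_response(f"echo: {text}")


async def demo():
    engine = EchoEngine()
    replies = []
    engine.on_text_response = replies.append
    await engine.connect()
    await engine.send_text("Hello!")
    await engine.disconnect()
    return replies
```

Note that isinstance(engine, VoiceEngineLike) only checks that the members exist; it does not verify signatures or that the methods are coroutines, so a static type checker gives a stronger guarantee.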

🎨 Customization

Custom Modes

class MyCustomMode:
    def __init__(self, engine):
        self.engine = engine
        
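    async def on_key_down(self, key: str):
        if key == "r":  # Custom recording key
            await self.engine.start_listening()

    async def on_key_up(self, key: str):
        if key == "r":  # Release the key to stop recording
            await self.engine.stop_listening()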

Custom Key Bindings

cli = VoxTermCLI(engine)
cli.keyboard.on_key('t', lambda: print("Custom action!"))

🚫 What VoxTerm Doesn't Do

  • ❌ No UI rendering or colors
  • ❌ No audio processing
  • ❌ No network/WebSocket handling
  • ❌ No state management
  • ❌ No configuration files
  • ❌ No fancy terminal graphics

VoxTerm just connects your keyboard to a voice engine. The voice engine handles everything else.

📊 Why So Simple?

Real-world usage showed that for CLI voice chat, you need:

  1. A way to trigger recording (keyboard)
  2. A way to see what was said (print)
  3. Different modes for different use cases

That's exactly what VoxTerm provides - nothing more, nothing less.

🏃 Example: Complete Voice Chat in 10 Lines

import asyncio
from voxterm import VoxTermCLI
from realtimevoiceapi import VoiceEngine

async def main():
    engine = VoiceEngine(api_key="your-key")
    cli = VoxTermCLI(engine, mode="push_to_talk")
    await cli.run()

if __name__ == "__main__":
    asyncio.run(main())

📝 License

MIT - Use it however you want!


Remember: VoxTerm is just the keyboard controls. Your voice engine does the actual work. We just make it easy to use from the command line! 🎤

