voice-budget is a toolkit for building and managing voice agents with a focus on context, compression, and real-time performance.

voice-budget

A TTFT (time-to-first-token) feedback loop for voice agent context management.

Other libraries compress context blindly. voice-budget measures TTFT before and after each compression, auto-tunes, and rolls back any compression that hurts latency.

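import asyncio
from voice_budget import wrap

async def main():
    # your_llm: your own async function that takes chat messages and returns the reply text
    managed = wrap(your_llm, target_ms=800)
    messages = [{"role": "user", "content": "What's the weather like today?"}]
    response = await managed(messages)  # measures TTFT, compresses if needed, verifies
    print(response)

asyncio.run(main())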

Install

pip install voice-budget

# With semantic compression (recommended):
pip install "voice-budget[semantic]"

Core dependencies: numpy and tiktoken only. No GPU and no cloud API required. The [semantic] extra adds sentence-transformers for local embeddings.


Quick start

Framework-agnostic

import asyncio
from openai import AsyncOpenAI
from voice_budget import wrap

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm(messages, **kwargs):
    resp = await openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, **kwargs
    )
    return resp.choices[0].message.content

async def voice_loop():
    managed = wrap(my_llm, target_ms=800, verbose=True)
    messages = [{"role": "system", "content": "You are a voice assistant."}]
    while True:
        # get_user_speech() is a placeholder for your STT layer; it returns the transcribed text
        messages.append({"role": "user", "content": await get_user_speech()})
        response = await managed(messages)
        messages.append({"role": "assistant", "content": response})

asyncio.run(voice_loop())
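The wrapped callable above returns the whole reply. TTFT is most meaningful for a streamed response: it is the delay until the first chunk arrives, which is when a TTS engine can start speaking. As an illustration only (not voice-budget's internal code), this stdlib-only sketch times the first chunk of any async token stream; `time_to_first_token` and `fake_stream` are hypothetical names, and `fake_stream` stands in for a streaming LLM client.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def time_to_first_token(stream: AsyncIterator[str]) -> Tuple[float, str]:
    """Consume an async token stream and return (TTFT in ms, full text)."""
    start = time.perf_counter()
    ttft_ms: Optional[float] = None
    chunks: List[str] = []
    async for chunk in stream:
        if ttft_ms is None:
            # First chunk: the latency the caller actually hears.
            ttft_ms = (time.perf_counter() - start) * 1000
        chunks.append(chunk)
    if ttft_ms is None:
        raise ValueError("stream produced no tokens")
    return ttft_ms, "".join(chunks)


async def fake_stream(first_delay_s: float = 0.05):
    # Hypothetical stand-in for a streaming LLM: slow first token, fast remainder.
    # An async generator does not start running until it is iterated, so the
    # first delay falls inside the timed region.
    await asyncio.sleep(first_delay_s)
    for piece in ["Hello", ", ", "world"]:
        yield piece
        await asyncio.sleep(0.001)


ttft, text = asyncio.run(time_to_first_token(fake_stream()))
print(f"TTFT={ttft:.0f}ms text={text!r}")
```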

Pipecat

Note for Pipecat users: the VoiceBudgetProcessor in pipecat_integration.py is a blueprint. To use it in a full Pipecat pipeline, make sure it inherits from pipecat.processors.frame_processor.FrameProcessor and that its process_frame method passes every frame down the pipeline with push_frame.

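from pipecat.pipeline.pipeline import Pipeline
from voice_budget.pipecat_integration import VoiceBudgetProcessor

# transport, stt, llm, tts and context_aggregator are created as in any
# standard Pipecat example; only the budget processor is new.
budget = VoiceBudgetProcessor(target_ms=800, verbose=True)

pipeline = Pipeline([
    transport.input(), stt, context_aggregator.user(),
    budget,          # ← insert after the user context aggregator, before the LLM
    llm, tts, transport.output(), context_aggregator.assistant(),
])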

How it works

Turn 1:   TTFT=480ms  tokens=120  ✓ under budget
Turn 8:   TTFT=920ms  tokens=980  ↑ P95 > 800ms → sliding_window → 980→420 tokens
Turn 9:   TTFT=490ms  tokens=420  ✓ compression helped (delta=430ms)
Turn 14:  TTFT=850ms  tokens=720  ↑ P95 > 800ms → semantic_trim → 720→350 tokens
Turn 15:  TTFT=460ms  tokens=350  ✓ compression helped
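The log above follows a measure, compress, verify cycle. A minimal sketch of that cycle, for illustration only: `FeedbackLoop` and `after_turn` are hypothetical names, the compressor is a toy, and P95 is a nearest-rank percentile over a rolling window. A compression is kept only when the next turn's TTFT beats the TTFT that triggered it; otherwise the uncompressed history is restored.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Messages = List[dict]


def p95(values) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


class FeedbackLoop:
    """Compress when rolling P95 TTFT exceeds target; keep it only if TTFT drops."""

    def __init__(self, compress: Callable[[Messages], Messages],
                 target_ms: float = 800, window_size: int = 20):
        self.compress = compress
        self.target_ms = target_ms
        self.window: Deque[float] = deque(maxlen=window_size)
        # (history before compression, compressed length, TTFT that triggered it)
        self.pending: Optional[Tuple[Messages, int, float]] = None
        self.helpful = 0
        self.rolled_back = 0

    def after_turn(self, messages: Messages, ttft_ms: float) -> Messages:
        """Record this turn's TTFT; return the history to send on the next turn."""
        self.window.append(ttft_ms)
        if self.pending is not None:
            # Verification turn: compare with the TTFT that triggered compression.
            original, compressed_len, before_ms = self.pending
            self.pending = None
            if ttft_ms < before_ms:
                self.helpful += 1
                # Older samples reflect the longer context; restart the window.
                self.window.clear()
                self.window.append(ttft_ms)
                return messages
            self.rolled_back += 1
            # Roll back: uncompressed history plus any turns added since.
            return original + messages[compressed_len:]
        if p95(self.window) > self.target_ms:
            compressed = self.compress(messages)
            self.pending = (list(messages), len(compressed), ttft_ms)
            return compressed
        return messages


def keep_recent(messages: Messages) -> Messages:
    # Toy compressor for the demo: system prompt plus the last two messages.
    return messages[:1] + messages[-2:]


history = [{"role": "system", "content": "You are a voice assistant."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(6)]

loop = FeedbackLoop(keep_recent, target_ms=800)
history = loop.after_turn(history, 480)  # P95 480ms: under budget, unchanged
history = loop.after_turn(history, 920)  # P95 920ms > 800ms: compress 7 -> 3 messages
history += [{"role": "assistant", "content": "ok"}, {"role": "user", "content": "next"}]
history = loop.after_turn(history, 490)  # faster than 920ms: keep the compression
print(len(history), loop.helpful, loop.rolled_back)  # → 5 1 0
```

The sketch clears the window after a helpful compression so stale samples from the longer context do not immediately retrigger it. It does not escalate after a rollback; the strategy table below describes that step.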

Compression strategies (escalating cost)

Strategy         Cost                       When used
sliding_window   Free                       First attempt: drop the oldest turns
semantic_trim    ~5 ms (local embeddings)   When the sliding window is not enough
summarise_tail   1 LLM call                 When semantic trim is not enough (opt-in)
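sliding_window is the free first rung. A minimal sketch of the idea, assuming one turn is one chat message and using a word count as a stand-in for tiktoken token counting; `sliding_window` here is a hypothetical function, not voice-budget's implementation.

```python
from typing import Callable, List

Message = dict


def word_count(msg: Message) -> int:
    # Crude stand-in for a real tokenizer such as tiktoken.
    return len(msg["content"].split())


def sliding_window(messages: List[Message], token_budget: int,
                   count: Callable[[Message], int] = word_count) -> List[Message]:
    """Drop the oldest non-system messages until the history fits token_budget.

    System messages are always kept (placed first), and the most recent
    message is never dropped, even if it alone exceeds the budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    total = sum(count(m) for m in messages)
    while len(turns) > 1 and total > token_budget:
        total -= count(turns.pop(0))  # oldest message goes first
    return system + turns


history = [{"role": "system", "content": "You are a voice assistant."}]
history += [{"role": "user" if i % 2 == 0 else "assistant", "content": "word " * 10}
            for i in range(6)]
trimmed = sliding_window(history, token_budget=30)
print(len(history), "->", len(trimmed))  # → 7 -> 3
```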

Configuration

from voice_budget import VoiceBudget

# your_llm, callback and cb stand for your own functions.
budget = VoiceBudget(
    llm_fn=your_llm,
    target_ms=800,           # TTFT budget in ms (P95)
    model="gpt-4o",          # for tiktoken token counting
    window_size=20,          # rolling window for statistics
    token_budget=2000,       # target token count after compression
    use_semantic=True,       # semantic trim (needs sentence-transformers)
    use_summarise=False,     # LLM-based summarisation (costs 1 LLM call)
    verbose=True,            # print compression decisions
    on_compression=callback, # called after each compression event
    on_budget_violation=cb,  # called when P95 > target_ms
)
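use_semantic and use_summarise decide which rungs of the escalation ladder in the strategies table are available; sliding_window always comes first. A minimal sketch of that mapping, for illustration only: `LADDER`, `enabled_strategies` and `next_strategy` are hypothetical names, not part of voice-budget's API, and the defaults simply mirror the configuration example above.

```python
from typing import List, Optional

# Hypothetical names for illustration; cheapest strategy first, per the strategies table.
LADDER = ["sliding_window", "semantic_trim", "summarise_tail"]


def enabled_strategies(use_semantic: bool = True, use_summarise: bool = False) -> List[str]:
    """Strategies allowed by the two flags, in escalation order."""
    allowed = {
        "sliding_window": True,           # always available (free)
        "semantic_trim": use_semantic,    # needs local embeddings
        "summarise_tail": use_summarise,  # costs one LLM call, so opt-in
    }
    return [name for name in LADDER if allowed[name]]


def next_strategy(last: Optional[str], enabled: List[str]) -> Optional[str]:
    """Rung after `last` (None means start at the cheapest); None when exhausted."""
    if last is None:
        return enabled[0] if enabled else None
    idx = enabled.index(last) + 1
    return enabled[idx] if idx < len(enabled) else None


ladder = enabled_strategies(use_semantic=True, use_summarise=False)
print(ladder)                                   # → ['sliding_window', 'semantic_trim']
print(next_strategy("sliding_window", ladder))  # → semantic_trim
print(next_strategy("semantic_trim", ladder))   # → None
```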

Stats and reporting

s = managed.stats()
print(s.p50_ms, s.p95_ms, s.jitter_ms)

managed.print_report()
============================================================
voice-budget Report
============================================================
  Total turns:          47
  Current P50 TTFT:     510ms
  Current P95 TTFT:     780ms
  Target:               800ms
  Budget met:           ✓
  Compressions:         3
  Helpful:              3
  Harmful (rolled back):0
  Total tokens saved:   1,840
  Strategies used:      sliding_window, semantic_trim
============================================================
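The P50 and P95 figures in the report come from the rolling window set by window_size. A stdlib-only sketch of such a window, assuming nearest-rank percentiles and taking jitter to be the population standard deviation of the window; `RollingTTFT` is a hypothetical name, and voice-budget's own definitions of these statistics may differ.

```python
import math
import statistics
from collections import deque


class RollingTTFT:
    """Rolling window of TTFT samples with P50, P95 and jitter (sketch)."""

    def __init__(self, window_size: int = 20):
        self.samples = deque(maxlen=window_size)  # oldest samples fall off automatically

    def add(self, ttft_ms: float) -> None:
        self.samples.append(ttft_ms)

    def percentile(self, pct: float) -> float:
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(pct * len(ordered) / 100))  # nearest-rank method
        return ordered[rank - 1]

    def jitter(self) -> float:
        # Assumption: jitter = population standard deviation of the window.
        return statistics.pstdev(self.samples) if len(self.samples) > 1 else 0.0


stats = RollingTTFT(window_size=4)
for ttft in [480, 920, 490, 850, 460]:
    stats.add(ttft)  # the first sample (480) is evicted by the fifth
print(stats.percentile(50), stats.percentile(95), round(stats.jitter()))  # → 490 920 207
```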

Why not use existing tools?

Tool TTFT-aware? Feedback loop? Auto-tune?
context-compressor
reme-ai
Pipecat compaction
LangChain SummaryMemory
voice-budget

Contributing

Issues and PRs welcome. See CONTRIBUTING.md.

License

MIT

Download files

Source Distribution

voice_budget-0.2.1.tar.gz (25.1 kB)

Uploaded Source

Built Distribution

voice_budget-0.2.1-py3-none-any.whl (19.1 kB)

Uploaded Python 3

File details

Details for the file voice_budget-0.2.1.tar.gz.

File metadata

  • Download URL: voice_budget-0.2.1.tar.gz
  • Upload date:
  • Size: 25.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_budget-0.2.1.tar.gz
Algorithm Hash digest
SHA256 0dcdcf88ca1231d6306a1f6ce8376ff2fd11353c24028b7d6a722449fcaa9e22
MD5 36d01f7c4db11550372dfc78dd02efe9
BLAKE2b-256 ede2301d12c4888f2666f89e7233392ee598f62ffa7518d944e095a6638e34e3

File details

Details for the file voice_budget-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: voice_budget-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 19.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_budget-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 aa154da1dd1e30cec1024dd7e092cf4bf4c3244bf55472e8da84f06f2e648b2a
MD5 807c2bc13539b8116dfe2a4d651c554a
BLAKE2b-256 318f3787abcaa9adadc91903167108eb803cc488b3a8ee6643390def8a7174e1
