voice-budget is a toolkit for building and managing voice agents with a focus on context, compression, and real-time performance.
voice-budget
A time-to-first-token (TTFT) feedback loop for voice agent context management.
Other libraries compress context blindly. voice-budget measures TTFT before and after each compression, auto-tunes, and rolls the compression back if it makes latency worse.
import asyncio
from voice_budget import wrap
async def main():
    managed = wrap(your_llm, target_ms=800)
    response = await managed(messages)  # measures, compresses, verifies
asyncio.run(main())
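To make the "measures" step concrete, here is a minimal sketch of how an async LLM callable could be timed. It is not voice-budget's implementation: `timed`, `history`, and `fake_llm` are hypothetical names introduced for illustration. Note that with a non-streaming callable like this one, the measured value is the latency of the whole call; a true TTFT would need a streaming response and a timestamp at the first chunk.

```python
import asyncio
import time


def timed(llm_fn, history):
    """Wrap an async LLM callable so each call appends its latency (ms) to history.

    Hypothetical helper for illustration; not part of voice-budget.
    """
    async def wrapper(messages, **kwargs):
        start = time.perf_counter()
        response = await llm_fn(messages, **kwargs)
        # Elapsed wall-clock time for the complete call, in milliseconds.
        history.append((time.perf_counter() - start) * 1000.0)
        return response

    return wrapper


async def fake_llm(messages, **kwargs):
    """Stand-in for a real model call; sleeps briefly, then answers."""
    await asyncio.sleep(0.01)
    return "ok"
```

Each turn adds one sample to `history`, which is the raw material for percentile statistics and for before/after comparisons.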
Install
pip install voice-budget
# With semantic compression (recommended):
pip install "voice-budget[semantic]"
Core dependencies: numpy and tiktoken only. No GPU or cloud API required. The optional semantic extra additionally needs sentence-transformers.
Quick start
Framework-agnostic
import asyncio

from openai import AsyncOpenAI
from voice_budget import wrap

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm(messages, **kwargs):
    resp = await openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, **kwargs
    )
    return resp.choices[0].message.content

async def voice_loop():
    managed = wrap(my_llm, target_ms=800, verbose=True)
    messages = [{"role": "system", "content": "You are a voice assistant."}]
    while True:
        # get_user_speech() is a placeholder for your own speech-to-text step
        messages.append({"role": "user", "content": await get_user_speech()})
        response = await managed(messages)
        messages.append({"role": "assistant", "content": response})

asyncio.run(voice_loop())
Pipecat
Note for Pipecat users: The provided `VoiceBudgetProcessor` in `pipecat_integration.py` is a blueprint. To integrate it with a full Pipecat pipeline, make sure it inherits from `pipecat.processors.frame_processor.FrameProcessor` and wires up the `push_frame` and `process_frame` methods to pass frames down the pipeline.
from pipecat.pipeline.pipeline import Pipeline
from voice_budget.pipecat_integration import VoiceBudgetProcessor
budget = VoiceBudgetProcessor(target_ms=800, verbose=True)
pipeline = Pipeline([
    transport.input(), stt, context_aggregator.user(),
    budget,  # ← insert before LLM
    llm, tts, transport.output(), context_aggregator.assistant(),
])
How it works
Turn 1: TTFT=480ms tokens=120 ✓ under budget
Turn 8: TTFT=920ms tokens=980 ↑ P95 > 800ms → sliding_window → 980→420 tokens
Turn 9: TTFT=490ms tokens=420 ✓ compression helped (delta=430ms)
Turn 14: TTFT=850ms tokens=720 ↑ P95 > 800ms → semantic_trim → 720→350 tokens
Turn 15: TTFT=460ms tokens=350 ✓ compression helped
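The trace above can be read as two decisions: "is the rolling P95 over budget?" and "did the last compression help?". The sketch below shows one way to express both. It is an illustration under stated assumptions, not the library's code: `BudgetMonitor`, `p95`, and `verify` are hypothetical names, the percentile uses the nearest-rank method, and "helped" is assumed to mean a positive delta between the TTFT before and after compression.

```python
from collections import deque


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty sequence of numbers."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


class BudgetMonitor:
    """Keeps a rolling window of TTFT samples and flags budget violations."""

    def __init__(self, target_ms, window_size=20):
        self.target_ms = target_ms
        self.window = deque(maxlen=window_size)  # oldest samples fall off

    def record(self, ttft_ms):
        """Store one sample; return True if the rolling P95 exceeds the target."""
        self.window.append(ttft_ms)
        return p95(self.window) > self.target_ms


def verify(before_ms, after_ms):
    """Compare TTFT before and after a compression (assumed rule: delta > 0 helps)."""
    delta = before_ms - after_ms
    return ("helped" if delta > 0 else "rollback"), delta
```

With the numbers from turns 8 and 9, `verify(920, 490)` yields a delta of 430 ms, which matches the `delta=430ms` shown in the trace.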
Compression strategies (escalating cost)
| Strategy | Cost | When used |
|---|---|---|
| `sliding_window` | Free | First attempt: drop oldest turns |
| `semantic_trim` | ~5ms (local embeddings) | If sliding window is not enough |
| `summarise_tail` | 1 LLM call | If semantic trim is not enough (opt-in) |
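As a rough illustration of the cheapest strategy and of the escalation order, the sketch below drops the oldest turns until a token budget is met, then tries the next strategy only if that was not enough. Everything here is an assumption made for the example: the function names, the choice to always keep system messages, feeding each strategy the previous strategy's output, and the whitespace word count that stands in for tiktoken-based counting.

```python
def count_tokens(messages):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


def sliding_window(messages, token_budget):
    """Drop the oldest non-system turns until the budget is met.

    Assumption: system messages are always kept and placed first.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and count_tokens(system + turns) > token_budget:
        turns.pop(0)  # oldest turn goes first
    return system + turns


def compress(messages, token_budget, strategies):
    """Try (name, function) strategies in order, cheapest first.

    Stops at the first strategy whose output fits the budget and returns
    (name_of_last_strategy_tried, resulting_messages).
    """
    used, current = None, messages
    for name, strategy in strategies:
        used, current = name, strategy(current, token_budget)
        if count_tokens(current) <= token_budget:
            break
    return used, current
```

Ordering the list from cheapest to most expensive means the costlier steps run only when the free one falls short.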
Configuration
from voice_budget import VoiceBudget
budget = VoiceBudget(
    llm_fn=your_llm,
    target_ms=800,            # TTFT budget in ms (P95)
    model="gpt-4o",           # for tiktoken token counting
    window_size=20,           # rolling window for statistics
    token_budget=2000,        # target token count after compression
    use_semantic=True,        # semantic trim (needs sentence-transformers)
    use_summarise=False,      # LLM-based summarisation (costs 1 LLM call)
    verbose=True,             # print compression decisions
    on_compression=callback,  # called after each compression event
    on_budget_violation=cb,   # called when P95 > target_ms
)
Stats and reporting
s = managed.stats()
print(s.p50_ms, s.p95_ms, s.jitter_ms)
managed.print_report()
============================================================
voice-budget Report
============================================================
Total turns: 47
Current P50 TTFT: 510ms
Current P95 TTFT: 780ms
Target: 800ms
Budget met: ✓
Compressions: 3
Helpful: 3
Harmful (rolled back): 0
Total tokens saved: 1,840
Strategies used: sliding_window, semantic_trim
============================================================
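For readers who want to sanity-check figures like the P50 and P95 in this report, the sketch below derives comparable numbers from a list of TTFT samples. The README does not say how voice-budget defines these statistics, so the definitions are assumptions: P50 as the median, P95 by nearest rank, and jitter as the population standard deviation. `TtftStats` and `summarise` are hypothetical names; only the field names mirror the `p50_ms`, `p95_ms`, and `jitter_ms` attributes used above.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TtftStats:
    p50_ms: float
    p95_ms: float
    jitter_ms: float


def summarise(samples_ms):
    """Summarise TTFT samples (milliseconds) under the assumptions stated above."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return TtftStats(
        p50_ms=statistics.median(ordered),
        p95_ms=ordered[rank - 1],
        jitter_ms=statistics.pstdev(ordered),  # assumed jitter definition
    )
```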
Why not use existing tools?
| Tool | TTFT-aware? | Feedback loop? | Auto-tune? |
|---|---|---|---|
| context-compressor | ✗ | ✗ | ✗ |
| reme-ai | ✗ | ✗ | ✗ |
| Pipecat compaction | ✗ | ✗ | ✗ |
| LangChain SummaryMemory | ✗ | ✗ | ✗ |
| voice-budget | ✓ | ✓ | ✓ |
Contributing
Issues and PRs welcome. See CONTRIBUTING.md.
License
MIT
Releases
When you publish a new release, follow these steps so CI can build and publish to PyPI automatically:
- Bump the version in two places:
  - `pyproject.toml` (the `version` field)
  - `voice_budget/__init__.py` (the `__version__` string)
- Run the test and lint suite locally:
# Run unit tests
pytest tests/ -v
# Optional: run ruff if installed
ruff check voice_budget/
- Commit the version bump and push to the remote repository:
git add pyproject.toml voice_budget/__init__.py
git commit -m "chore(release): bump version x.y.z"
git push origin HEAD
- Create a git tag and push it (GitHub Actions will publish on tags that start with `v`):
# Create an annotated tag
git tag -a vX.Y.Z -m "Release vX.Y.Z"
# Push the tag
git push origin vX.Y.Z
- CI (GitHub Actions) will run tests/lint and, on tag pushes, build and publish to PyPI using the `PYPI_API_TOKEN` secret. Make sure the repository has this secret configured in Settings → Secrets → Actions as `PYPI_API_TOKEN` before pushing tags.
Notes:
- Use semantic versioning (MAJOR.MINOR.PATCH) for tags (for example, `v0.2.1`).
- If a tag already exists and you truly need to move it, coordinate with maintainers: force-updating tags that are already published to PyPI is discouraged.