voice-budget is a toolkit for building and managing voice agents with a focus on context, compression, and real-time performance.

voice-budget

TTFT feedback loop for voice agent context management.

Other libraries compress blindly. voice-budget measures time to first token (TTFT) before and after each compression, auto-tunes the compression, and rolls back any compression that makes latency worse.

import asyncio
from voice_budget import wrap

async def main():
    # your_llm: your async LLM callable; messages: the chat history list
    managed = wrap(your_llm, target_ms=800)
    response = await managed(messages)  # measures, compresses, verifies

asyncio.run(main())

Install

pip install voice-budget

# With semantic compression (recommended):
pip install "voice-budget[semantic]"

Core dependencies: numpy and tiktoken only. No GPU and no cloud API are required. The semantic extra additionally needs sentence-transformers.

Integrations

Use voice-budget with any framework:

Framework-agnostic

import asyncio
from openai import AsyncOpenAI
from voice_budget import wrap

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm(messages, **kwargs):
    resp = await openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, **kwargs
    )
    return resp.choices[0].message.content

async def voice_loop():
    managed = wrap(my_llm, target_ms=800, verbose=True)
    messages = [{"role": "system", "content": "You are a voice assistant."}]
    while True:
        # get_user_speech() is a placeholder for your own speech-to-text coroutine
        messages.append({"role": "user", "content": await get_user_speech()})
        response = await managed(messages)
        messages.append({"role": "assistant", "content": response})

asyncio.run(voice_loop())

Pipecat

Note for Pipecat users: the VoiceBudgetProcessor provided in pipecat_integration.py is a blueprint. To use it in a full Pipecat pipeline, make sure it inherits from pipecat.processors.frame_processor.FrameProcessor and implements process_frame so that it calls push_frame to pass frames down the pipeline.

from pipecat.pipeline.pipeline import Pipeline
from voice_budget.pipecat_integration import VoiceBudgetProcessor

budget = VoiceBudgetProcessor(target_ms=800, verbose=True)

pipeline = Pipeline([
    transport.input(), stt, context_aggregator.user(),
    budget,          # ← insert before LLM
    llm, tts, transport.output(), context_aggregator.assistant(),
])

LiveKit

Use VoiceBudgetAgent to wrap your LiveKit agent's LLM calls:

from voice_budget import VoiceBudgetAgent

budget = VoiceBudgetAgent(
    target_ms=800,
    token_budget=2000,
    model="gpt-4o",
    use_semantic=True,
    verbose=True,
)

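async def on_message(message: str, messages: list):
    # Add the new user turn to the history (skip this if your caller already did)
    messages.append({"role": "user", "content": message})

    # Compress context and measure TTFT
    response = await budget.process_messages(
        messages=messages,
        llm_fn=your_llm_function,
    )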

    # Streaming LLMs return an async iterator; non-streaming calls return text.
    if hasattr(response, "__aiter__"):
        chunks = []
        async for chunk in response:
            chunks.append(chunk)
        response_text = "".join(chunks)
    else:
        response_text = response

    messages.append({"role": "assistant", "content": response_text})
    return response_text

# Access stats and reports
stats = budget.stats()
report = budget.report()
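
For readers wondering what "measure TTFT" means for a streaming response, here is a minimal sketch and not voice-budget's actual implementation. It wraps any async iterator of text chunks and reports the delay until the first chunk arrives. The names timed_stream, fake_llm_stream, and collect are hypothetical, and the clock starts when iteration begins, which assumes the stream is lazy and only issues its request on first iteration.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, Tuple


async def timed_stream(
    stream: AsyncIterator[str],
    on_first_token: Callable[[float], None],
) -> AsyncIterator[str]:
    """Yield chunks unchanged and report time to first chunk in milliseconds."""
    start = time.perf_counter()
    first = True
    async for chunk in stream:
        if first:
            # Report TTFT exactly once, when the first chunk shows up.
            on_first_token((time.perf_counter() - start) * 1000.0)
            first = False
        yield chunk


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM: 50 ms delay, then three chunks."""
    await asyncio.sleep(0.05)
    for piece in ("Hello", ", ", "world"):
        yield piece


async def collect() -> Tuple[str, List[float]]:
    """Consume the timed stream and return the full text plus recorded TTFTs."""
    ttfts: List[float] = []
    chunks = [c async for c in timed_stream(fake_llm_stream(), ttfts.append)]
    return "".join(chunks), ttfts
```

Calling asyncio.run(collect()) returns the joined text and a one-element list holding the measured TTFT. The chunks pass through untouched, so the wrapper can sit between the LLM and a TTS stage.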

How it works

Turn 1:   TTFT=480ms  tokens=120  ✓ under budget
Turn 8:   TTFT=920ms  tokens=980  ↑ P95 > 800ms → sliding_window → 980→420 tokens
Turn 9:   TTFT=490ms  tokens=420  ✓ compression helped (delta=430ms)
Turn 14:  TTFT=850ms  tokens=720  ↑ P95 > 800ms → semantic_trim → 720→350 tokens
Turn 15:  TTFT=460ms  tokens=350  ✓ compression helped
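
The trace above can be read as a small state machine: record TTFT, compress when P95 exceeds the target, then check on the next turn whether the compression helped. The sketch below illustrates that idea under stated assumptions and is not the library's code. FeedbackLoop and its action strings are hypothetical, P95 uses the nearest-rank method, and the window is cleared after each compression because older samples describe a larger context.

```python
import math
from collections import deque
from typing import Deque, List, Optional


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


class FeedbackLoop:
    """Hypothetical measure -> compress -> verify -> rollback controller."""

    def __init__(self, target_ms: float, window_size: int = 20) -> None:
        self.target_ms = target_ms
        self.samples: Deque[float] = deque(maxlen=window_size)
        # TTFT observed just before the most recent compression, if unverified.
        self.baseline_ms: Optional[float] = None

    def record(self, ttft_ms: float) -> str:
        """Record one turn's TTFT and return the action to take."""
        if self.baseline_ms is not None:
            # Previous turn compressed: verify it against the baseline.
            baseline, self.baseline_ms = self.baseline_ms, None
            self.samples.append(ttft_ms)
            return "keep" if ttft_ms < baseline else "rollback"

        self.samples.append(ttft_ms)
        if p95(list(self.samples)) > self.target_ms:
            # Over budget: remember the baseline and start a fresh window.
            self.baseline_ms = ttft_ms
            self.samples.clear()
            return "compress"
        return "ok"
```

Feeding the TTFT values 480, 920, 490, 850, 460 into record() with target_ms=800 yields ok, compress, keep, compress, keep, which mirrors the turn sequence shown above.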

Compression strategies (escalating cost)

Strategy	Cost	When used
`sliding_window`	Free	First attempt: drop the oldest turns
`semantic_trim`	~5ms (local embeddings)	If the sliding window is not enough
`summarise_tail`	1 LLM call	If semantic trim is not enough (opt-in)
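
To make the cheapest strategy concrete, here is a rough sketch of a sliding window that drops the oldest turns until the history fits a token budget. It is an illustration rather than voice-budget's implementation. The function names are hypothetical, the whitespace word count is a stand-in for real tiktoken counting, and system messages are assumed to sit at the start of the history.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_tokens(message: Message) -> int:
    """Whitespace word count, a crude stand-in for a real tokenizer."""
    return len(message["content"].split())


def sliding_window(
    messages: List[Message],
    token_budget: int,
    count_tokens: Callable[[Message], int] = rough_tokens,
) -> List[Message]:
    """Drop the oldest non-system messages until the total fits token_budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m) for m in system + turns)

    # Always keep at least the most recent turn, even if it exceeds the budget.
    while len(turns) > 1 and total > token_budget:
        total -= count_tokens(turns.pop(0))

    return system + turns
```

The input list is not mutated, so the caller still holds the uncompressed history if the compression has to be rolled back.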

Configuration

from voice_budget import VoiceBudget

budget = VoiceBudget(
    llm_fn=your_llm,
    target_ms=800,           # TTFT budget in ms (P95)
    model="gpt-4o",          # for tiktoken token counting
    window_size=20,          # rolling window for statistics
    token_budget=2000,       # target token count after compression
    use_semantic=True,       # semantic trim (needs sentence-transformers)
    use_summarise=False,     # LLM-based summarisation (costs 1 LLM call)
    verbose=True,            # print compression decisions
    on_compression=callback, # called after each compression event
    on_budget_violation=cb,  # called when P95 > target_ms
)

Stats and reporting

s = managed.stats()
print(s.p50_ms, s.p95_ms, s.jitter_ms)

managed.print_report()

============================================================
voice-budget Report
============================================================
  Total turns:          47
  Current P50 TTFT:     510ms
  Current P95 TTFT:     780ms
  Target:               800ms
  Budget met:           ✓
  Compressions:         3
  Helpful:              3
  Harmful (rolled back):0
  Total tokens saved:   1,840
  Strategies used:      sliding_window, semantic_trim
============================================================
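
For intuition about the P50, P95, and jitter figures, the sketch below computes them from a list of TTFT samples. It is an assumption-laden illustration, not the library's code. TtftStats and summarize are hypothetical names, percentiles use the nearest-rank method, and jitter is taken to be the population standard deviation, which may differ from how voice-budget defines it.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TtftStats:
    """Hypothetical container mirroring the p50_ms / p95_ms / jitter_ms fields."""
    p50_ms: float
    p95_ms: float
    jitter_ms: float


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> TtftStats:
    """Summarize TTFT samples (milliseconds) from the rolling window."""
    if not samples:
        raise ValueError("need at least one TTFT sample")
    return TtftStats(
        p50_ms=percentile(samples, 50),
        p95_ms=percentile(samples, 95),
        # Assumption: jitter is the population standard deviation of TTFT.
        jitter_ms=statistics.pstdev(samples),
    )
```

With few samples the nearest-rank P95 equals the slowest turn, so a single slow response can trigger compression early in a call.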

Why not use existing tools?

Tool	TTFT-aware?	Feedback loop?	Auto-tune?
context-compressor	✗	✗	✗
reme-ai	✗	✗	✗
Pipecat compaction	✗	✗	✗
LangChain SummaryMemory	✗	✗	✗
voice-budget	✓	✓	✓

Contributing

Issues and PRs welcome. See CONTRIBUTING.md.

License

MIT

Releases

When you publish a new release, follow these steps so that CI can build and publish to PyPI automatically:

Bump the version in two places:
- pyproject.toml (the version field)
- voice_budget/__init__.py (the __version__ string)

Run the test and lint suite locally:

# Run unit tests
pytest tests/ -v

# Optional: run ruff if installed
ruff check voice_budget/

Commit the version bump and push to the remote repository:

git add pyproject.toml voice_budget/__init__.py
git commit -m "chore(release): bump version x.y.z"
git push origin HEAD

Create a git tag and push it (GitHub Actions will publish on tags that start with v):

# Create an annotated tag
git tag -a vX.Y.Z -m "Release vX.Y.Z"
# Push the tag
git push origin vX.Y.Z

CI (GitHub Actions) runs tests and lint and, on tag pushes, builds and publishes to PyPI using the PYPI_API_TOKEN secret. Before pushing tags, make sure this secret is configured in the repository under Settings → Secrets and variables → Actions.

Notes:

- Use semantic versioning (MAJOR.MINOR.PATCH) for tags (for example v0.2.1).
- If a tag already exists and you truly need to move it, coordinate with maintainers: force-updating tags that are already published to PyPI is discouraged.
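
Because the version lives in two files and in the tag, a small pre-tag check can catch mismatches before CI publishes. The script below is a sketch and not part of the project. check_release and its helpers are hypothetical, and the regex-based parsing assumes a plain version = "x.y.z" line in pyproject.toml and a simple __version__ string assignment.

```python
import re
from typing import List, Optional

SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def pyproject_version(text: str) -> Optional[str]:
    """Extract the version field from pyproject.toml text, if present."""
    m = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    return m.group(1) if m else None


def init_version(text: str) -> Optional[str]:
    """Extract __version__ from the text of voice_budget/__init__.py."""
    m = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', text, flags=re.MULTILINE
    )
    return m.group(1) if m else None


def check_release(tag: str, pyproject_text: str, init_text: str) -> List[str]:
    """Return a list of problems; an empty list means the release is consistent."""
    problems: List[str] = []
    if not SEMVER_TAG.match(tag):
        problems.append(f"tag {tag!r} is not of the form vMAJOR.MINOR.PATCH")

    pv = pyproject_version(pyproject_text)
    iv = init_version(init_text)
    if pv is None:
        problems.append("no version field found in pyproject.toml")
    if iv is None:
        problems.append("no __version__ found in __init__.py")
    if pv is not None and iv is not None and pv != iv:
        problems.append(f"pyproject.toml has {pv} but __init__.py has {iv}")
    if pv is not None and tag != f"v{pv}":
        problems.append(f"tag {tag} does not match version {pv}")
    return problems
```

A typical use would read both files with pathlib.Path.read_text() and abort tagging if the returned list is non-empty.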

Release history

- 0.3.0 (this version): Apr 1, 2026
- 0.2.4: Mar 19, 2026
- 0.2.3: Mar 19, 2026
- 0.2.2: Mar 19, 2026
- 0.2.1: Mar 19, 2026
