SlowBurn: Cost-Sustainable Concurrent Execution for Long-Horizon LLM Agents
Authors: Abhishek Divekar
Overview
Long-horizon LLM agents (autonomous coding assistants, deep research pipelines, multi-agent simulations) issue dozens to hundreds of API calls per task. Existing tools either passively monitor spending or hard-terminate the agent when a budget cap is reached, discarding accumulated context.
SlowBurn takes a different approach: when the budget is exhausted, the agent pauses rather than crashes. Budget exhaustion becomes a flow-control signal (backpressure), not a fatal error. The agent sleeps until the rate-limit window refills, then resumes exactly where it left off with no context loss.
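The pause-instead-of-crash idea can be illustrated with a toy dollar-denominated limiter whose acquire call blocks until the window refills. This is a minimal sketch for intuition only; the class name, fields, and logic here are hypothetical, not SlowBurn's actual CostLimit implementation:

```python
import time


class BlockingBudget:
    """Toy dollar budget: acquire() sleeps until the window refills instead of raising."""

    def __init__(self, budget_usd: float, window_s: float):
        self.budget_usd = budget_usd
        self.window_s = window_s
        self.spent = 0.0
        self.window_start = time.monotonic()

    def acquire(self, cost_usd: float) -> None:
        # Reset spending once the current window has fully elapsed
        if time.monotonic() - self.window_start >= self.window_s:
            self.spent = 0.0
            self.window_start = time.monotonic()
        # Backpressure: if the call would overspend, sleep to the next window
        while self.spent + cost_usd > self.budget_usd:
            remaining = self.window_s - (time.monotonic() - self.window_start)
            time.sleep(max(remaining, 0.0))
            self.spent = 0.0
            self.window_start = time.monotonic()
        self.spent += cost_usd


budget = BlockingBudget(budget_usd=0.02, window_s=0.1)
for _ in range(5):
    budget.acquire(0.01)  # every third $0.01 call pauses until the window refills
```

The caller never sees an exception; an over-budget call simply takes longer, which is exactly the flow-control behavior described above.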
What SlowBurn provides:
- CostLimit: a dollar-denominated rate limit that composes with token and request rate limits, and blocks rather than terminates when exhausted
- SlowBurnLLM: an asyncio LLM worker with automatic per-call cost tracking, supporting 100+ models via litellm (text and vision)
- Framework integrations: drop-in hooks for CrewAI, AutoGen (AG2), LangGraph, and LangChain that share a unified budget
- CostReporter: per-call, per-model cost attribution with JSON, Markdown, and LaTeX export
- Global config: all defaults centralized in slowburn_config, overridable at runtime via temp_config()
Quick Start
Create a cost-controlled LLM worker with a daily dollar budget, make calls, and inspect the cost report:
```python
from slowburn import create_llm

# Create a cost-controlled LLM worker: $5 daily budget, asyncio execution
llm = create_llm(model="gpt-4o-mini", budget_usd=5.0, window="daily")

# Make LLM calls (concurrent on the asyncio event loop)
result = llm.call_llm(prompt="Summarize this paper...").result()

# Check costs
reporter = llm.get_reporter().result()
print(f"Cost: ${reporter.total_cost():.4f}")
print(reporter.to_markdown())

llm.stop()
```
Vision-Language Agents
Pass local files, URLs, or data-URLs as images for multimodal (VLM) calls:
```python
from pathlib import Path

result = llm.call_llm(
    prompt="Describe this image in detail.",
    images=[Path("photo.jpg")],  # local files, URLs, or data-URLs
    image_detail="high",
).result()
```
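Local files are typically shipped to vision APIs as base64 data-URLs. A sketch of that standard conversion (this is the common approach, not necessarily SlowBurn's exact internal code; the helper name is hypothetical):

```python
import base64
from pathlib import Path


def to_data_url(path: Path, mime: str = "image/jpeg") -> str:
    """Encode a local image file as a base64 data-URL string."""
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage: to_data_url(Path("photo.jpg")) -> "data:image/jpeg;base64,/9j/..."
```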
Batch calls (concurrent)
Send multiple prompts in one call; they execute concurrently on the asyncio event loop under the same budget:
```python
results = llm.call_llm_batch(
    prompts=["Capital of France?", "Capital of Japan?", "Capital of Brazil?"],
).result()
# All 3 execute concurrently on the event loop
```
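Concurrent batch execution on an event loop amounts to scheduling one coroutine per prompt and awaiting them together. A self-contained sketch of the pattern, with a stubbed sleep standing in for a real LLM request:

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for network latency
    return f"answer to: {prompt}"


async def run_batch(prompts: list[str]) -> list[str]:
    # gather() awaits all coroutines concurrently, preserving input order
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))


results = asyncio.run(run_batch(["Capital of France?", "Capital of Japan?"]))
```

Because the calls overlap, a batch of N prompts takes roughly one call's latency rather than N, while a shared limiter can still meter total spend.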
Structured output with validators
Attach a validator function to parse and type-check the response; ValueError triggers an automatic retry:
```python
import re

def extract_number(text: str) -> int:
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"No number found: {text!r}")  # triggers retry
    return int(match.group())

answer = llm.call_llm(
    prompt="What is 17 * 3? Reply with just the number.",
    validator=extract_number,  # retries automatically on ValueError
).result()
# answer = 51 (int, not str)
```
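The validator-driven retry loop can be pictured as follows. This is a simplified illustration of the pattern, not the library's actual retry code (retry count and any backoff are assumptions here):

```python
import re


def extract_number(text: str) -> int:
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"No number found: {text!r}")
    return int(match.group())


def call_with_validator(call, validator, max_retries: int = 3):
    """Re-invoke `call` until `validator` parses its output without ValueError."""
    last_err = None
    for _ in range(max_retries):
        raw = call()
        try:
            return validator(raw)
        except ValueError as err:
            last_err = err  # invalid response: try the call again
    raise last_err


# Stub "LLM" that answers badly once, then correctly
responses = iter(["let me think about that...", "51"])
result = call_with_validator(lambda: next(responses), extract_number)
# result == 51 after one retry
```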
Global configuration
Override defaults (temperature, budget, timeouts) for a specific run using a context manager that restores on exit:
```python
from slowburn import slowburn_config, temp_config

# Inspect defaults
print(slowburn_config.defaults.temperature)  # 0.7
print(slowburn_config.defaults.budget_usd)   # 5.0

# Override for a specific run (restores on exit)
with temp_config(temperature=0.0, budget_usd=0.10):
    llm = create_llm(model="gpt-4o-mini")
    # temperature=0.0, budget_usd=$0.10
```
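A restore-on-exit override like this is typically built on contextlib. A generic sketch of the mechanism (the class and function names here are illustrative, not SlowBurn's internals):

```python
from contextlib import contextmanager


class Defaults:
    temperature = 0.7
    budget_usd = 5.0


defaults = Defaults()


@contextmanager
def temp_override(obj, **overrides):
    """Temporarily set attributes on `obj`, restoring the originals on exit."""
    saved = {name: getattr(obj, name) for name in overrides}
    for name, value in overrides.items():
        setattr(obj, name, value)
    try:
        yield obj
    finally:  # runs even if the body raises
        for name, value in saved.items():
            setattr(obj, name, value)


with temp_override(defaults, temperature=0.0, budget_usd=0.10):
    inside = defaults.temperature  # 0.0 while the block is active
after = defaults.temperature       # back to 0.7 once the block exits
```

The try/finally guarantees the defaults are restored even when the body raises, which is what makes the pattern safe for scoping a budget to one run.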
Framework Integrations
SlowBurn provides drop-in hooks that add backpressure-based budget enforcement to existing agent frameworks. Each hook intercepts LLM calls at the framework's extension point and routes them through a shared limit set.
AutoGen (AG2)
```python
from slowburn.integrations.autogen import SlowBurnModelClient

assistant.register_model_client(
    model_client_cls=SlowBurnModelClient,
    limit_set=limit_set,
    reporter=reporter,
)
```
CrewAI
```python
from slowburn.integrations.crewai import SlowBurnCrewAI

sb = SlowBurnCrewAI(budget_usd=5.0, max_tokens=1000)
sb.install()
crew.kickoff()
print(sb.reporter.to_markdown())
```
LangGraph
```python
from slowburn.integrations.langgraph import SlowBurnMiddleware

budget = SlowBurnMiddleware(budget_usd=5.0)
agent = create_agent(model="openai:gpt-4o-mini", middleware=[budget])
```
LangChain
```python
from slowburn.integrations.langchain import SlowBurnCallbackHandler

handler = SlowBurnCallbackHandler(budget_usd=5.0)
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])
```
Case Study: Autonomous Code Improvement Agent
We deployed a ReAct agent that reads Python code, searches the web for best practices, writes improved code, and iterates three times; every LLM call was routed through SlowBurn under a $0.02-per-30-second budget window.
| Iteration | Calls | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| 1: Best practices | 9 | 25K | 3K | $0.02 |
| 2: Type hints | 15 | 68K | 9K | $0.04 |
| 3: Edge cases | 15 | 62K | 7K | $0.03 |
| Total | 39 | 155K | 19K | $0.09 |
Between iterations, backpressure paused the agent for ~18 seconds until the budget window refilled. Execution resumed with no loss of context.
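These numbers make the pauses inevitable, not incidental. A back-of-envelope check using the table's totals (an illustration only; it ignores exactly when each window's budget runs out):

```python
import math


def min_windows(total_cost_usd: float, budget_per_window_usd: float) -> int:
    """Lower bound on how many budget windows a run's spending must span."""
    return math.ceil(total_cost_usd / budget_per_window_usd)


# Case-study totals: $0.09 spent under a $0.02-per-30s window
windows = min_windows(0.09, 0.02)
# -> 5: the run cannot fit inside fewer than five 30-second windows,
#    so the agent must pause for refills at some point along the way
```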
Comparison with Alternatives
| Feature | SlowBurn | AgentBudget | LiteLLM | Langfuse | Prompto |
|---|---|---|---|---|---|
| Budget exhaustion | Pauses | Terminates | Terminates | --- | --- |
| Concurrent execution | Asyncio | --- | --- | --- | Async |
| Cost tracking | Per-call | Session | Per-key | Trace | --- |
| Dollar rate limit | Yes | --- | --- | --- | --- |
| Framework hooks | 4 | 2 | Proxy | Many | --- |
| Infrastructure | Zero | Zero | Proxy | Server | Zero |
| Paper-ready export | Markdown + LaTeX | --- | --- | --- | --- |
Project Structure
```
slowburn/
├── src/slowburn/
│   ├── __init__.py                      # create_llm() entry point
│   ├── config.py                        # SlowBurnConfig, temp_config(), _NO_ARG sentinel
│   ├── llm_worker.py                    # SlowBurnLLM asyncio worker (text + vision)
│   ├── cost_accounting.py               # estimate_input_tokens(), cost_controlled_call()
│   ├── limits.py                        # CostLimit (dollar-denominated rate limit)
│   ├── pricing.py                       # PricingCache (litellm + OpenRouter pricing)
│   ├── reporter.py                      # CostReporter (JSON, Markdown, LaTeX export)
│   ├── backpressure.py                  # Backpressure warning logging
│   └── integrations/
│       ├── autogen.py                   # AutoGen (AG2) ModelClient
│       ├── crewai.py                    # CrewAI event bus / hooks middleware
│       ├── langchain.py                 # LangChain callback handler
│       └── langgraph.py                 # LangGraph agent middleware
├── demos/
│   ├── Demo.ipynb                       # Interactive demo notebook
│   ├── demo_native_research_agent.py    # Research agent with web search
│   ├── demo_native_code_agent.py        # Code improvement agent
│   ├── demo_crewai_research_team.py     # CrewAI multi-agent demo
│   ├── demo_autogen_debate.py           # AutoGen debate demo
│   ├── demo_langchain_reflection.py     # LangChain chain demo
│   └── demo_langgraph_plan_execute.py   # LangGraph agent demo
└── README.md
```
Installation
```bash
pip install slowburn
```
With framework integrations:
```bash
pip install "slowburn[crewai]"     # CrewAI
pip install "slowburn[autogen]"    # AutoGen (AG2)
pip install "slowburn[langgraph]"  # LangGraph
pip install "slowburn[langchain]"  # LangChain
```
Everything:
```bash
pip install "slowburn[all]"
```
From source (development)
```bash
git clone https://github.com/adivekar-utexas/slowburn.git
cd slowburn
pip install -e ".[dev]"

# Set API key
cp .env.example .env
# Edit .env with your OPENROUTER_API_KEY or OPENAI_API_KEY
```
Running tests
```bash
# Unit tests (mocked, no API key needed)
pytest tests/ --ignore=tests/test_e2e_real_llm.py --ignore=tests/test_e2e_vision.py -v

# Full suite including real LLM calls (requires API key in .env)
pytest tests/ -v --timeout=120
```
Running demos
```bash
# Interactive notebook
jupyter notebook demos/Demo.ipynb

# Research agent (terminal)
cd demos && python demo_native_research_agent.py

# Code improvement agent (terminal)
cd demos && python demo_native_code_agent.py
```
Citation
If you use SlowBurn in your research, please cite:
```bibtex
@misc{divekar2026slowburn,
  author       = {Divekar, Abhishek},
  title        = {{SlowBurn}: Cost-Sustainable Concurrent Execution for Long-Horizon {LLM} Agents},
  year         = {2026},
  howpublished = {\url{https://github.com/adivekar-utexas/slowburn}},
}
```
License
MIT