# bentocall

Recursive long-context LLM calls, in lunchbox shape.
A bento box partitions a meal into compartments. A bigger box can nest a smaller one inside it, and a smaller compartment can be carved out of a bigger space. bentocall does the same thing to long-context LLM tasks: split a long document into smaller compartments, run a cheap-and-fast model on each, fold the results back together with deterministic Python. When you'd otherwise pay for one big frontier-model call that drifts on long context, you instead pay for many small specialist calls that don't.
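The compartment-and-fold loop can be sketched as follows. This is an illustrative stand-in, not the library's implementation: `call_cheap_model` is a deterministic line counter here so the example runs offline, where the real thing would be an OpenRouter call per compartment.

```python
def call_cheap_model(chunk: str) -> int:
    # Stand-in for an LLM call on one compartment; here it just counts
    # records carrying a target label so the sketch is runnable offline.
    return sum(1 for line in chunk.splitlines() if "[label: abbreviation]" in line)

def solve_recursively(doc: str, chunk_size: int = 4) -> int:
    lines = doc.splitlines()
    # Split the long document into small compartments...
    chunks = ["\n".join(lines[i:i + chunk_size])
              for i in range(0, len(lines), chunk_size)]
    # ...run the cheap model on each compartment independently...
    partials = [call_cheap_model(c) for c in chunks]
    # ...then fold the partial results back together with deterministic Python.
    return sum(partials)
```

The fold step being plain Python (not another LLM call) is what keeps the aggregation exact.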
This is a working implementation of the Lambda-RLM algorithm (arXiv:2603.20105) wrapped around the OpenRouter API, with task adapters for two extract-and-aggregate shapes that come up a lot.
## What it actually saves you
Empirically, on a 20-task batch validation (2026-05-03):
| Path | Capability hold (≥0.95 / exact) | Total cost | Δ vs Sonnet flat |
|---|---|---|---|
| Sonnet 4.6 flat baseline | 17/20 | $0.94 | — |
| bentocall (Haiku λ-RLM + auto-routed flat fallback) | 20/20 | $0.50 | −47% |
Two task shapes shipped:
- `ool_pairs` — pairwise relation extraction over labelled records (e.g. "users who both posted in target categories"). Routes to Haiku-recursion at ≥4K tokens.
- `aggregate_counts` — per-key counts across a long document. Routes to Sonnet-recursion at ≥8K tokens (Haiku has ±1 counting drift on structured records; Sonnet doesn't).
Below the per-task threshold, bentocall transparently calls Sonnet 4.6 flat — same answer, cheaper than recursion overhead.
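The routing rule is simple enough to sketch. The threshold values below mirror the documented defaults; the whitespace token count is a rough approximation for illustration, not the library's actual tokenizer:

```python
# Per-task cost-inversion points (tokens); defaults documented by the project.
ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000}

def route(doc: str, task: str) -> str:
    # Crude whitespace token count, standing in for a real tokenizer.
    approx_tokens = len(doc.split())
    if approx_tokens >= ROUTE_THRESHOLDS[task]:
        return "lambda-rlm"   # recursion pays for itself past the threshold
    return "flat-sonnet"      # below it, one flat call is cheaper
```

The two return strings match the `result["routing"]` values shown in the library example.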
## Install

```shell
pip install bentocall
export OPENROUTER_API_KEY=sk-or-v1-...
```
## Use
CLI:
```shell
echo "User 1: \"What does NASA stand for?\" [label: abbreviation]
User 2: \"What is freedom?\" [label: description and abstract concept]
User 1: \"What is justice?\" [label: description and abstract concept]" | \
  bentocall --task ool_pairs --target "abbreviation,description and abstract concept"
```
Or as a library:
```python
from bentocall import solve

result = solve(open("long_doc.txt").read(), task="ool_pairs")
print(result["answer"])           # → [[1, 2], ...]
print(result["routing"])          # → "lambda-rlm" or "flat-sonnet"
print(result["trace"]["leaves"])  # → number of LLM calls made
```
Self-test the install:
```shell
bentocall --task ool_pairs --self-test         # canned 30-item, expects F1=1.0
bentocall --task aggregate_counts --self-test  # canned 60-item, expects rel_err < 0.05
```
Watch your spend:
```shell
bentocall-usage                    # today
bentocall-usage --week --savings   # 7-day with all-Sonnet counterfactual
```
## ⚠ The thresholds you're inheriting are NOT yours
The `ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000}` defaults were measured on one specific workload distribution at OpenRouter pricing on 2026-05-03. Your task shapes, your input sizes, and current model pricing all shift the cost-inversion point. Re-derive on your data before relying on these in production:
```shell
# Run the autoresearch sweep on your workload (~$3, ~30 min)
python -m bentocall.research.sweep

# Inspect the per-cell winner table
cat runs/frontier.json

# Update bentocall/api.py: ROUTE_THRESHOLDS = {...}
```
If you skip this step and your task mix doesn't look like ours, you may save 0% or even pay 17% more. The 47% number is an upper bound measured on one specific synthetic workload; treat it as "the algorithm works" rather than "this is your savings."
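Re-deriving a threshold amounts to finding the smallest input size at which recursion undercuts the flat call. A minimal sketch, assuming a per-cell record of `(input_size, flat_cost, rlm_cost)` — note this record shape is an assumption for illustration, not the actual `runs/frontier.json` schema:

```python
def derive_threshold(cells):
    """Return the smallest input size at which recursion beats flat."""
    for size, flat_cost, rlm_cost in sorted(cells):
        if rlm_cost < flat_cost:
            return size
    return None  # recursion never wins on this workload

# Hypothetical sweep output for one task; real numbers come from your sweep.
cells = [
    (2000, 0.010, 0.014),   # flat cheaper below the inversion point
    (4000, 0.022, 0.018),   # recursion starts winning here
    (8000, 0.051, 0.027),
]
```

Whatever `derive_threshold` returns for your workload is the value to write into `ROUTE_THRESHOLDS`.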
## Add a new task adapter
Two of the six task shapes from the original paper are implemented. To add one of the remaining four (`search`, `multi_hop`, `summarise`, `s_niah`):
- Read `bentocall/tasks/aggregate_counts.py` end to end — it's the template.
- New file `bentocall/tasks/<your_task>.py` exposing exactly: `generate(...)`, `lambda_rlm(case, model, K)`, `flat_baseline(case, model)`, `score(pred, gold)`.
- Register in `bentocall/api.py`: add to `SUPPORTED_TASKS`, dispatch in `solve()`, add a `ROUTE_THRESHOLDS` entry (set conservatively until you've measured).
- Add a CLI choice in `bentocall/cli.py` and a self-test branch.
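A hypothetical skeleton for the four-function contract above. The signatures follow the list as written; the bodies are placeholders and have not been checked against the actual codebase:

```python
def generate(n: int, seed: int = 0):
    """Produce n synthetic cases with gold answers, for self-tests."""
    # Placeholder cases; a real adapter generates documents for its task shape.
    return [{"doc": f"record {i}", "gold": i % 2} for i in range(n)]

def lambda_rlm(case, model, K):
    """Recursive path: split case['doc'] into K compartments, map, fold."""
    raise NotImplementedError

def flat_baseline(case, model):
    """Single flat call over the whole document, for the routing comparison."""
    raise NotImplementedError

def score(pred, gold):
    """Return a scalar in [0, 1]; 1.0 means exact agreement."""
    return 1.0 if pred == gold else 0.0
```

Keeping `score` deterministic is what lets the sweep compare the two paths apples-to-apples.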
PRs that follow this template will be reviewed quickly. PRs that don't will get a one-line redirect.
## Maintenance posture
This is a reference implementation, maintained by one person on weekends. Issues and PRs get best-effort responses, no SLA. Adapter PRs that follow the template in CONTRIBUTING.md get merged fast; everything else may not.
If you depend on this in production, fork it. That's the right relationship for a reference impl, and you'll thank yourself when an upstream change you didn't want lands at a bad time.
For questions and "how do I…" use Discussions. Issues are reserved for reproducible bugs and security reports.
## License
Apache-2.0. See LICENSE.
## Acknowledgements
The recursion shape comes from the Lambda-RLM paper (Roy et al., 2026, arXiv:2603.20105). bentocall is one cloud-backed implementation of that algorithm, plus task adapters and routing logic that are this project's own work.