
bentocall

Recursive long-context LLM calls, in lunchbox shape.

A bento box partitions a meal into compartments. A bigger box can nest a smaller one inside it, and a smaller compartment can be carved out of a bigger space. bentocall does the same to long-context LLM tasks: split a long document into smaller compartments, run a cheap, fast model on each, and fold the results back together with deterministic Python. Where you'd otherwise pay for one big frontier-model call that drifts on long context, you instead pay for many small specialist calls that don't.
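The compartment metaphor maps onto a very small recursive skeleton. Here is a sketch in plain Python with the model call stubbed out so it runs offline; `max_lines`, the line-based split, and the `+` fold are illustrative choices, not bentocall's actual internals:

```python
def solve_recursive(lines, model_call, max_lines=100):
    """Split a long document into compartments, run `model_call` on each,
    and fold the partial results back together deterministically."""
    if len(lines) <= max_lines:
        return model_call(lines)        # small enough: one cheap call
    mid = len(lines) // 2               # carve the box into two compartments
    left = solve_recursive(lines[:mid], model_call, max_lines)
    right = solve_recursive(lines[mid:], model_call, max_lines)
    return left + right                 # deterministic Python fold

# Stub "model": count records per compartment, so the fold is a sum.
doc = [f"User {i}: ..." for i in range(500)]
total = solve_recursive(doc, len)       # total == 500
```

The point is that only the leaf calls touch a model; everything above them is ordinary code, which is why the per-leaf model can be small.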

This is a working implementation of the Lambda-RLM algorithm (arXiv:2603.20105) wrapped around the OpenRouter API, with task adapters for two extract-and-aggregate shapes that come up often.

What it actually saves you

Empirically, on a 20-task batch validation (2026-05-03):

Path                                                  Capability hold (≥0.95 / exact)   Total cost   Δ vs Sonnet flat
Sonnet 4.6 flat baseline                              17/20                             $0.94        —
bentocall (Haiku λ-RLM + auto-routed flat fallback)   20/20                             $0.50        −47%

Two task shapes shipped:

  • ool_pairs — pairwise relation extraction over labelled records (e.g. "users who both posted in target categories"). Routes to Haiku-recursion at ≥4K tokens.
  • aggregate_counts — per-key counts across a long document. Routes to Sonnet-recursion at ≥8K tokens (Haiku has ±1 counting drift on structured records, Sonnet doesn't).

Below the per-task threshold, bentocall transparently calls Sonnet 4.6 flat — same answer, cheaper than recursion overhead.
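The routing rule above is easy to picture. A minimal sketch, using a crude whitespace word count as a token proxy (assume the real implementation uses a proper tokenizer):

```python
ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000}

def route(doc: str, task: str) -> str:
    """Recursion above the per-task threshold, flat Sonnet below it."""
    approx_tokens = len(doc.split())  # crude proxy for a real token count
    if approx_tokens >= ROUTE_THRESHOLDS[task]:
        return "lambda-rlm"
    return "flat-sonnet"
```

This matches the `result["routing"]` values the library returns, so you can check which path a given input would take before spending anything.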

Install

pip install bentocall
export OPENROUTER_API_KEY=sk-or-v1-...

Use

CLI:

echo "User 1: \"What does NASA stand for?\" [label: abbreviation]
User 2: \"What is freedom?\" [label: description and abstract concept]
User 1: \"What is justice?\" [label: description and abstract concept]" | \
  bentocall --task ool_pairs --target "abbreviation,description and abstract concept"

Or as a library:

from bentocall import solve

with open("long_doc.txt") as f:
    result = solve(f.read(), task="ool_pairs")
print(result["answer"])              # → [[1, 2], ...]
print(result["routing"])             # → "lambda-rlm" or "flat-sonnet"
print(result["trace"]["leaves"])     # → number of LLM calls made

Self-test the install:

bentocall --task ool_pairs --self-test         # canned 30-item, expects F1=1.0
bentocall --task aggregate_counts --self-test  # canned 60-item, expects rel_err < 0.05

Watch your spend:

bentocall-usage              # today
bentocall-usage --week --savings   # 7-day with all-Sonnet counterfactual

⚠ The thresholds you're inheriting are NOT yours

The ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000} defaults were measured on one specific workload distribution at OpenRouter pricing on 2026-05-03. Your task shapes, your input sizes, and current model pricing all shift the cost-inversion point. Re-derive on your data before relying on these in production:

# Run the autoresearch sweep on your workload (~$3, ~30 min)
python -m bentocall.research.sweep
# Inspect the per-cell winner table
cat runs/frontier.json
# Update bentocall/api.py: ROUTE_THRESHOLDS = {...}

If you skip this step and your task mix doesn't look like ours, you may save 0% or even pay 17% more. The 47% figure is an upper bound measured on one specific synthetic benchmark; treat it as "the algorithm works", not "this is your savings."
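As a sanity check on any re-derived threshold, the cost-inversion point is just where the summed cost of many small calls drops below one big call. A toy model of that arithmetic (every price, chunk size, and overhead here is a made-up placeholder, not a current OpenRouter rate):

```python
def flat_cost(tokens, price_per_mtok=3.0):
    """Cost of one big frontier-model call (placeholder price)."""
    return tokens / 1e6 * price_per_mtok

def recursive_cost(tokens, chunk=4000, price_per_mtok=0.8,
                   overhead_tok_per_call=3000):
    """Cost of many small cheap-model calls; each call re-pays a fixed
    prompt/aggregation overhead (all numbers are placeholders)."""
    n_calls = -(-tokens // chunk)  # ceiling division
    return (tokens + n_calls * overhead_tok_per_call) / 1e6 * price_per_mtok

def inversion_point(step=500, limit=100_000):
    """Smallest input size (tokens) at which recursion undercuts flat."""
    for t in range(step, limit, step):
        if recursive_cost(t) < flat_cost(t):
            return t
    return None
```

With these placeholder numbers the inversion lands at 1,500 tokens; plugging in your measured per-call overhead and today's prices shows how far your real threshold can drift from the shipped defaults.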

Add a new task adapter

Two of the six task shapes from the original paper are implemented. To add one of the remaining four (search, multi_hop, summarise, s_niah):

  1. Read bentocall/tasks/aggregate_counts.py end to end — it's the template.
  2. New file bentocall/tasks/<your_task>.py exposing exactly: generate(...), lambda_rlm(case, model, K), flat_baseline(case, model), score(pred, gold).
  3. Register in bentocall/api.py: add to SUPPORTED_TASKS, dispatch in solve(), add a ROUTE_THRESHOLDS entry (set conservatively until you've measured).
  4. Add a CLI choice in bentocall/cli.py and a self-test branch.
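The four required names in step 2 can be stubbed out as a skeleton. Everything below except the four signatures is illustrative (the task name is hypothetical, and the exact-match `score` is just an example metric):

```python
# bentocall/tasks/my_task.py -- hypothetical adapter skeleton. Only the
# four names and their signatures come from the adapter contract; the
# bodies here are stubs.

def generate(n: int, seed: int = 0):
    """Produce n synthetic (document, gold_answer) cases for self-tests."""
    raise NotImplementedError

def lambda_rlm(case, model, K):
    """Recursive path: split `case` into K compartments, call `model`
    on each, fold the partial answers deterministically."""
    raise NotImplementedError

def flat_baseline(case, model):
    """One-shot path: a single long-context call over the whole case."""
    raise NotImplementedError

def score(pred, gold):
    """Example scorer (exact match); real adapters use a task-appropriate
    metric such as F1 or relative error."""
    return 1.0 if pred == gold else 0.0
```

Keeping all four entry points, even for a task where the flat baseline seems pointless, is what lets the sweep in the previous section compare paths and place the routing threshold.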

PRs that follow this template will be reviewed quickly. PRs that don't will get a one-line redirect.

Maintenance posture

This is a reference implementation, maintained by one person on weekends. Issues and PRs get best-effort responses, no SLA. Adapter PRs that follow the template in CONTRIBUTING.md get merged fast; everything else may not.

If you depend on this in production, fork it. That's the right relationship for a reference impl, and you'll thank yourself when an upstream change you didn't want lands at a bad time.

For questions and "how do I…" use Discussions. Issues are reserved for reproducible bugs and security reports.

License

Apache-2.0. See LICENSE.

Acknowledgements

The recursion shape comes from the Lambda-RLM paper (Roy et al., 2026, arXiv:2603.20105). bentocall is one cloud-backed implementation of that algorithm, plus task adapters and routing logic that are this project's own work.
