# bentocall

Recursive long-context LLM calls, in lunchbox shape.
A bento box partitions a meal into compartments. A bigger box can nest a smaller one inside it, and a smaller compartment can be carved out of a bigger space. bentocall does the same thing to long-context LLM tasks: split a long document into smaller compartments, run a cheap-and-fast model on each, fold the results back together with deterministic Python. When you'd otherwise pay for one big frontier-model call that drifts on long context, you instead pay for many small specialist calls that don't.
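The compartment-and-fold loop can be sketched as follows. This is an illustrative stand-in, not the library's implementation: `call_cheap_model` is a deterministic line counter here so the example runs offline, where the real thing would be an OpenRouter call per compartment.

```python
def call_cheap_model(chunk: str) -> int:
    # Stand-in for an LLM call on one compartment; here it just counts
    # records carrying a target label so the sketch is runnable offline.
    return sum(1 for line in chunk.splitlines() if "[label: abbreviation]" in line)

def solve_recursively(doc: str, chunk_size: int = 4) -> int:
    lines = doc.splitlines()
    # Split the long document into small compartments...
    chunks = ["\n".join(lines[i:i + chunk_size])
              for i in range(0, len(lines), chunk_size)]
    # ...run the cheap model on each compartment independently...
    partials = [call_cheap_model(c) for c in chunks]
    # ...then fold the partial results back together with deterministic Python.
    return sum(partials)
```

The fold step being plain Python (not another LLM call) is what keeps the aggregation exact.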
This is a working implementation of the Lambda-RLM algorithm (arXiv:2603.20105) wrapped around the OpenRouter API, with task adapters for two extract-and-aggregate shapes that come up a lot.
## What it actually saves you
Empirically, on a 20-task batch validation (2026-05-03):
| Path | Capability hold (≥0.95 / exact) | Total cost | Δ vs Sonnet flat |
|---|---|---|---|
| Sonnet 4.6 flat baseline | 17/20 | $0.94 | — |
| bentocall (Haiku λ-RLM + auto-routed flat fallback) | 20/20 | $0.50 | −47% |
Two task shapes shipped:
- `ool_pairs` — pairwise relation extraction over labelled records (e.g. "users who both posted in target categories"). Routes to Haiku-recursion at ≥4K tokens.
- `aggregate_counts` — per-key counts across a long document. Routes to Sonnet-recursion at ≥8K tokens (Haiku has ±1 counting drift on structured records; Sonnet doesn't).
Below the per-task threshold, bentocall transparently calls Sonnet 4.6 flat — same answer, cheaper than recursion overhead.
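The routing rule is simple enough to sketch. The threshold values below mirror the documented defaults; the whitespace token count is a rough approximation for illustration, not the library's actual tokenizer:

```python
# Per-task cost-inversion points (tokens); defaults documented by the project.
ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000}

def route(doc: str, task: str) -> str:
    # Crude whitespace token count, standing in for a real tokenizer.
    approx_tokens = len(doc.split())
    if approx_tokens >= ROUTE_THRESHOLDS[task]:
        return "lambda-rlm"   # recursion pays for itself past the threshold
    return "flat-sonnet"      # below it, one flat call is cheaper
```

The two return strings match the `result["routing"]` values shown in the library example.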
## Install

```shell
pip install bentocall
export OPENROUTER_API_KEY=sk-or-v1-...
```
## Use
CLI:
```shell
echo "User 1: \"What does NASA stand for?\" [label: abbreviation]
User 2: \"What is freedom?\" [label: description and abstract concept]
User 1: \"What is justice?\" [label: description and abstract concept]" | \
  bentocall --task ool_pairs --target "abbreviation,description and abstract concept"
```
Or as a library:
```python
from bentocall import solve

result = solve(open("long_doc.txt").read(), task="ool_pairs")
print(result["answer"])           # → [[1, 2], ...]
print(result["routing"])          # → "lambda-rlm" or "flat-sonnet"
print(result["trace"]["leaves"])  # → number of LLM calls made
```
Self-test the install:
```shell
bentocall --task ool_pairs --self-test         # canned 30-item, expects F1=1.0
bentocall --task aggregate_counts --self-test  # canned 60-item, expects rel_err < 0.05
```
Watch your spend:
```shell
bentocall-usage                    # today
bentocall-usage --week --savings   # 7-day with all-Sonnet counterfactual
```
## ⚠ The thresholds you're inheriting are NOT yours
The `ROUTE_THRESHOLDS = {"ool_pairs": 4000, "aggregate_counts": 8000}` defaults were measured on one specific workload distribution at OpenRouter pricing on 2026-05-03. Your task shapes, your input sizes, and current model pricing all shift the cost-inversion point. Re-derive on your data before relying on these in production:
```shell
# Run the autoresearch sweep on your workload (~$3, ~30 min)
python -m bentocall.research.sweep

# Inspect the per-cell winner table
cat runs/frontier.json

# Update bentocall/api.py: ROUTE_THRESHOLDS = {...}
```
If you skip this step and your task mix doesn't look like ours, you may save 0% or even pay 17% more. The 47% number is an upper bound measured on one specific synthetic workload; treat it as "the algorithm works" rather than "this is your savings."
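Re-deriving a threshold amounts to finding the smallest input size at which recursion undercuts the flat call. A minimal sketch, assuming a per-cell record of `(input_size, flat_cost, rlm_cost)` — note this record shape is an assumption for illustration, not the actual `runs/frontier.json` schema:

```python
def derive_threshold(cells):
    """Return the smallest input size at which recursion beats flat."""
    for size, flat_cost, rlm_cost in sorted(cells):
        if rlm_cost < flat_cost:
            return size
    return None  # recursion never wins on this workload

# Hypothetical sweep output for one task; real numbers come from your sweep.
cells = [
    (2000, 0.010, 0.014),   # flat cheaper below the inversion point
    (4000, 0.022, 0.018),   # recursion starts winning here
    (8000, 0.051, 0.027),
]
```

Whatever `derive_threshold` returns for your workload is the value to write into `ROUTE_THRESHOLDS`.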
## Add a new task adapter
Two of the six task shapes from the original paper are implemented. To add one of the remaining four (`search`, `multi_hop`, `summarise`, `s_niah`):
- Read `bentocall/tasks/aggregate_counts.py` end to end — it's the template.
- New file `bentocall/tasks/<your_task>.py` exposing exactly: `generate(...)`, `lambda_rlm(case, model, K)`, `flat_baseline(case, model)`, `score(pred, gold)`.
- Register in `bentocall/api.py`: add to `SUPPORTED_TASKS`, dispatch in `solve()`, add a `ROUTE_THRESHOLDS` entry (set conservatively until you've measured).
- Add a CLI choice in `bentocall/cli.py` and a self-test branch.
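A hypothetical skeleton for the four-function contract above. The signatures follow the list as written; the bodies are placeholders and have not been checked against the actual codebase:

```python
def generate(n: int, seed: int = 0):
    """Produce n synthetic cases with gold answers, for self-tests."""
    # Placeholder cases; a real adapter generates documents for its task shape.
    return [{"doc": f"record {i}", "gold": i % 2} for i in range(n)]

def lambda_rlm(case, model, K):
    """Recursive path: split case['doc'] into K compartments, map, fold."""
    raise NotImplementedError

def flat_baseline(case, model):
    """Single flat call over the whole document, for the routing comparison."""
    raise NotImplementedError

def score(pred, gold):
    """Return a scalar in [0, 1]; 1.0 means exact agreement."""
    return 1.0 if pred == gold else 0.0
```

Keeping `score` deterministic is what lets the sweep compare the two paths apples-to-apples.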
PRs that follow this template will be reviewed quickly. PRs that don't will get a one-line redirect.
## Maintenance posture
This is a reference implementation, maintained by one person on weekends. Issues and PRs get best-effort responses, no SLA. Adapter PRs that follow the template in CONTRIBUTING.md get merged fast; everything else may not.
If you depend on this in production, fork it. That's the right relationship for a reference impl, and you'll thank yourself when an upstream change you didn't want lands at a bad time.
For questions and "how do I…" use Discussions. Issues are reserved for reproducible bugs and security reports.
## License
Apache-2.0. See LICENSE.
## Acknowledgements
The recursion shape comes from the Lambda-RLM paper (Roy et al., 2026, arXiv:2603.20105). bentocall is one cloud-backed implementation of that algorithm, plus task adapters and routing logic that are this project's own work.