4-dimensional LLM inference benchmark — multi-turn, multi-agent, parallel dispatch with tool calling

Maintainers

zenprocess

Project description

PawBench 🐾

    / \__
   (    @\___    PawBench
  /         O   4-dimensional LLM inference benchmark
 /   (_____/    "More bark than bite"
/_____/   U

Because your model deserves a benchmark with more bark than bite.

4-dimensional LLM inference benchmark for OpenAI-compatible endpoints. Multi-turn, multi-agent, parallel dispatch with tool calling.

Tests your model with realistic coding-agent workloads, not synthetic single-turn completions.

Meet Lola

PawBench is inspired by Lola (@_justlolathings) — the most fashionable pup on Instagram. The built-in scenarios revolve around building her boutique dog apparel store, PawStyle by Lola. Every product, every size guide, every "Lola's Pick" badge traces back to this style icon on four legs.

Follow Lola: https://www.instagram.com/_justlolathings/

Install

pip install pawbench
# or
uv pip install pawbench

Quick Start

# Benchmark your local vLLM
pawbench --endpoint http://localhost:8000

# Against any OpenAI-compatible endpoint
pawbench --endpoint https://api.openai.com/v1 --tag gpt4o

# Just throughput saturation (no scenarios)
pawbench --saturation-only --concurrency 1,2,4,8,16

# JSON output for CI/autoresearch
pawbench --json --output results/

# Custom scenario
pawbench --scenario my_scenario.json
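To drive the same commands from a CI job, a small Python helper can assemble the argument list. This is a hypothetical sketch: `build_command` is not part of PawBench, it uses only the flags shown above, and it assumes those flags can be combined in one invocation. It prints the command instead of running it.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    endpoint: Optional[str] = None,
    tag: Optional[str] = None,
    output_dir: Optional[str] = None,
    json_output: bool = False,
    saturation_only: bool = False,
    concurrency: Optional[Sequence[int]] = None,
) -> list:
    """Assemble a pawbench argv from the flags documented in Quick Start (hypothetical helper)."""
    cmd = ["pawbench"]
    if endpoint:
        cmd += ["--endpoint", endpoint]
    if tag:
        cmd += ["--tag", tag]
    if saturation_only:
        cmd.append("--saturation-only")
    if concurrency:
        # The CLI takes a comma-separated list, e.g. "1,2,4,8,16".
        cmd += ["--concurrency", ",".join(str(c) for c in concurrency)]
    if json_output:
        cmd.append("--json")
    if output_dir:
        cmd += ["--output", output_dir]
    return cmd


if __name__ == "__main__":
    # Print rather than execute; pass the list to subprocess.run(..., check=True) to run it.
    print(shlex.join(build_command("http://localhost:8000", tag="baseline",
                                   json_output=True, output_dir="results/")))
```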

What It Measures

4 Dimensions

Dimension	Metrics
Throughput	Single-agent tok/s, parallel saturation curve (concurrency 1 to N), time to first token (TTFT), peak concurrency
Quality	Tool call accuracy, instruction following, format compliance, keyword matching
Efficiency	Useful token ratio (code in tool args vs filler preamble), tokens per turn
Adaptability	Steering event response, mid-conversation context injection, nudge quality delta
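Several of these metrics reduce to simple arithmetic over a streamed response. The sketch below shows one common way to compute TTFT, end-to-end tok/s, and a useful-token ratio. It is not PawBench's code: the `StreamTiming` type and all function names are hypothetical, the tok/s definition (tokens over total request time) is one convention among several, and whitespace-separated words stand in for real tokenizer tokens.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    start: float              # when the request was sent (seconds)
    token_times: list         # arrival time of each streamed output token (seconds)


def time_to_first_token(t: StreamTiming) -> float:
    """TTFT: first token arrival minus request start."""
    return t.token_times[0] - t.start


def tokens_per_second(t: StreamTiming) -> float:
    """End-to-end output rate: token count divided by total request time."""
    return len(t.token_times) / (t.token_times[-1] - t.start)


def useful_token_ratio(tool_args: str, full_output: str) -> float:
    """Fraction of output words that sit inside tool-call arguments.

    Whitespace-separated words are a stand-in for tokenizer tokens (assumption).
    """
    total = len(full_output.split())
    return len(tool_args.split()) / total if total else 0.0


timing = StreamTiming(start=0.0, token_times=[0.25, 0.5, 0.75, 1.0])
print(time_to_first_token(timing), tokens_per_second(timing))  # 0.25 4.0
```

A response that wraps its code in a long explanatory preamble scores a lower useful-token ratio than one that goes straight to the tool call.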

Built-in Scenarios: PawStyle by Lola

Two parallel agents build Lola's boutique dog apparel e-commerce store — "Where every pup is a fashionista":

pawstyle-independent — Frontend and backend work independently on Lola's shop. Pure parallel throughput + quality baseline.
pawstyle — Backend gets a steering event mid-task ("frontend added a Size Guide button — implement Lola's breed-specific sizing endpoint").
pawstyle-nudge — Frontend adds Lola's Favorites (wishlist) and Compare features that require backend changes. Backend receives nudges and adapts.

Each scenario is 3 turns x 2 agents, with tool calls (write_file, read_file, run_command) and injected tool results. Products include Lola's Signature Bandana, Cozy Knit Sweater, Rainy Day Raincoat, Adventure Booties, Dapper Bow Tie, and Walk-in-Style Harness — with "Lola's Pick" badges on her personal favorites.
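The conversation shape behind a scenario turn can be sketched in the standard OpenAI Chat Completions message format: the model replies with a tool call, the harness injects a tool result, and a steering message can arrive as a new user turn. The prompts, the `call_1` id, and the `path`/`content` argument names for `write_file` are hypothetical, since the README does not show PawBench's actual prompts or tool schema.

```python
import json


def build_conversation() -> list:
    """One backend-agent turn with a tool call, an injected result, and a steering message."""
    call_id = "call_1"  # hypothetical; real servers generate their own ids
    return [
        {"role": "system", "content": "You are the backend agent for PawStyle by Lola."},
        {"role": "user", "content": "Create the product catalog API."},
        {   # The model's reply: a tool call instead of prose.
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                "function": {
                    "name": "write_file",
                    # Arguments travel as a JSON-encoded string, not a nested object.
                    "arguments": json.dumps({"path": "app.py", "content": "# catalog API\n"}),
                },
            }],
        },
        # Injected tool result, supplied by the harness and matched by tool_call_id.
        {"role": "tool", "tool_call_id": call_id, "content": "Wrote app.py"},
        # Steering event arriving mid-task (paraphrasing the pawstyle scenario).
        {"role": "user", "content": "The frontend added a Size Guide button. "
                                    "Implement Lola's breed-specific sizing endpoint."},
    ]


messages = build_conversation()
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'tool', 'user']
```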

Server Metrics (optional)

If the server exposes a Prometheus /metrics endpoint (as vLLM and TGI do), PawBench scrapes:

KV cache usage and prefix cache hit rate
Speculative decoding acceptance rate
GPU cache pressure
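Rates such as prefix cache hit rate or speculative acceptance rate can be derived from two scrapes of Prometheus counters, taken before and after a run, so that traffic from earlier runs does not skew the result. This is a minimal sketch, not PawBench's scraper: the parser handles only the simple text exposition lines shown, ignores labels, and the metric names (`vllm:prefix_cache_hits_total`, `vllm:prefix_cache_queries_total`) are assumptions that vary across vLLM versions, so check your server's /metrics output. The sample counter values are made up.

```python
def parse_prometheus(text: str) -> dict:
    """Sum all samples per metric name from Prometheus text exposition format.

    Labels are dropped; label values containing '}' are not handled.
    """
    totals = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or HELP/TYPE comment
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, rest = line.split(None, 1)
        value = float(rest.split()[0])  # skip the optional trailing timestamp
        totals[name] = totals.get(name, 0.0) + value
    return totals


def delta_ratio(before: dict, after: dict, numerator: str, denominator: str) -> float:
    """Ratio of counter increases between two scrapes (nan if the denominator did not move)."""
    num = after.get(numerator, 0.0) - before.get(numerator, 0.0)
    den = after.get(denominator, 0.0) - before.get(denominator, 0.0)
    return num / den if den else float("nan")


BEFORE = """\
# TYPE vllm:prefix_cache_hits_total counter
vllm:prefix_cache_hits_total{model_name="qwen3-coder"} 100.0
vllm:prefix_cache_queries_total{model_name="qwen3-coder"} 200.0
"""
AFTER = """\
vllm:prefix_cache_hits_total{model_name="qwen3-coder"} 1020.0
vllm:prefix_cache_queries_total{model_name="qwen3-coder"} 1200.0
"""
hit_rate = delta_ratio(parse_prometheus(BEFORE), parse_prometheus(AFTER),
                       "vllm:prefix_cache_hits_total", "vllm:prefix_cache_queries_total")
print(round(hit_rate, 2))  # 0.92
```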

Custom Scenarios

Scenarios are JSON files:

{
  "id": "my-scenario",
  "name": "My Custom Scenario",
  "agents": [
    {
      "id": "agent-1",
      "name": "My Agent",
      "turns": [
        {
          "turn": 1,
          "role": "user",
          "content": "Build a REST API with Flask...",
          "tools": ["write_file"],
          "expect": {
            "tool_calls_min": 1,
            "tool_name_any": ["write_file"],
            "output_mentions": ["flask", "api"]
          }
        }
      ]
    }
  ],
  "tools_schema": [...]
}
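The `expect` block can be read as a set of pass/fail checks on a turn's output. The sketch below assumes the obvious semantics (at least `tool_calls_min` tool calls, at least one call named in `tool_name_any`, every `output_mentions` keyword present case-insensitively) and scores a turn as the fraction of checks passed; PawBench's real scoring may weight these differently. The scenario is trimmed because the `[...]` placeholder above is not valid JSON, and `check_expect` and `turn_score` are hypothetical names.

```python
import json

SCENARIO = json.loads("""
{
  "id": "my-scenario",
  "agents": [
    {"id": "agent-1", "turns": [
      {"turn": 1, "role": "user", "content": "Build a REST API with Flask...",
       "tools": ["write_file"],
       "expect": {"tool_calls_min": 1, "tool_name_any": ["write_file"],
                  "output_mentions": ["flask", "api"]}}
    ]}
  ]
}
""")


def check_expect(expect: dict, tool_names: list, output_text: str) -> dict:
    """Evaluate each expectation key that is present (assumed semantics)."""
    results = {}
    if "tool_calls_min" in expect:
        results["tool_calls_min"] = len(tool_names) >= expect["tool_calls_min"]
    if "tool_name_any" in expect:
        wanted = set(expect["tool_name_any"])
        results["tool_name_any"] = any(name in wanted for name in tool_names)
    if "output_mentions" in expect:
        text = output_text.lower()
        results["output_mentions"] = all(k.lower() in text for k in expect["output_mentions"])
    return results


def turn_score(results: dict) -> float:
    """Fraction of checks passed (1.0 when there is nothing to check)."""
    return sum(results.values()) / len(results) if results else 1.0


expect = SCENARIO["agents"][0]["turns"][0]["expect"]
passed = check_expect(expect, ["write_file"], "Here is a minimal Flask API in app.py.")
print(turn_score(passed))  # 1.0
```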

Comparing Configs

pawbench --tag baseline --output results/
# ... change model config ...
pawbench --tag eagle3 --output results/

python -m pawbench.compare results/pawbench_baseline_*.json results/pawbench_eagle3_*.json
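For a quick ad-hoc diff outside `pawbench.compare`, the result files can also be compared directly. This hand-rolled sketch is not that module: it flattens nested dicts into dotted keys, keeps only numeric leaves (so `tag`, placeholder strings, and list fields such as `saturation_curve` are skipped), and reports the percent change for every metric present in both runs. `flatten_numeric`, `compare`, and `compare_files` are hypothetical names.

```python
import json
from pathlib import Path


def flatten_numeric(d: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, keeping only numeric leaves."""
    out = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten_numeric(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            out[path] = float(value)
    return out


def compare(baseline: dict, candidate: dict) -> list:
    """Rows of (metric, baseline, candidate, % change) for metrics present in both runs."""
    base, cand = flatten_numeric(baseline), flatten_numeric(candidate)
    rows = []
    for key in sorted(base.keys() & cand.keys()):
        b, c = base[key], cand[key]
        pct = (c - b) / b * 100 if b else float("nan")
        rows.append((key, b, c, pct))
    return rows


def compare_files(baseline_path: str, candidate_path: str) -> None:
    """Print a side-by-side table for two result JSON files."""
    baseline = json.loads(Path(baseline_path).read_text())
    candidate = json.loads(Path(candidate_path).read_text())
    for key, b, c, pct in compare(baseline, candidate):
        print(f"{key:40s} {b:10.3f} {c:10.3f} {pct:+7.1f}%")
```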

Output Format

JSON results include a full model card (architecture, quantization, GPU, serving parameters) for reproducibility:

{
  "tag": "fp8-eagle3-spec3",
  "model_card": {
    "model_name": "qwen3-coder",
    "model_config": {"architectures": ["Qwen3NextForCausalLM"], "num_experts": 512, "...": "..."},
    "tuning": {"kv_cache_dtype": "fp8_e4m3", "speculative_config": "eagle3", "...": "..."},
    "gpu": {"name": "NVIDIA GB10", "...": "..."}
  },
  "dim1_throughput": {"avg_single_tok_s": 69.0, "raw_peak_tok_s": 469.3, "...": "..."},
  "dim2_quality": {"avg_quality": 0.81, "tool_accuracy": 0.96, "...": "..."},
  "saturation_curve": [{"concurrency": 1, "tok_s": 69.3}, {"concurrency": 8, "tok_s": 469.3}],
  "server_metrics": {"spec_acceptance_rate": 0.72, "gpu_prefix_cache_hit_rate": 0.92}
}
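The `saturation_curve` field lends itself to a derived scaling-efficiency figure: measured tok/s at concurrency N divided by N times the single-request rate. This is not a field PawBench reports; the sketch assumes `tok_s` at each level is the aggregate across all concurrent requests, and uses the two points from the example output.

```python
def scaling_efficiency(curve: list) -> dict:
    """Measured tok/s at each concurrency divided by ideal linear scaling of the concurrency-1 rate."""
    single = next(p["tok_s"] for p in curve if p["concurrency"] == 1)
    return {p["concurrency"]: p["tok_s"] / (p["concurrency"] * single) for p in curve}


curve = [{"concurrency": 1, "tok_s": 69.3}, {"concurrency": 8, "tok_s": 469.3}]  # from the example above
peak = max(curve, key=lambda p: p["tok_s"])
eff = scaling_efficiency(curve)
print(peak["concurrency"], round(eff[8], 2))  # 8 0.85
```

An efficiency well below 1.0 at higher concurrency indicates the server is approaching saturation, which is what the curve is meant to expose.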

License

MIT

Release history

This version

1.1.4

Mar 27, 2026

Download files

Source Distribution

pawbench-1.1.4.tar.gz (67.9 kB)

Uploaded Mar 27, 2026 Source

Built Distribution

pawbench-1.1.4-py3-none-any.whl (57.8 kB)

Uploaded Mar 27, 2026 Python 3

File details

Details for the file pawbench-1.1.4.tar.gz.

File metadata

Download URL: pawbench-1.1.4.tar.gz
Upload date: Mar 27, 2026
Size: 67.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pawbench-1.1.4.tar.gz
Algorithm	Hash digest
SHA256	`88c063b3bd8a7047edf42b06d9dc7d49aa34606af07a0f175c3b5991d01f154d`
MD5	`cbf8d690c253270e02140057cc66c1b4`
BLAKE2b-256	`c676c1505c8003f38db2075db14dd85d90b20c897b249386fa793d81750206a3`

Provenance

The following attestation bundles were made for pawbench-1.1.4.tar.gz:

Publisher: release.yml on zenprocess/pawbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pawbench-1.1.4.tar.gz
- Subject digest: 88c063b3bd8a7047edf42b06d9dc7d49aa34606af07a0f175c3b5991d01f154d
- Sigstore transparency entry: 1187031376
- Sigstore integration time: Mar 27, 2026
Source repository:
- Permalink: zenprocess/pawbench@da48ab250416ef596b59409aa43c6c5266b9e1de
- Branch / Tag: refs/tags/v1.1.4
- Owner: https://github.com/zenprocess
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@da48ab250416ef596b59409aa43c6c5266b9e1de
- Trigger Event: push

File details

Details for the file pawbench-1.1.4-py3-none-any.whl.

File metadata

Download URL: pawbench-1.1.4-py3-none-any.whl
Upload date: Mar 27, 2026
Size: 57.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pawbench-1.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c67c930c77ae796bf6481d1007e6ad73d85f6ac54098bae1508ef463cd5e36cb`
MD5	`3274612c3423eee89a1f34f71d07f1a1`
BLAKE2b-256	`e6ff5745a1e31293325d89169c93e99116bfc4e401996cee436d11af92a31c38`

Provenance

The following attestation bundles were made for pawbench-1.1.4-py3-none-any.whl:

Publisher: release.yml on zenprocess/pawbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pawbench-1.1.4-py3-none-any.whl
- Subject digest: c67c930c77ae796bf6481d1007e6ad73d85f6ac54098bae1508ef463cd5e36cb
- Sigstore transparency entry: 1187031395
- Sigstore integration time: Mar 27, 2026
Source repository:
- Permalink: zenprocess/pawbench@da48ab250416ef596b59409aa43c6c5266b9e1de
- Branch / Tag: refs/tags/v1.1.4
- Owner: https://github.com/zenprocess
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@da48ab250416ef596b59409aa43c6c5266b9e1de
- Trigger Event: push
