Beskar
Claude-native token optimization for agentic pipelines.
Beskar wraps the Anthropic SDK to automatically cut token costs in production agentic loops — through intelligent prompt caching, context pruning, tool result compression, and a metrics layer that proves what you're saving.
The Problem
Agentic pipelines burn tokens in predictable, fixable ways:
- The same system prompt and tool definitions are re-tokenized on every turn
- Long conversation histories accumulate with no pruning strategy
- Tool results carry full verbose payloads even after they're no longer useful
- Extended thinking mode can spike costs unpredictably
- Sonnet runs tasks that Haiku could handle at 1/20th the price
In a high-volume pipeline — bug bounty tooling, research agents, code generation loops — this waste compounds fast. Beskar fixes it at the SDK layer, before your application code sees it.
V1 Features
Prompt Caching Auto-Structurer
Automatically places cache_control breakpoints at the optimal positions in your messages array. Respects Claude's minimum token thresholds (1024 tokens for Sonnet/Opus, 2048 for Haiku), honors the 4-breakpoint limit, and prioritizes stable content (system prompt, tool definitions, long context documents) over dynamic content. Cache hits cut input costs by ~90% on the cached portion.
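The placement logic can be sketched in a few lines. This is an illustrative sketch, not Beskar's actual internals: it selects up to four breakpoints from blocks that meet the per-model minimum, preferring stable content over dynamic turns. The block shape (`tokens`, `stable`) is invented for the example.

```python
# Sketch: choose up to four cache_control breakpoints, preferring
# stable blocks that meet the model's minimum cacheable size.
MIN_CACHEABLE = {"sonnet": 1024, "opus": 1024, "haiku": 2048}
MAX_BREAKPOINTS = 4

def place_breakpoints(blocks, model_family="sonnet"):
    """blocks: list of dicts with 'tokens' (int) and 'stable' (bool),
    in request order. Returns indices to mark with cache_control."""
    threshold = MIN_CACHEABLE[model_family]
    # Only blocks at or above the minimum token threshold are cacheable.
    candidates = [i for i, b in enumerate(blocks) if b["tokens"] >= threshold]
    # Stable content (system prompt, tool definitions) wins over dynamic turns.
    candidates.sort(key=lambda i: (not blocks[i]["stable"], i))
    return sorted(candidates[:MAX_BREAKPOINTS])

blocks = [
    {"tokens": 1500, "stable": True},   # system prompt
    {"tokens": 2000, "stable": True},   # tool definitions
    {"tokens": 800,  "stable": False},  # too small to cache
    {"tokens": 1200, "stable": False},  # long dynamic turn
]
print(place_breakpoints(blocks))  # → [0, 1, 3]
```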
Context Window Pruner
Manages rolling context for long agentic loops. Configurable strategies: sliding window (drop oldest turns), summarization (collapse old turns into a summary message), or importance scoring. Preserves tool call integrity — never drops a tool_use turn without its corresponding tool_result.
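A minimal sliding-window sketch shows the tool-pair invariant (simplified, flat message shape assumed for illustration — real Anthropic messages nest tool_result blocks inside user turns): the window start walks forward past any tool_result whose matching tool_use would have been dropped.

```python
def prune_sliding_window(messages, max_turns):
    """Keep at most the last max_turns messages, never starting the
    window on a tool_result whose matching tool_use was dropped."""
    if len(messages) <= max_turns:
        return messages
    start = len(messages) - max_turns
    # Walk forward past any orphaned tool_result at the window edge.
    while start < len(messages) and messages[start].get("type") == "tool_result":
        start += 1
    return messages[start:]

history = [
    {"role": "user", "type": "text"},
    {"role": "assistant", "type": "tool_use"},
    {"role": "user", "type": "tool_result"},
    {"role": "assistant", "type": "text"},
]
# A window of 2 would start on the tool_result, so it is skipped too:
print(prune_sliding_window(history, 2))  # → [{'role': 'assistant', 'type': 'text'}]
```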
Tool Result Compressor
Intercepts tool results before they're appended to context. Strips non-essential fields, truncates oversized payloads, and collapses completed tool call chains into summarized form once they're no longer needed for active reasoning. Preserves tool_use_id linkage throughout.
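The truncation step can be sketched as follows (a simplified illustration, not Beskar's internals; field names follow the Anthropic tool_result block shape):

```python
def compress_tool_result(block, max_chars=200):
    """Truncate an oversized tool_result payload; the tool_use_id must
    survive so Claude can still match the result to its call."""
    content = block["content"]
    if len(content) <= max_chars:
        return block
    trimmed = len(content) - max_chars
    return {
        "type": "tool_result",
        "tool_use_id": block["tool_use_id"],  # linkage preserved
        "content": content[:max_chars] + f"... [truncated {trimmed} chars]",
    }

big = {"type": "tool_result", "tool_use_id": "toolu_01", "content": "x" * 500}
small = compress_tool_result(big, max_chars=100)
print(small["tool_use_id"], len(small["content"]))
```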
Token Metrics Layer
Wraps every API call to capture the usage object from Claude's response. Tracks input tokens, output tokens, cache creation tokens, and cache read tokens. Derives cache hit rate, estimated cost (by model), and tokens saved vs. an uncached baseline. Makes optimization measurable.
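The derived numbers fall out of simple arithmetic over the usage fields. A hedged sketch with illustrative per-million-token prices (not quoted from Anthropic's price list) that ignores the cache-write surcharge for brevity:

```python
def derive_metrics(usage, price_in=3.00, price_cached=0.30, price_out=15.00):
    """usage mirrors the API response's usage object. Prices are
    illustrative USD per million tokens; cache-write surcharge ignored."""
    fresh = usage.get("input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    out = usage.get("output_tokens", 0)
    total_in = fresh + created + read
    hit_rate = read / total_in if total_in else 0.0
    cost = (fresh + created) * price_in / 1e6 + read * price_cached / 1e6 + out * price_out / 1e6
    # Baseline: what the same call would cost with no caching at all.
    baseline = total_in * price_in / 1e6 + out * price_out / 1e6
    return {"cache_hit_rate": round(hit_rate, 2),
            "cost_usd": round(cost, 6),
            "saved_usd": round(baseline - cost, 6)}

usage = {"input_tokens": 1000, "cache_creation_input_tokens": 0,
         "cache_read_input_tokens": 9000, "output_tokens": 0}
print(derive_metrics(usage))  # → {'cache_hit_rate': 0.9, 'cost_usd': 0.0057, 'saved_usd': 0.0243}
```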
Installation
TypeScript / Node.js
```shell
npm install beskar
```
Requires Node 18+. The `@anthropic-ai/sdk` package is a peer dependency — install it alongside:

```shell
npm install beskar @anthropic-ai/sdk
```
Python
Note: The Python package is not yet published to PyPI. Install locally from source:
```shell
git clone https://github.com/Vligai/beskar.git
cd beskar/python
pip install -e .
```
Requires Python 3.9+. The anthropic SDK is pulled in automatically as a dependency.
For development (tests + type checking):
```shell
pip install -e ".[dev]"
```
Usage
Beskar is a drop-in replacement for the Anthropic SDK's `messages.create()`.
TypeScript
```typescript
import { BeskarClient } from 'beskar';

const client = new BeskarClient({
  apiKey: process.env.ANTHROPIC_API_KEY,
  cache: { enabled: true },
  pruner: { strategy: 'sliding-window', maxTurns: 20 },
  compressor: { enabled: true, maxToolResultTokens: 500 },
  metrics: { enabled: true },
});

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a security researcher...',
  messages: conversationHistory,
});

console.log(client.metrics.summary());
// { cacheHitRate: 0.87, estimatedCost: '$0.0023', tokensSaved: 14200 }
```
Python
```python
import os

from beskar import BeskarClient
from beskar.types import BeskarConfig, CacheConfig, PrunerConfig, CompressorConfig, MetricsConfig

client = BeskarClient(BeskarConfig(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    cache=CacheConfig(),
    pruner=PrunerConfig(strategy="sliding-window", max_turns=20),
    compressor=CompressorConfig(max_tool_result_tokens=500),
    metrics=MetricsConfig(),
))

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a security researcher...",
    messages=conversation_history,
)

print(client.metrics.summary())
# MetricsSummary(cache_hit_rate=0.87, estimated_cost_usd=0.0023, estimated_savings_usd=0.012, ...)
```
Roadmap
V2 — After V1 ships with real usage data:
- Extended thinking budget control — Per-call thinking budgets with task-complexity detection to toggle thinking contextually rather than globally
- Model routing — Automatically route subtasks to Haiku vs. Sonnet based on complexity signals (extraction vs. reasoning, output length, tool depth)
- Output filler cleanup — Post-processor trained on Claude's characteristic filler patterns ("Certainly!", disclaimer stacking) for reliable cleanup
- System prompt auditor — Scores and rewrites system prompts for token efficiency without degrading behavior
- Model-agnostic support — Abstract the Claude-specific layer to support other providers
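The model-routing item above could work off cheap, observable signals. A hypothetical sketch with invented task fields and thresholds, purely to illustrate the idea:

```python
def route_model(task):
    """task: dict with 'kind', 'expected_output_tokens', 'tool_depth'
    (all hypothetical signals). Routes shallow extraction-style work
    to a Haiku-class model, everything else to Sonnet."""
    if task["kind"] in ("extraction", "classification", "formatting"):
        if task["expected_output_tokens"] < 500 and task["tool_depth"] <= 1:
            return "haiku"
    return "sonnet"

print(route_model({"kind": "extraction", "expected_output_tokens": 200, "tool_depth": 0}))  # → haiku
print(route_model({"kind": "reasoning", "expected_output_tokens": 200, "tool_depth": 0}))   # → sonnet
```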
License
MIT © 2026 Vlad Ligai
File details
Details for the file beskar-0.1.0a1.tar.gz.

File metadata
- Download URL: beskar-0.1.0a1.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 13d3e55731121517c34e8466a68e8f3d55a45d4d1ea5cd1271a85d418691b2ee |
| MD5 | 90d4fc7ce2c65419b5b62a3e51ebc391 |
| BLAKE2b-256 | 3467ae6c0e68d4e6a7b1c42436600b2bd7df5b4ee71e2940c463d04e222ba8ac |
Provenance
The following attestation bundles were made for beskar-0.1.0a1.tar.gz:

Publisher: publish.yml on Vligai/beskar

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: beskar-0.1.0a1.tar.gz
- Subject digest: 13d3e55731121517c34e8466a68e8f3d55a45d4d1ea5cd1271a85d418691b2ee
- Sigstore transparency entry: 984120555
- Sigstore integration time:
- Permalink: Vligai/beskar@117d7e95babdc07aa03be4d426871094e956b1ac
- Branch / Tag: refs/tags/v0.1.0a1
- Owner: https://github.com/Vligai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@117d7e95babdc07aa03be4d426871094e956b1ac
- Trigger Event: push
File details
Details for the file beskar-0.1.0a1-py3-none-any.whl.

File metadata
- Download URL: beskar-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 88ff35b2e44ab113c1552ef5305ed48ce7b242e61acff63d0a9c83b644475d48 |
| MD5 | b3dc650deb5d7e6f5317f3ee3d2717ee |
| BLAKE2b-256 | 79c3f14e616ef518017ced61265c76568ba6e2aa5a082134f7fe0a9660b66631 |
Provenance
The following attestation bundles were made for beskar-0.1.0a1-py3-none-any.whl:

Publisher: publish.yml on Vligai/beskar

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: beskar-0.1.0a1-py3-none-any.whl
- Subject digest: 88ff35b2e44ab113c1552ef5305ed48ce7b242e61acff63d0a9c83b644475d48
- Sigstore transparency entry: 984120593
- Sigstore integration time:
- Permalink: Vligai/beskar@117d7e95babdc07aa03be4d426871094e956b1ac
- Branch / Tag: refs/tags/v0.1.0a1
- Owner: https://github.com/Vligai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@117d7e95babdc07aa03be4d426871094e956b1ac
- Trigger Event: push