bedrock-ops

Production-grade boto3 toolkit for AWS Bedrock: typed retry, per-model timeouts, capability lookup, full token usage with cache fields, PII-safe Guardrails. It closes the gaps every team rebuilds when running Bedrock in production.

pip install bedrock-ops

from bedrock_ops import BedrockClient

client = BedrockClient(region_name="us-east-1")
resp = client.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "hello"}]}],
)
print(resp.text)
print(f"cache hit rate: {resp.usage.cache_hit_rate:.1%}")

What it fixes

| Gap | How bedrock-ops fixes it | Upstream issue |
| --- | --- | --- |
| Lowercase throttlingException not retried by botocore (case-sensitive match) | Installs an after-call hook that normalizes throttle codes to ThrottlingException so retries fire | strands-agents#905 |
| boto3's default 60s read timeout truncates long Sonnet 4 calls | Defaults to 120s; configurable per BedrockClient | mem0#3825 |
| cacheReadInputTokens / cacheWriteInputTokens dropped by wrappers, so cache hit rate can't be measured | TokenUsage carries all four token fields plus a cache_hit_rate property | strands#529 |
| ReadTimeoutError dumps a full traceback instead of a typed, catchable error | Wraps it in BedrockTimeout(kind="read", elapsed_s=...) | boto3#4561 |
| EventStreamError from streaming throttles leaks connections from the pool | Wraps the error and ensures the connection is released | boto3#4543 (closed-not-planned) |
| No programmatic lookup of a Bedrock model's maxTokens | capabilities("...") returns a typed ModelCapabilities | boto3#4206 |
| Bedrock Guardrails leak the violating PII into logs because the response carries the matched content | safe_log_response() returns a redacted copy; BedrockGuardrailViolation carries categories but no content | litellm#12152 |
| guardrail_redact_input=True orphans tool_use blocks, so the next turn fails validation | repair_orphan_tool_uses() drops the orphans, restoring valid history | strands#1077 |
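The throttle-code normalization in the first row can be sketched as a small hook body (assumed names, not bedrock-ops internals): Bedrock sometimes returns the error code in lowercase, which botocore's case-sensitive retry matcher ignores, so the code is rewritten before the retry machinery sees it.

```python
# Throttle codes botocore's standard retry matcher would miss as-is.
THROTTLE_CODES = {"throttlingException", "throttledException", "Throttling"}

def normalize_throttle_code(parsed: dict) -> dict:
    """Rewrite non-canonical throttle codes to the form botocore retries on."""
    error = parsed.get("Error", {})
    if error.get("Code") in THROTTLE_CODES:
        error["Code"] = "ThrottlingException"
    return parsed

# A library would register a handler like this through botocore's
# event system (client.meta.events) so it runs on every Bedrock response.
```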

Why not langchain-aws or strands-agents?

  • langchain-aws is coupled to LangChain runnables. You adopt the chain abstraction whether you want it or not.
  • strands-agents is an agent framework. You adopt the loop, tool definition, and orchestration model.
  • bedrock-ops is a thin functional toolkit on top of boto3.client('bedrock-runtime'). No chains, no agents. Use it from a Lambda, a FastAPI handler, a Glue job, or inside any framework that already has its own opinions.

Install

Requires Python 3.10+. Pulls in boto3>=1.35, which is likely already present in your AWS Python projects.

pip install bedrock-ops
# or
uv add bedrock-ops

Usage

Production client (case-insensitive throttle retry, typed errors, full usage)

from bedrock_ops import (
    BedrockClient, BedrockThrottled, BedrockTimeout, BedrockValidationError,
)

client = BedrockClient(
    region_name="us-east-1",
    max_attempts=5,           # default 5
    retry_mode="adaptive",    # also: "standard", "legacy"
    connect_timeout=10.0,     # default 10s
    read_timeout=120.0,       # default 120s — bedrock long-context safe
)

try:
    resp = client.converse(
        modelId="anthropic.claude-sonnet-4-20250514-v1:0",
        messages=[{"role": "user", "content": [{"text": "summarize this..."}]}],
        system=[{"text": "you are concise"}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
except BedrockThrottled as e:
    log.warning("throttled after %s attempts in %s", e.attempts, e.region)
except BedrockTimeout as e:
    log.warning("bedrock %s timeout after %.1fs", e.kind, e.elapsed_s)
except BedrockValidationError as e:
    log.error("invalid request to %s: %s", e.model_id, e)

# Full usage including cache fields
print(resp.usage.input_tokens, resp.usage.cache_read_input_tokens)
print(f"cache hit rate this call: {resp.usage.cache_hit_rate:.1%}")
print(f"latency: {resp.latency_ms} ms")

# Tool calls if any
for tu in resp.tool_uses:
    print(tu["name"], tu["input"])
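The usage object above could be modeled roughly like this (a sketch; the hit-rate formula, cached reads over total input read, is an assumption rather than necessarily the library's exact definition):

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    """Minimal model of the usage fields referenced above (illustrative)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_input_tokens: int = 0
    cache_write_input_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def cache_hit_rate(self) -> float:
        # Assumed formula: share of input read that was served from cache.
        denom = self.input_tokens + self.cache_read_input_tokens
        return self.cache_read_input_tokens / denom if denom else 0.0
```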

Streaming with cache-aware aggregation

from bedrock_ops import aggregate_stream_usage

events = list(client.converse_stream(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "..."}]}],
))
for event in events:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="")

# Sum usage across all metadata events in the stream
usage = aggregate_stream_usage(events)
print(f"\ntotal: {usage.total_tokens} ({usage.cache_hit_rate:.1%} cached)")
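What aggregation like this involves, sketched over raw Converse stream event dicts (hypothetical helper name sum_stream_usage; the real aggregate_stream_usage returns a TokenUsage object rather than a dict):

```python
def sum_stream_usage(events: list[dict]) -> dict:
    """Sum token usage across all metadata events in a Converse stream."""
    totals = {"inputTokens": 0, "outputTokens": 0,
              "cacheReadInputTokens": 0, "cacheWriteInputTokens": 0}
    for event in events:
        # Only metadata events carry a usage block; other events contribute 0.
        usage = event.get("metadata", {}).get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```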

Capability lookup

from bedrock_ops import capabilities, precheck_features

cap = capabilities("anthropic.claude-sonnet-4-20250514-v1:0")
cap.max_input_tokens          # 200_000
cap.max_output_tokens         # 64_000
cap.supports_prompt_cache     # True
cap.supports_thinking         # True
cap.available_regions         # ('us-east-1', 'us-east-2', 'us-west-2', ...)

# Cross-region inference profile ids resolve to the bare model:
capabilities("us.anthropic.claude-sonnet-4-20250514-v1:0")  # works

# Validate feature combos before the call (catches boto3#4626 silent ValidationException)
precheck_features(
    "anthropic.claude-sonnet-4-20250514-v1:0",
    use_prompt_cache=True,
    use_thinking=True,
    region="us-east-1",
)
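A precheck like this boils down to comparing the requested features against the capability record and failing fast, before spending a network call (assumed logic, with a minimal stand-in for ModelCapabilities):

```python
from dataclasses import dataclass

@dataclass
class Caps:
    """Minimal stand-in for ModelCapabilities (illustrative only)."""
    supports_prompt_cache: bool
    supports_thinking: bool
    available_regions: tuple

def precheck(cap: Caps, *, use_prompt_cache=False, use_thinking=False, region=None):
    """Raise before the API call on feature combos the model can't serve."""
    problems = []
    if use_prompt_cache and not cap.supports_prompt_cache:
        problems.append("model does not support prompt caching")
    if use_thinking and not cap.supports_thinking:
        problems.append("model does not support extended thinking")
    if region is not None and region not in cap.available_regions:
        problems.append(f"model not available in {region}")
    if problems:
        raise ValueError("; ".join(problems))
```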

For new model releases:

from bedrock_ops import register_model, ModelCapabilities

register_model(ModelCapabilities(
    model_id="anthropic.claude-X-2026...",
    family="anthropic.claude",
    max_input_tokens=200_000,
    max_output_tokens=128_000,
    supports_vision=True,
    supports_tool_use=True,
    supports_prompt_cache=True,
    supports_thinking=True,
    supports_streaming=True,
    supports_cross_region_inference=True,
    available_regions=("us-east-1", "us-west-2"),
))

Guardrails without PII leaks

from bedrock_ops import (
    safe_log_response, assert_no_guardrail_violation, BedrockGuardrailViolation,
)

resp = client.converse(
    modelId="...",
    messages=[...],
    guardrailConfig={"guardrailIdentifier": "gid-123", "guardrailVersion": "DRAFT"},
)

# Option A: detect without raising
if resp.guardrail and resp.guardrail.action == "BLOCKED":
    log.info("guardrail fired", categories=resp.guardrail.categories)
    # resp.guardrail has NO content — safe to log

# Option B: raise on intervention
try:
    assert_no_guardrail_violation(resp.raw, guardrail_id="gid-123")
except BedrockGuardrailViolation as e:
    # str(e) and repr(e) contain no PII; only category names
    log.warning("blocked: categories=%s", e.categories)

# Always: redact before sending to a structured logger or trace store
logger.info("converse done", extra={"resp": safe_log_response(resp.raw, guardrail_id="gid-123")})
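The redaction idea can be sketched like this (hypothetical helper, not the library's implementation; it relies on the fact that Bedrock guardrail trace assessments carry the matched text under match keys, e.g. sensitiveInformationPolicy.piiEntities[].match):

```python
def redact_matches(obj):
    """Return a copy of a raw response with every 'match' value scrubbed."""
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if k == "match" else redact_matches(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact_matches(v) for v in obj]
    return obj  # scalars pass through; the original is never mutated
```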

Repairing conversation history after Guardrails redaction

from bedrock_ops import repair_orphan_tool_uses

# After Guardrails has stripped some tool_results from history, calling
# converse() again would fail with a ValidationException because the
# orphaned tool_use blocks have no matching tool_result. Run this first:
clean_messages = repair_orphan_tool_uses(messages)
client.converse(modelId=..., messages=clean_messages, ...)
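The repair boils down to dropping toolUse blocks that no toolResult answers (assumed algorithm, sketched over Converse-format message dicts):

```python
def drop_orphan_tool_uses(messages: list[dict]) -> list[dict]:
    """Remove toolUse blocks whose toolUseId has no matching toolResult."""
    result_ids = {
        block["toolResult"]["toolUseId"]
        for msg in messages for block in msg.get("content", [])
        if "toolResult" in block
    }
    repaired = []
    for msg in messages:
        content = [
            block for block in msg.get("content", [])
            if "toolUse" not in block
            or block["toolUse"]["toolUseId"] in result_ids
        ]
        if content:  # drop messages emptied entirely by the repair
            repaired.append({**msg, "content": content})
    return repaired
```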

What it explicitly does NOT do

  • Not an agent framework.
  • Not an LLM router. Bedrock-only by design.
  • Not a vector DB or RAG framework.
  • Not a prompt management UI.
  • Not a tracer or observability platform. Compose with Phoenix / Langfuse / Datadog / OTel as you would with raw boto3.
  • Not async-first in v0.1. Async support via aioboto3 is planned for v0.2.

Versioning

bedrock-ops follows semantic versioning. The capability table is treated as data, not API: new models are added in patch releases, while breaking API changes get a major version bump.

Contributing

Issues and PRs welcome at https://github.com/MukundaKatta/bedrock-ops. The roadmap below indicates what's coming next; if you need something else, open an issue first to discuss scope.

Roadmap

  • v0.2: async-first via aioboto3; normalized streaming event taxonomy (text_delta / tool_use_delta / thinking_delta / tool_result).
  • v0.3: per-call cost computation with versioned price tables.
  • v0.4: pre-inference token counting via the Bedrock CountTokens API.
  • v0.5: bedrock-agent-runtime support (invoke_agent + retrieve).

License

Apache-2.0. See LICENSE.
