
Exponential backoff with full jitter for LLM API calls. Sync + async. Built-in retryable-code presets for Anthropic, OpenAI, Bedrock, Gemini. Zero runtime deps.


llm-retry-py



Most retry libraries are framework middleware or async-only. This one is a small function you wrap around any callable. Sync and async loops share the same RetryPolicy. Built-in retryable-code sets for Anthropic, OpenAI, AWS Bedrock, and Google Gemini ship in llm_retry.predicates.

Sibling to the Rust crate llm-retry.

Install

pip install llm-retry-py

Use

Sync:

from llm_retry import retry, RetryPolicy, predicates

# `client` is an Anthropic SDK client created elsewhere, e.g. anthropic.Anthropic().
def call_anthropic():
    return client.messages.create(model="claude-sonnet-4-5", ...)

resp = retry(
    call_anthropic,
    policy=RetryPolicy(),
    should_retry=lambda e: predicates.is_anthropic_retryable(str(e)),
)

Async (works with any asyncio code, no runtime lock-in beyond asyncio.sleep):

from llm_retry import retry_async, RetryPolicy, predicates

# `client` is an async OpenAI SDK client created elsewhere, e.g. openai.AsyncOpenAI().
async def call_openai():
    return await client.chat.completions.create(...)

resp = await retry_async(
    call_openai,
    policy=RetryPolicy(),
    should_retry=lambda e: predicates.is_openai_retryable(str(e)),
)

Custom predicate (BYO matcher):

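import httpx

from llm_retry import retry, RetryPolicy, predicates

def should_retry(e: Exception) -> bool:
    # Retry on client-side timeouts.
    if isinstance(e, httpx.TimeoutException):
        return True
    # Retry on HTTP errors only when the status code is in the retryable set.
    if isinstance(e, httpx.HTTPStatusError):
        return predicates.is_http_status_retryable(e.response.status_code)
    return False

# `call` stands for your own callable, such as call_anthropic above.
resp = retry(call, policy=RetryPolicy(), should_retry=should_retry)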

RetryPolicy

Defaults: 6 attempts, 500ms base delay, 30s cap, full jitter.

from llm_retry import RetryPolicy, Jitter

policy = RetryPolicy(
    max_attempts=8,
    base_delay_ms=250,
    max_delay_ms=60_000,
    jitter=Jitter.FULL,  # FULL (default, AWS-recommended), EQUAL, or NONE
)

The un-jittered backoff for attempt i (0-indexed) is capped = min(base_delay_ms * 2**i, max_delay_ms). Jitter is then applied to capped:

| Jitter | Resulting delay |
|--------|-----------------|
| NONE | capped |
| EQUAL | capped/2 + uniform(0, capped/2) |
| FULL | uniform(0, capped) (recommended) |
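
To make the formula and the three jitter modes concrete, here is a minimal standalone sketch of the delay calculation. It does not use the library. The function name `backoff_delay_ms`, its string-valued `jitter` argument, and the `rng` parameter are all hypothetical names invented for illustration; only the arithmetic comes from the description above.

```python
import random
from typing import Optional


def backoff_delay_ms(
    attempt: int,
    base_ms: int = 500,
    max_ms: int = 30_000,
    jitter: str = "full",
    rng: Optional[random.Random] = None,
) -> float:
    """Illustrative delay (in ms) after the 0-indexed `attempt`. Not the library's code."""
    uniform = (rng or random).uniform
    # Exponential growth, clamped to the cap.
    capped = min(base_ms * 2 ** attempt, max_ms)
    if jitter == "none":
        return float(capped)
    if jitter == "equal":
        # Half the delay is fixed, the other half is random.
        return capped / 2 + uniform(0, capped / 2)
    if jitter == "full":
        # The whole delay is random, anywhere in [0, capped].
        return uniform(0, capped)
    raise ValueError(f"unknown jitter mode: {jitter!r}")


# Un-jittered schedule with the documented defaults (500 ms base, 30 s cap).
schedule = [backoff_delay_ms(i, jitter="none") for i in range(7)]
print(schedule)  # [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 30000.0]
```

The last entry shows the cap taking effect: 500 * 2**6 is 32000, which is clamped to 30000. With full jitter, each of these values becomes the upper bound of a uniform draw, so clients that fail at the same moment spread their retries out rather than retrying together.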

Presets

from llm_retry import predicates

predicates.is_anthropic_retryable("rate_limit_error")     # True
predicates.is_openai_retryable("server_error")            # True
predicates.is_bedrock_retryable("ThrottlingException")    # True
predicates.is_gemini_retryable("RESOURCE_EXHAUSTED")      # True
predicates.is_http_status_retryable(503)                  # True

The underlying lists are public constants you can extend:

from llm_retry.predicates import ANTHROPIC_RETRYABLE, contains_any

my_codes = ANTHROPIC_RETRYABLE + ("my_custom_transient_code",)
should_retry = lambda e: contains_any(str(e), my_codes)
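
Because the predicate above matches against `str(e)`, it helps to see what substring matching does in practice. The sketch below assumes that `contains_any(text, codes)` returns True when any code occurs as a substring of the text. That behavior is inferred from the name, not confirmed by the library. `contains_any_sketch` and `EXAMPLE_CODES` are hypothetical stand-ins, and the tuple's contents are illustrative values, not the real `ANTHROPIC_RETRYABLE`.

```python
from typing import Iterable


def contains_any_sketch(text: str, codes: Iterable[str]) -> bool:
    """Assumed behavior: True if any code appears anywhere in `text`."""
    return any(code in text for code in codes)


# Illustrative values only; the real constant may differ.
EXAMPLE_CODES = ("rate_limit_error", "overloaded_error")
my_codes = EXAMPLE_CODES + ("my_custom_transient_code",)


def should_retry(e: Exception) -> bool:
    return contains_any_sketch(str(e), my_codes)


print(should_retry(RuntimeError("429 rate_limit_error: slow down")))  # True
print(should_retry(ValueError("invalid_request_error: bad field")))   # False
```

If matching really is substring-based, a short or generic custom code can match unrelated error messages, so specific strings are safer.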

Error type

A retry that does not succeed raises RetryExhausted:

from llm_retry import retry, RetryPolicy, RetryExhausted

try:
    resp = retry(call, policy=RetryPolicy(), should_retry=lambda e: True)
except RetryExhausted as exc:
    exc.attempts        # how many tries ran
    exc.last_error      # the final exception raised by `call`

If the predicate returns False for the first error, that error is re-raised unchanged (no wrapping).
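
The two outcomes described above (wrapping on exhaustion, re-raising unchanged when the predicate declines) can be sketched with a small standalone loop. This is not the library's implementation. `sketch_retry`, `SketchRetryExhausted`, and the `delay_s` and `sleep` parameters are hypothetical names. The sketch also re-raises any non-retryable error, not only the first one, which is an assumption beyond what is stated here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SketchRetryExhausted(Exception):
    """Hypothetical stand-in carrying the two documented attributes."""

    def __init__(self, attempts: int, last_error: BaseException) -> None:
        super().__init__(f"gave up after {attempts} attempts: {last_error!r}")
        self.attempts = attempts
        self.last_error = last_error


def sketch_retry(
    fn: Callable[[], T],
    should_retry: Callable[[Exception], bool],
    max_attempts: int = 6,
    delay_s: Callable[[int], float] = lambda attempt: 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not should_retry(exc):
                raise  # non-retryable: the original exception propagates unchanged
            if attempt == max_attempts - 1:
                # Out of attempts: wrap the final error.
                raise SketchRetryExhausted(max_attempts, exc) from exc
            sleep(delay_s(attempt))
    raise AssertionError("unreachable when max_attempts >= 1")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"


result = sketch_retry(flaky, should_retry=lambda e: isinstance(e, TimeoutError))
print(result, calls["n"])  # ok 3
```

Callers therefore need to handle two kinds of exception: the wrapper type when retries run out, and the original exception type when the predicate rejects an error.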

What it does NOT do

  • No HTTP client. Wrap any callable that raises.
  • No circuit breaker. Layer one on top if you want.
  • No deadline (stop after N seconds total). Combine with your own asyncio.wait_for or signal.alarm.
  • No structured logging hooks. Add them in your callable.
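
For the missing deadline, one way to bound total time on the async path is to wrap the whole retry loop in `asyncio.wait_for`. The sketch below is standalone: `slow_retry_loop` is a hypothetical stand-in for `await retry_async(...)` that simulates a loop spending its time in backoff sleeps, and `call_with_deadline` is an invented helper name.

```python
import asyncio


async def slow_retry_loop() -> str:
    """Hypothetical stand-in for a retry loop that keeps failing and backing off."""
    for _attempt in range(6):
        await asyncio.sleep(0.05)  # pretend: one failed call plus its backoff sleep
    return "ok"


async def call_with_deadline(timeout_s: float) -> str:
    try:
        # wait_for cancels the inner coroutine once the timeout elapses.
        return await asyncio.wait_for(slow_retry_loop(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "deadline exceeded"


outcome = asyncio.run(call_with_deadline(0.1))
print(outcome)  # deadline exceeded
```

The simulated loop needs about 0.3 s in total, so a 0.1 s deadline cuts it off mid-backoff. On the sync path, `signal.alarm` is available only on Unix.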

License

MIT
