callm

Keep callm and process thousands of requests without (rate) limits


😌 Why callm?

Building LLM-powered applications often means processing thousands of API requests. You've probably experienced:

| Problem | Without callm | With callm |
|---|---|---|
| Rate limit errors | Constant 429 errors, manual sleep/retry | Automatic RPM & TPM throttling |
| Retry logic | Write custom backoff for each project | Built-in exponential backoff with jitter |
| Token tracking | No visibility into usage | Real-time token consumption metrics |
| Boilerplate code | Copy-paste the same async code everywhere | One function call, any provider |
| Waiting for batch APIs | Provider batch APIs take up to 24 hours | Results in minutes, not hours |
| Multiple SDKs | Install openai, anthropic, cohere, ... | One library, all providers |

Stop rewriting the same parallel processing code. callm handles the infrastructure so you can focus on your application.

Testing multiple providers? Swap the provider class and leave the rest of your code as it is, with no new dependencies to install. Find what works best for your use case.

Installation

pip install callm-py

From source:

git clone https://github.com/milistu/callm.git
cd callm
pip install -e .

Quick Start

Process 1,000 product descriptions to extract structured data—in under a minute:

import asyncio
from callm import process_requests, RateLimitConfig
from callm.providers import OpenAIProvider

# Configure your provider
provider = OpenAIProvider(
    api_key="sk-...",
    model="gpt-5-mini",
    request_url="https://api.openai.com/v1/responses",
)

# Your data processing requests
products = [
    {"id": 1, "description": "Nike Air Max 90 - Classic sneakers in white/black, size 10"},
    {"id": 2, "description": "Sony WH-1000XM5 Wireless Headphones - Noise cancelling, 30hr battery"},
    # ... thousands more
]

requests = [
    {
        "input": f"Extract brand, category, and key features from: {p['description']}",
        "metadata": {"product_id": p["id"]},
    }
    for p in products
]

async def main():
    results = await process_requests(
        provider=provider,
        requests=requests,
        rate_limit=RateLimitConfig(
            max_requests_per_minute=5_000,    # Stay under your tier limit
            max_tokens_per_minute=2_000_000,
        ),
    )

    print(f"Processed {results.stats.successful} requests in {results.stats.duration_seconds:.1f}s")
    print(f"Tokens used: {results.stats.total_input_tokens + results.stats.total_output_tokens:,}")

    # Access results
    for result in results.successes:
        print(f"Product {result.metadata['product_id']}: {result.response}")

asyncio.run(main())
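
As a sanity check on the "under a minute" figure, the rate limits alone put a floor on wall-clock time. The sketch below works through that arithmetic. It is not part of callm: `min_duration_seconds` is a hypothetical helper, the 300 tokens per request is an assumed average, and network latency and retries are ignored.

```python
def min_duration_seconds(
    n_requests: int,
    avg_tokens_per_request: int,
    max_requests_per_minute: int,
    max_tokens_per_minute: int,
) -> float:
    """Lower bound on run time implied by the RPM and TPM limits alone."""
    # Time needed if only the request-count limit applied.
    rpm_bound = n_requests * 60 / max_requests_per_minute
    # Time needed if only the token limit applied.
    tpm_bound = n_requests * avg_tokens_per_request * 60 / max_tokens_per_minute
    # Whichever limit is tighter sets the floor.
    return max(rpm_bound, tpm_bound)


# 1,000 requests at the Quick Start limits; 300 tokens/request is an assumption.
print(min_duration_seconds(1_000, 300, 5_000, 2_000_000))  # 12.0
```

With these numbers the request limit is the binding one; with much longer prompts the token limit takes over.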

Features

  • Precise Rate Limiting — Token buckets for RPM and TPM, respects provider limits
  • Smart Retries — Exponential backoff with jitter, automatic 429/5xx handling
  • Usage Tracking — Metrics for input tokens and output tokens
  • Flexible I/O — Process from Python lists or JSONL files, output to memory or disk
  • Structured Outputs — Support for Pydantic models and JSON schemas
  • Provider Agnostic — Same API across OpenAI, Anthropic, Gemini, DeepSeek, and more
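
The token-bucket idea behind the rate limiting can be sketched in a few lines. This is a standalone illustration, not callm's internal code: the `TokenBucket` class, `try_acquire`, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Starts full and refills continuously at `rate_per_minute`, capped at that value."""

    def __init__(self, rate_per_minute: float, clock=time.monotonic):
        self.capacity = float(rate_per_minute)
        self.available = float(rate_per_minute)
        self.rate_per_second = rate_per_minute / 60.0
        self.clock = clock
        self.last_refill = clock()

    def refill(self) -> None:
        """Add the capacity earned since the last refill, never exceeding the cap."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.available = min(self.capacity, self.available + elapsed * self.rate_per_second)
        self.last_refill = now


def try_acquire(requests_bucket: TokenBucket, tokens_bucket: TokenBucket, estimated_tokens: float) -> bool:
    """Spend one request slot and `estimated_tokens` only if both buckets have room."""
    requests_bucket.refill()
    tokens_bucket.refill()
    if requests_bucket.available >= 1 and tokens_bucket.available >= estimated_tokens:
        requests_bucket.available -= 1
        tokens_bucket.available -= estimated_tokens
        return True
    return False  # caller should wait and try again


# One bucket counts requests (RPM), the other counts tokens (TPM).
requests_bucket = TokenBucket(rate_per_minute=60)
tokens_bucket = TokenBucket(rate_per_minute=6_000)
print(try_acquire(requests_bucket, tokens_bucket, estimated_tokens=500))  # True
```

Checking both buckets before spending from either avoids burning a request slot on a call that the token limit would block anyway.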

Supported Providers

| Provider | Supported APIs |
|---|---|
| OpenAI | Chat, Responses, Embeddings |
| Anthropic | Messages API |
| Google Gemini | Generate, Embeddings |
| DeepSeek | Chat Completions |
| Cohere | Embed API |
| Voyage AI | Embeddings |

Examples

Explore real-world use cases in the examples/ directory:

| Use Case | Description |
|---|---|
| Data Extraction | Extract structured data from product listings, invoices |
| Embeddings | Generate embeddings for RAG and semantic search |
| Evaluation | Multi-judge consensus evaluation |
| Synthetic Data | Generate training data and evaluation sets |
| Classification | Sentiment analysis, content moderation |
| Translation | Dataset translation for multilingual evaluation |

Processing Modes

callm supports four processing modes depending on your input source and output destination:

| Input | Output | Best For |
|---|---|---|
| Python list | In-memory | Small batches, interactive use |
| Python list | JSONL file | Medium batches, need persistence |
| JSONL file | JSONL file | Large batches, low memory |
| JSONL file | In-memory | Loading saved requests, testing |

# These calls use await, so run them inside an async function (see Quick Start).

# 1. List → Memory (small batches)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
)
# Access: results.successes, results.failures

# 2. List → File (persist results)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 3. File → File (large batches, low memory)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 4. File → Memory (reload saved requests)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
)
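
For the file-based modes, the input is a JSONL file with one request per line. The sketch below writes and reads such a file with the standard library only. It assumes each line mirrors the request dicts from the Quick Start; the exact on-disk schema callm expects is not stated here, so treat the field names as an assumption. `write_jsonl` and `iter_jsonl` are hypothetical helpers, not callm functions.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line; return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def iter_jsonl(path):
    """Yield one record at a time so the whole file never sits in memory."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)


sample_requests = [
    {"input": "Extract brand from: Nike Air Max 90", "metadata": {"product_id": 1}},
    {"input": "Extract brand from: Sony WH-1000XM5", "metadata": {"product_id": 2}},
]

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "input.jsonl"
    written = write_jsonl(input_path, sample_requests)
    round_trip = list(iter_jsonl(input_path))

print(written, round_trip == sample_requests)  # 2 True
```

Reading line by line is what makes the File → File mode suitable for large batches: memory use stays flat regardless of file size.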

Configuration

from callm import RateLimitConfig, RetryConfig

# Rate limiting (required)
rate_limit = RateLimitConfig(
    max_requests_per_minute=1000,
    max_tokens_per_minute=100_000,
)

# Retry behavior (optional, sensible defaults)
retry = RetryConfig(
    max_attempts=5,
    base_delay_seconds=0.5,
    max_delay_seconds=15.0,
    jitter=0.1,
)

# Run inside an async function, as in the Quick Start
results = await process_requests(
    provider=provider,
    requests=requests,
    rate_limit=rate_limit,
    retry=retry,
)
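
To see what these retry settings imply, here is a sketch of a capped exponential backoff schedule with jitter. It is illustrative only: how callm combines the four values internally is not documented here. The sketch assumes the delay doubles on each retry and that `jitter` is a fractional spread (0.1 meaning plus or minus 10%). `backoff_delay` is a hypothetical helper.

```python
import random


def backoff_delay(
    attempt,
    base_delay_seconds=0.5,
    max_delay_seconds=15.0,
    jitter=0.1,
    rng=random,
):
    """Delay in seconds before retry number `attempt` (1-based)."""
    # Double the delay on each retry, then cap it.
    delay = min(max_delay_seconds, base_delay_seconds * 2 ** (attempt - 1))
    # Randomize slightly so concurrent workers do not retry in lockstep.
    return delay * (1 + rng.uniform(-jitter, jitter))


# With max_attempts=5 there are at most 4 waits between attempts.
# jitter=0.0 makes the schedule deterministic for display.
schedule = [backoff_delay(n, jitter=0.0) for n in range(1, 5)]
print(schedule)  # [0.5, 1.0, 2.0, 4.0]
```

Under these assumptions, the 15-second cap only comes into play from the sixth retry onward, so it matters mainly when `max_attempts` is raised.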

API Reference

process_requests()

Main function for parallel API request processing.

| Parameter | Type | Description |
|---|---|---|
| provider | BaseProvider | Provider instance (OpenAI, Anthropic, etc.) |
| requests | list[dict] \| str | List of request dicts or path to JSONL file |
| rate_limit | RateLimitConfig | RPM and TPM limits |
| retry | RetryConfig | Optional retry configuration |
| output_path | str | Optional path for output JSONL (enables streaming) |
| errors_path | str | Optional path for error JSONL |
| logging_level | int | Logging verbosity (default: 20/INFO) |

Returns: ProcessingResults with successes, failures, and stats.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

# Setup development environment
git clone https://github.com/milistu/callm.git
cd callm
uv sync --dev
uv run pre-commit install

# Run tests
uv run nox

License

MIT License - see LICENSE for details.


Built with 🧡 for engineers who process data at scale
