callm
Keep callm and process thousands of requests without (rate) limits.
😌 Why callm?
Building LLM-powered applications often means processing thousands of API requests. You've probably experienced:
| Problem | Without callm | With callm |
|---|---|---|
| Rate limit errors | Constant 429 errors, manual sleep/retry | Automatic RPM & TPM throttling |
| Retry logic | Write custom backoff for each project | Built-in exponential backoff with jitter |
| Token tracking | No visibility into usage | Real-time token consumption metrics |
| Boilerplate code | Copy-paste the same async code everywhere | One function call, any provider |
| Waiting for batch APIs | Provider batch APIs take up to 24 hours | Results in minutes, not hours |
| Multiple SDKs | Install openai, anthropic, cohere, ... | One library, all providers |
Stop rewriting the same parallel processing code. callm handles the infrastructure so you can focus on your application.
Testing multiple providers? Just swap the provider class: no new dependencies and no other code changes. Find what works best for your use case.
Installation
```bash
pip install callm-py
```
From source:
```bash
git clone https://github.com/milistu/callm.git
cd callm
pip install -e .
```
Quick Start
Process 1,000 product descriptions to extract structured data—in under a minute:
```python
import asyncio

from callm import process_requests, RateLimitConfig
from callm.providers import OpenAIProvider

# Configure your provider
provider = OpenAIProvider(
    api_key="sk-...",
    model="gpt-5-mini",
    request_url="https://api.openai.com/v1/responses",
)

# Your data processing requests
products = [
    {"id": 1, "description": "Nike Air Max 90 - Classic sneakers in white/black, size 10"},
    {"id": 2, "description": "Sony WH-1000XM5 Wireless Headphones - Noise cancelling, 30hr battery"},
    # ... thousands more
]

requests = [
    {
        "input": f"Extract brand, category, and key features from: {p['description']}",
        "metadata": {"product_id": p["id"]},
    }
    for p in products
]


async def main():
    results = await process_requests(
        provider=provider,
        requests=requests,
        rate_limit=RateLimitConfig(
            max_requests_per_minute=5_000,  # Stay under your tier limit
            max_tokens_per_minute=2_000_000,
        ),
    )

    print(f"Processed {results.stats.successful} requests in {results.stats.duration_seconds:.1f}s")
    print(f"Tokens used: {results.stats.total_input_tokens + results.stats.total_output_tokens:,}")

    # Access results
    for result in results.successes:
        print(f"Product {result.metadata['product_id']}: {result.response}")


asyncio.run(main())
```
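How fast a batch finishes depends on which limit binds first. The back-of-the-envelope sketch below assumes every request costs roughly the same number of tokens; `estimated_duration_seconds` and the 500-token figure are hypothetical, and the estimate ignores network latency, retries, and any initial burst a rate limiter may allow:

```python
def estimated_duration_seconds(
    num_requests: int,
    max_requests_per_minute: float,
    max_tokens_per_minute: float,
    avg_tokens_per_request: float,
) -> float:
    """Rough batch time from sustained RPM/TPM throughput alone (hypothetical helper)."""
    # How many requests per minute the token budget can pay for.
    token_bound_rpm = max_tokens_per_minute / avg_tokens_per_request
    # Whichever limit is tighter sets the effective request rate.
    effective_rpm = min(max_requests_per_minute, token_bound_rpm)
    return num_requests / effective_rpm * 60.0


# 1,000 requests at ~500 tokens each under the Quick Start limits.
print(f"{estimated_duration_seconds(1_000, 5_000, 2_000_000, 500):.1f}s")  # 15.0s
```

With these numbers the token budget pays for about 4,000 requests per minute, below the 5,000 RPM cap, so TPM is the binding constraint.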
Features
- Precise Rate Limiting — Token buckets for RPM and TPM, respects provider limits
- Smart Retries — Exponential backoff with jitter, automatic 429/5xx handling
- Usage Tracking — Metrics for input tokens and output tokens
- Flexible I/O — Process from Python lists or JSONL files, output to memory or disk
- Structured Outputs — Support for Pydantic models and JSON schemas
- Provider Agnostic — Same API across OpenAI, Anthropic, Gemini, DeepSeek, and more
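Token-bucket rate limiting in general works like this: each budget (requests per minute, tokens per minute) is a bucket that refills continuously, and a request goes out only when both buckets can cover it. The sketch below illustrates that technique under stated assumptions; `TokenBucket` and `try_acquire` are hypothetical names, not callm's internal classes, and the token cost of each request is assumed to be estimated before sending:

```python
import time
from typing import Callable


class TokenBucket:
    """Bucket holding up to `per_minute` units, refilled continuously over 60 s (hypothetical)."""

    def __init__(self, per_minute: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(per_minute)
        self.available = float(per_minute)  # start full
        self._clock = clock
        self._last = clock()

    def refill(self) -> None:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add elapsed * capacity / 60 units, never exceeding capacity.
        self.available = min(self.capacity, self.available + elapsed * self.capacity / 60.0)


def try_acquire(rpm: TokenBucket, tpm: TokenBucket, estimated_tokens: float) -> bool:
    """Admit one request only if BOTH the request and token budgets allow it."""
    rpm.refill()
    tpm.refill()
    if rpm.available >= 1 and tpm.available >= estimated_tokens:
        rpm.available -= 1
        tpm.available -= estimated_tokens
        return True
    return False  # caller should wait and try again later


# Deterministic demo: a fake clock instead of real time.
now = 0.0


def fake_clock() -> float:
    return now


rpm = TokenBucket(per_minute=2, clock=fake_clock)
tpm = TokenBucket(per_minute=1_000, clock=fake_clock)
print([try_acquire(rpm, tpm, 300) for _ in range(3)])  # [True, True, False]
now = 30.0  # half a minute later: 1 request and 500 tokens refilled
print(try_acquire(rpm, tpm, 300))  # True
```

Checking both buckets before deducting from either keeps the two budgets consistent. Note that in this sketch a single request whose estimated tokens exceed the entire TPM budget can never be admitted.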
Supported Providers
| Provider | Supported APIs |
|---|---|
| OpenAI | Chat, Responses, Embeddings |
| Anthropic | Messages API |
| Gemini | Generate, Embeddings |
| DeepSeek | Chat Completions |
| Cohere | Embed API |
| Voyage AI | Embeddings |
Examples
Explore real-world use cases in the examples/ directory:
| Use Case | Description |
|---|---|
| Data Extraction | Extract structured data from product listings, invoices |
| Embeddings | Generate embeddings for RAG and semantic search |
| Evaluation | Multi-judge consensus evaluation |
| Synthetic Data | Generate training data and evaluation sets |
| Classification | Sentiment analysis, content moderation |
| Translation | Dataset translation for multilingual evaluation |
Processing Modes
callm supports four processing modes depending on your input source and output destination:
| Input | Output | Best For |
|---|---|---|
| Python list | In-memory | Small batches, interactive use |
| Python list | JSONL file | Medium batches, need persistence |
| JSONL file | JSONL file | Large batches, low memory |
| JSONL file | In-memory | Loading saved requests, testing |
```python
# Each snippet runs inside an async function, with `provider`,
# `rate_limit`, and `my_list` already defined.

# 1. List → Memory (small batches)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
)
# Access: results.successes, results.failures

# 2. List → File (persist results)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 3. File → File (large batches, low memory)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 4. File → Memory (reload saved requests)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
)
```
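The file-based modes need a JSONL input file, one JSON object per line. The sketch below assumes each line holds one request dict in the same shape as the in-memory list from Quick Start (an assumption; check the examples/ directory for the exact schema your provider expects). `write_jsonl` and `iter_jsonl` are hypothetical helpers built on the standard `json` module; reading line by line is what keeps memory flat for large files:

```python
import json
from pathlib import Path
from typing import Iterator, Union


def write_jsonl(records: list[dict], path: Union[str, Path]) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-English text human-readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def iter_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one record at a time so the whole file never sits in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Same request shape as the Quick Start list (assumed valid as one request per line).
requests = [
    {"input": "Extract brand, category, and key features from: Nike Air Max 90", "metadata": {"product_id": 1}},
    {"input": "Extract brand, category, and key features from: Sony WH-1000XM5", "metadata": {"product_id": 2}},
]
write_jsonl(requests, "input.jsonl")

for request in iter_jsonl("input.jsonl"):
    print(request["metadata"]["product_id"])
```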
Configuration
```python
from callm import RateLimitConfig, RetryConfig, process_requests

# Rate limiting (required)
rate_limit = RateLimitConfig(
    max_requests_per_minute=1000,
    max_tokens_per_minute=100_000,
)

# Retry behavior (optional, sensible defaults)
retry = RetryConfig(
    max_attempts=5,
    base_delay_seconds=0.5,
    max_delay_seconds=15.0,
    jitter=0.1,
)

# Inside an async function, with `provider` and `requests` defined as in Quick Start:
results = await process_requests(
    provider=provider,
    requests=requests,
    rate_limit=rate_limit,
    retry=retry,
)
```
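RetryConfig's parameters map naturally onto capped exponential backoff with jitter. The sketch below shows one common way to combine them; the exact formula (doubling from `base_delay_seconds`, capping at `max_delay_seconds`, then applying `jitter` as a ± fraction of the delay) is an assumption rather than callm's documented behavior, and `backoff_delay` and `is_retryable` are hypothetical names. The retryable set follows the 429/5xx handling listed under Features:

```python
import random
from typing import Optional


def is_retryable(status_code: int) -> bool:
    """Rate-limit (429) and server-side (5xx) errors are worth retrying."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delay(
    attempt: int,
    # Defaults mirror the RetryConfig example above, not necessarily callm's own defaults.
    base_delay_seconds: float = 0.5,
    max_delay_seconds: float = 15.0,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry `attempt` (1-based)."""
    rng = rng or random.Random()
    # Exponential growth: 0.5, 1, 2, 4, ... capped at max_delay_seconds.
    delay = min(max_delay_seconds, base_delay_seconds * 2 ** (attempt - 1))
    # Multiply by a random factor in [1 - jitter, 1 + jitter] to desynchronize clients.
    return delay * (1 + rng.uniform(-jitter, jitter))


# Deterministic view of the schedule with jitter disabled:
print([backoff_delay(n, jitter=0.0) for n in range(1, 7)])  # [0.5, 1.0, 2.0, 4.0, 8.0, 15.0]
```

Without jitter, many clients that hit a 429 at the same moment would also retry at the same moment; the random spread staggers them.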
API Reference
process_requests()
Main function for parallel API request processing.
| Parameter | Type | Description |
|---|---|---|
| provider | BaseProvider | Provider instance (OpenAI, Anthropic, etc.) |
| requests | list[dict] \| str | List of request dicts or path to JSONL file |
| rate_limit | RateLimitConfig | RPM and TPM limits |
| retry | RetryConfig | Optional retry configuration |
| output_path | str | Optional path for output JSONL (enables streaming) |
| errors_path | str | Optional path for error JSONL |
| logging_level | int | Logging verbosity (default: 20/INFO) |
Returns: ProcessingResults with successes, failures, and stats.
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
```bash
# Setup development environment
git clone https://github.com/milistu/callm.git
cd callm
uv sync --dev
uv run pre-commit install

# Run tests
uv run nox
```
License
MIT License - see LICENSE for details.
Built with 🧡 for engineers who process data at scale
Download files
File details
Details for the file callm_py-0.1.0.tar.gz.
File metadata
- Size: 27.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d37860476633c6b11c71565840d3b3b338016214241b537fb83bd2211b8b561a |
| MD5 | 207d4a1d8b0183e33208b32c4599eedd |
| BLAKE2b-256 | c50179ffa31c10a9a648502d109dc21c9b829f2fd8964eaeba46c5f20cc3dc59 |
Provenance
The following attestation bundles were made for callm_py-0.1.0.tar.gz:
- Publisher: publish.yml on milistu/callm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: callm_py-0.1.0.tar.gz
- Subject digest: d37860476633c6b11c71565840d3b3b338016214241b537fb83bd2211b8b561a
- Sigstore transparency entry: 768983435
- Permalink: milistu/callm@919ccb37cf1f5708d179e0a2945e9ebea7d7debe
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/milistu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@919ccb37cf1f5708d179e0a2945e9ebea7d7debe
- Trigger Event: push
File details
Details for the file callm_py-0.1.0-py3-none-any.whl.
File metadata
- Size: 43.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b372a853a16f27c850f0c6cb02dffecc5dcf5d2596c5b20e8d7833078577af34 |
| MD5 | 02c99ba4bf15253f6471ea32d4e5c3fa |
| BLAKE2b-256 | 5bcdf2ba6c78dc317b650c6473f09002cd245eeaccf36edf48ad8f99eabf0181 |
Provenance
The following attestation bundles were made for callm_py-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on milistu/callm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: callm_py-0.1.0-py3-none-any.whl
- Subject digest: b372a853a16f27c850f0c6cb02dffecc5dcf5d2596c5b20e8d7833078577af34
- Sigstore transparency entry: 768983439
- Permalink: milistu/callm@919ccb37cf1f5708d179e0a2945e9ebea7d7debe
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/milistu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@919ccb37cf1f5708d179e0a2945e9ebea7d7debe
- Trigger Event: push