A client for interacting with LLM completion APIs and tracking usage.

`llm-api-client` 🤖⚡

A Python helper library for efficiently managing concurrent, rate-limited API requests to LLM providers via LiteLLM.

It provides an APIClient that handles:

Concurrency: Making multiple API calls simultaneously using threads.
Rate Limiting: Respecting API limits for requests per minute (RPM) and tokens per minute (TPM).
Retries: Automatically retrying failed requests.
Request Sanitization: Cleaning up request parameters to ensure compatibility with different models/providers.
LLM Context Management: Truncating message history to fit within model context windows.
Usage Tracking: Monitoring API costs, token counts, and response times via an integrated APIUsageTracker.
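The README does not describe how the RPM and TPM limits are enforced internally. One common approach is a sliding one-minute window that remembers recent requests and their token counts. The sketch below illustrates that idea in plain Python. `SlidingWindowLimiter` is a hypothetical class, not part of `llm-api-client`, and the caller is assumed to supply a token estimate for each request (for example, prompt tokens plus `max_tokens`).

```python
import threading
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Hypothetical RPM/TPM limiter (not part of llm-api-client).

    Remembers every request admitted during the last `window` seconds and
    admits a new one only if both the request count and the token total
    would stay within their limits.
    """

    def __init__(self, max_rpm: int, max_tpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_rpm = max_rpm
        self.max_tpm = max_tpm
        self.window = window
        self.clock = clock
        self._events = deque()  # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0
        self._lock = threading.Lock()  # shared by all worker threads

    def _evict(self, now: float) -> None:
        # Forget requests that are older than the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Admit and record the request if it fits both budgets, else return False."""
        with self._lock:
            now = self.clock()
            self._evict(now)
            if len(self._events) >= self.max_rpm:
                return False
            if self._tokens_in_window + tokens > self.max_tpm:
                return False
            self._events.append((now, tokens))
            self._tokens_in_window += tokens
            return True

    def acquire(self, tokens: int, poll_seconds: float = 0.05) -> None:
        """Block until the request can be admitted."""
        if tokens > self.max_tpm:
            raise ValueError("request needs more tokens than max_tpm allows")
        while not self.try_acquire(tokens):
            time.sleep(poll_seconds)
```

Each worker thread would call `acquire(estimated_tokens)` before sending its request. Injecting `clock` keeps the limiter testable without waiting a real minute.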

Code documentation is available at https://andrefcruz.github.io/llm-api-client/

Installation

Install the package directly from PyPI:

pip install llm-api-client

Usage

The primary way to interact with the APIClient is through its make_requests and make_requests_with_retries methods. Both run requests concurrently while respecting the configured rate limits; make_requests_with_retries additionally retries requests that fail.
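To make the concurrency model concrete, here is a minimal sketch of fanning requests out over a thread pool with Python's standard `concurrent.futures`. `run_concurrently` is a hypothetical helper, not the library's code. It reuses the documented `max_workers` default of `min(CPU count * 20, max_rpm)`. It also assumes that a failed request is reported as `None`, which matches how the example below treats a falsy response as a failure.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


def run_concurrently(
    requests: list[dict],
    call: Callable[[dict], Any],
    max_rpm: int,
    max_workers: Optional[int] = None,
) -> list[Optional[Any]]:
    """Hypothetical helper: run `call` on each request in a thread pool.

    Results come back in the same order as `requests`; a request whose
    call raises an exception yields None instead of a response.
    """
    if max_workers is None:
        # Mirrors the documented default: min(CPU count * 20, max_rpm).
        max_workers = min((os.cpu_count() or 1) * 20, max_rpm)

    def safe_call(request: dict) -> Optional[Any]:
        try:
            return call(request)
        except Exception:
            return None  # assumed failure marker

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order.
        return list(pool.map(safe_call, requests))
```

In practice `call` might wrap something like `litellm.completion(**request)`, combined with a rate limiter. Preserving input order lets `responses[i]` line up with `requests_data[i]`.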

Here's a basic example of using APIClient to make multiple completion requests concurrently:

import os
from llm_api_client import APIClient

# Ensure your API key is set (e.g., OPENAI_API_KEY environment variable)
# os.environ["OPENAI_API_KEY"] = "your-api-key"

# Create a client with specific rate limits (adjust as needed)
# Defaults use OpenAI Tier 4 limits if not specified.
client = APIClient(
    max_requests_per_minute=1000,
    max_tokens_per_minute=100000
)

# Prepare your API requests
prompts = [
    "Explain the theory of relativity in simple terms.",
    "Write a short poem about a cat.",
    "What is the capital of France?",
]

requests_data = [
    {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        # Add other parameters like temperature, max_tokens etc. if needed
        # "temperature": 0.7,
        # "max_tokens": 150,
    }
    for prompt in prompts
]

# Make the requests concurrently
# Use make_requests_with_retries for built-in retry logic
responses = client.make_requests(requests_data)

# Process the responses
for i, response in enumerate(responses):
    if response:
        # Access response content (structure depends on the API/model)
        # For OpenAI/LiteLLM completion:
        try:
            message_content = response.choices[0].message.content
            print(f"Response {i+1}: {message_content[:100]}...") # Print first 100 chars
        except (AttributeError, IndexError, TypeError) as e:
            print(f"Response {i+1}: Could not parse response content. Error: {e}")
            print(f"Raw response: {response}")
    else:
        print(f"Response {i+1}: Request failed.")

# Access usage statistics
print("\n--- Usage Statistics ---")
print(client.tracker) # Prints detailed stats

# Or access specific stats
print(f"Total cost: ${client.tracker.total_cost:.4f}")
print(f"Total prompt tokens: {client.tracker.total_prompt_tokens}")
print(f"Total completion tokens: {client.tracker.total_completion_tokens}")
print(f"Number of successful API calls: {client.tracker.num_api_calls}")
print(f"Mean response time: {client.tracker.mean_response_time:.2f}s")

# View request/response history
# print("\n--- History ---")
# for entry in client.history:
#     print(entry)
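The tracker attributes printed above (`total_cost`, `total_prompt_tokens`, `total_completion_tokens`, `num_api_calls`, `mean_response_time`) are running aggregates over successful calls. The following minimal sketch shows that bookkeeping using a hypothetical `UsageTotals` class. It is not the library's `APIUsageTracker`, whose internals may differ. Per-call cost is assumed to come from elsewhere; LiteLLM, for instance, offers `litellm.completion_cost`.

```python
from dataclasses import dataclass, field


@dataclass
class UsageTotals:
    """Hypothetical stand-in for a usage tracker (not the library's APIUsageTracker)."""

    total_cost: float = 0.0
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    num_api_calls: int = 0
    response_times: list[float] = field(default_factory=list)

    def record(self, cost: float, prompt_tokens: int,
               completion_tokens: int, seconds: float) -> None:
        # Accumulate the figures for one successful call.
        self.total_cost += cost
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.num_api_calls += 1
        self.response_times.append(seconds)

    @property
    def mean_response_time(self) -> float:
        # Average latency over recorded calls; 0.0 before any call.
        if not self.response_times:
            return 0.0
        return sum(self.response_times) / len(self.response_times)
```

In a threaded client, `record` would need to be guarded by a lock, because several workers may finish at the same time.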

Method Parameters

Both make_requests and make_requests_with_retries accept the following core parameters:

requests (list[dict]): A list where each dictionary holds the parameters for a single API call (e.g., model, messages, temperature). Parameters follow the OpenAI API format, which LiteLLM translates for other providers.
max_workers (int, optional): The maximum number of concurrent threads to use for making API calls. Defaults to min(CPU count * 20, max_rpm).
sanitize (bool, optional): If True (default), the client will attempt to remove parameters that are incompatible with the specified model and provider before making the request. It also truncates message history to fit the model's context window.
timeout (float, optional): The maximum number of seconds to wait for all requests to complete. If None (default), it waits indefinitely.
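The `sanitize` option is described as doing two things: dropping parameters the model does not support and trimming message history to fit the context window. The sketch below shows one way to combine both steps. `sanitize_request` is a hypothetical function, not the library's implementation. The set of supported parameters and the `count_tokens` function are inputs you would need to supply. The trimming policy (drop the oldest non-system messages first, never the latest turn) is also an assumption.

```python
from typing import Callable


def sanitize_request(
    request: dict,
    supported_params: set[str],
    max_context_tokens: int,
    count_tokens: Callable[[list[dict]], int],
) -> dict:
    """Hypothetical sanitizer (not the library's implementation).

    1. Drops parameters the target model does not support.
    2. Removes the oldest non-system messages until the history fits.
    """
    # Keep "model" and "messages" unconditionally; filter everything else.
    cleaned = {
        key: value
        for key, value in request.items()
        if key in ("model", "messages") or key in supported_params
    }
    messages = list(cleaned["messages"])  # copy so the caller's list is untouched

    while count_tokens(messages) > max_context_tokens:
        non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
        if len(non_system) <= 1:
            # Never drop system prompts or the most recent turn; stop trimming.
            break
        del messages[non_system[0]]  # the oldest non-system message goes first

    cleaned["messages"] = messages
    return cleaned
```

LiteLLM provides helpers such as `litellm.token_counter` and `litellm.get_supported_openai_params` that could supply these inputs. The README does not say whether `llm-api-client` uses them.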

The make_requests_with_retries method includes one additional parameter:

max_retries (int, optional): The maximum number of times to retry a failed request. Defaults to 2.
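The README documents `max_retries` (default 2) but does not say how retries are scheduled. One plausible reading, sketched here as an assumption, works as follows. After the first concurrent pass, only the failed requests are re-submitted, for up to `max_retries` extra rounds, so each request is attempted at most `max_retries + 1` times. `run_with_retries` and `run_batch` are hypothetical names; `run_batch` stands in for a single concurrent pass that returns `None` for failures. The sketch uses no backoff between rounds, which a real client might add.

```python
from typing import Any, Callable, Optional


def run_with_retries(
    requests: list[dict],
    run_batch: Callable[[list[dict]], list[Optional[Any]]],
    max_retries: int = 2,
) -> list[Optional[Any]]:
    """Hypothetical retry loop: re-submit only failed (None) results."""
    results = list(run_batch(requests))
    for _ in range(max_retries):
        failed = [i for i, r in enumerate(results) if r is None]
        if not failed:
            break  # everything succeeded; no more rounds needed
        retried = run_batch([requests[i] for i in failed])
        # Write retried results back into their original positions.
        for i, r in zip(failed, retried):
            results[i] = r
    return results
```

Re-submitting only the failures as a batch keeps successful responses intact and lets the retries run concurrently under the same rate limits.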

Release history

0.1.6 (Nov 25, 2025)
0.1.5 (Nov 24, 2025)
0.1.4 (Aug 21, 2025)
0.1.3 (Aug 19, 2025)
0.1.2 (Aug 15, 2025, this version)
0.1.1 (Apr 22, 2025)
0.1.0 (Apr 21, 2025)
0.0.1 (Apr 19, 2025)

Download files

Source Distribution

llm_api_client-0.1.2.tar.gz (20.3 kB), uploaded Aug 15, 2025

Built Distribution

llm_api_client-0.1.2-py3-none-any.whl (14.5 kB, Python 3), uploaded Aug 15, 2025

File details

Details for the file llm_api_client-0.1.2.tar.gz.

File metadata

Download URL: llm_api_client-0.1.2.tar.gz
Upload date: Aug 15, 2025
Size: 20.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_api_client-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`3361dadcc0f3f12a780dce970d12ac0376bbe3e30d7bc1fc3b95040745efc7ca`
MD5	`bf6ca1334164ed89c97aeb78f6dc4b84`
BLAKE2b-256	`643261d97e2a526db8d72611fadfbd990535eb7a203582e2f750f4986a4ef838`

File details

Details for the file llm_api_client-0.1.2-py3-none-any.whl.

File metadata

Download URL: llm_api_client-0.1.2-py3-none-any.whl
Upload date: Aug 15, 2025
Size: 14.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_api_client-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`711654351cf47136ab5b7cf8c61a38b776e1b013ce98379cf38f76559dc57594`
MD5	`ee7e1b491a6a880025fb3c2783461463`
BLAKE2b-256	`c852acd2522fda8ad1c47f3890f2188c1423ef135e16cb8138ad0dd6564f6461`
