FastLLM

High-performance parallel LLM API request tool with support for multiple providers and caching capabilities.

Features

Parallel request processing with configurable concurrency
- Sustains 20,000+ prompt tokens per second and 1,500+ output tokens per second, even with very large LLMs such as DeepSeek-V3.
Built-in caching support (in-memory and disk-based)
Progress tracking with token usage statistics
Support for multiple LLM providers (OpenAI, OpenRouter, etc.)
OpenAI-style API for request batching
Retry mechanism with configurable attempts and delays
Request deduplication and response ordering
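
The features above (bounded concurrency, retries, deduplication, and ordered responses) can be illustrated with a minimal asyncio sketch. It is not FastLLM's implementation: run_all, call_with_retry, and fake_call are hypothetical names, and the prompt string stands in for a full request.

```python
import asyncio


async def run_all(prompts, call, concurrency=4, attempts=3, delay=0.0):
    """Run call(prompt) for every prompt; results come back in input order."""
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight requests

    async def call_with_retry(prompt):
        for attempt in range(1, attempts + 1):
            try:
                async with semaphore:
                    return await call(prompt)
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts: surface the error
                await asyncio.sleep(delay)  # wait before the next attempt

    # Deduplicate: one request per distinct prompt, shared by all duplicates.
    unique = list(dict.fromkeys(prompts))
    results = await asyncio.gather(*(call_with_retry(p) for p in unique))
    by_prompt = dict(zip(unique, results))
    # Restore the caller's order, including repeated prompts.
    return [by_prompt[p] for p in prompts]


calls = []  # records every attempt made by the fake backend


async def fake_call(prompt):
    """Hypothetical backend that fails once for prompt "b"."""
    calls.append(prompt)
    if prompt == "b" and calls.count("b") == 1:
        raise RuntimeError("transient failure")
    return prompt.upper()


print(asyncio.run(run_all(["a", "b", "a", "c"], fake_call)))
```

The duplicate "a" is sent only once, and the failed "b" request is retried, yet the returned list still lines up with the input list.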

Installation

Use pip:

pip install fastllm-kit

Alternatively, use uv:

uv pip install fastllm-kit

Important: fastllm does not yet support libsqlite3.49.1; please use libsqlite3.49.0 or lower. See this issue for more details. This may affect users with conda environments.
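
To see which SQLite library your interpreter links against, you can query the standard sqlite3 module. The helper below is a hypothetical check, not part of fastllm; it follows the "3.49.0 or lower" wording literally, so it also flags versions newer than 3.49.1, whose status is not stated here.

```python
import sqlite3


def sqlite_is_supported(version_info):
    """Return True when the SQLite version is 3.49.0 or lower."""
    return tuple(version_info) <= (3, 49, 0)


# sqlite_version is the version of the linked SQLite library,
# not the version of the Python sqlite3 module itself.
print("SQLite library version:", sqlite3.sqlite_version)
print("Within the supported range:", sqlite_is_supported(sqlite3.sqlite_version_info))
```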

For development:

# Clone the repository
git clone https://github.com/Rexhaif/fastllm.git
cd fastllm

# Create a virtual environment and install dependencies
uv venv
uv pip install -e ".[dev]"

Dependencies

FastLLM requires Python 3.9 or later and depends on the following packages:

httpx (^0.27.2) - For async HTTP requests
pydantic (^2.10.6) - For data validation and settings management
rich (^13.9.4) - For beautiful terminal output and progress bars
diskcache (^5.6.3) - For persistent disk caching
asyncio (^3.4.3) - For asynchronous operations
anyio (^4.8.0) - For async I/O operations
tqdm (^4.67.1) - For progress tracking
typing_extensions (^4.12.2) - For enhanced type hints

Development dependencies:

ruff (^0.3.7) - For linting and formatting
pytest (^8.3.4) - For testing
pytest-asyncio (^0.23.8) - For async tests
pytest-cov (^4.1.0) - For test coverage
black (^24.10.0) - For code formatting
coverage (^7.6.10) - For code coverage reporting

Development

The project uses just for task automation and uv for dependency management.

Common tasks:

# Install dependencies
just install

# Run tests
just test

# Format code
just format

# Run linting
just lint

# Clean up cache files
just clean

Quick Start

from fastllm import RequestBatch, RequestManager, OpenAIProvider, InMemoryCache

# Create a provider
provider = OpenAIProvider(
    api_key="your-api-key",
    # Optional: custom API base URL
    api_base="https://api.openai.com/v1",
)

# Create a cache provider (optional)
cache = InMemoryCache()  # or DiskCache(directory="./cache")

# Create a request manager
manager = RequestManager(
    provider=provider,
    concurrency=50,  # Number of concurrent requests
    show_progress=True,  # Show progress bar
    caching_provider=cache,  # Enable caching
)

# Create a batch of requests
request_ids = []  # Store request IDs for later use
with RequestBatch() as batch:
    # Add requests to the batch
    for i in range(10):
        # create() returns the request ID (caching key)
        request_id = batch.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"What is {i} + {i}?"
            }],
            temperature=0.7,
            include_reasoning=True,  # Optional: include model reasoning
        )
        request_ids.append(request_id)

# Process the batch
responses = manager.process_batch(batch)

# Process responses
for request_id, response in zip(request_ids, responses):
    print(f"Request {request_id}: {response.response.choices[0].message.content}")
        
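# You can use request IDs to check cache status.
# cache.exists() is a coroutine, so outside a notebook it must run in an event loop.
import asyncio

async def report_cache_status():
    for request_id in request_ids:
        is_cached = await cache.exists(request_id)
        print(f"Request {request_id} is {'cached' if is_cached else 'not cached'}")

asyncio.run(report_cache_status())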

Advanced Usage

Async Support

FastLLM can be used both synchronously and asynchronously, and works seamlessly in regular Python environments, async applications, and Jupyter notebooks:

import asyncio
from fastllm import RequestBatch, RequestManager, OpenAIProvider

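# Works in Jupyter notebooks and plain scripts
provider = OpenAIProvider(api_key="your-api-key")
manager = RequestManager(provider=provider)

with RequestBatch() as batch:
    batch.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}]
    )

responses = manager.process_batch(batch)  # No explicit event loop handling needed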

# Works in async applications
async def process_requests():
    provider = OpenAIProvider(api_key="your-api-key")
    manager = RequestManager(provider=provider)
    
    with RequestBatch() as batch:
        batch.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello!"}]
        )
    
    responses = manager.process_batch(batch)
    return responses

# Entry point: asyncio.run() starts the event loop that process_requests() runs in
async def main():
    responses = await process_requests()
    print(responses)

asyncio.run(main())
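
For context, one common way to let a synchronous call work both with and without a running event loop is sketched below. This is an assumption about the general technique, not a description of FastLLM's internals; run_sync, fetch_answer, and caller_inside_loop are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_sync(make_coro):
    """Run the coroutine returned by make_coro() from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread (plain script): start one.
        return asyncio.run(make_coro())
    # A loop is already running (notebook, async app). asyncio.run() would
    # raise here, so run the coroutine on a fresh loop in a worker thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(make_coro())).result()


async def fetch_answer():
    await asyncio.sleep(0)
    return 42


async def caller_inside_loop():
    # Blocks the calling loop until the worker thread finishes.
    return run_sync(fetch_answer)


print(run_sync(fetch_answer))              # called with no loop running
print(asyncio.run(caller_inside_loop()))   # called from inside a running loop
```

The trade-off is that the outer event loop is blocked while the worker thread runs, which is acceptable for batch jobs but not for latency-sensitive servers.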

Caching Configuration

FastLLM supports both in-memory and disk-based caching, with request IDs serving as cache keys:

from fastllm import InMemoryCache, DiskCache, RequestBatch

# Create a batch and get request IDs
with RequestBatch() as batch:
    request_id = batch.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(f"Request ID (cache key): {request_id}")

# In-memory cache (faster, but cleared when process ends)
cache = InMemoryCache()

# Disk cache (persistent, with optional TTL and size limits)
cache = DiskCache(
    directory="./cache",
    ttl=3600,  # Cache TTL in seconds
    size_limit=int(2e9)  # 2GB size limit
)

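# Check if a response is cached and return it if available.
# exists() and get() are coroutines: await them inside an async function
# (or directly in a notebook cell, where top-level await is allowed).
async def get_if_cached(request_id):
    if await cache.exists(request_id):
        return await cache.get(request_id)
    return None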
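
To make the idea of a request ID doubling as a cache key concrete, here is a minimal sketch that derives a deterministic key from the request contents. How FastLLM actually computes its IDs is not stated here; request_cache_key and the choice of SHA-256 over canonical JSON are assumptions for illustration.

```python
import hashlib
import json


def request_cache_key(request):
    """Hash a JSON-serializable request dict into a stable hex string."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


hello = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
reordered = {
    "messages": [{"content": "Hello!", "role": "user"}],
    "model": "gpt-3.5-turbo",
}

# Same content in a different key order maps to the same cache entry.
print(request_cache_key(hello) == request_cache_key(reordered))
```

With a content-derived key, identical requests map to the same entry, and changing any parameter (for example the temperature) yields a different key.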

Custom Providers

Create your own provider by inheriting from the base Provider class:

from typing import Any

import httpx

from fastllm import Provider

# YourResponseType is a placeholder: substitute the type your API responses are parsed into.
class CustomProvider(Provider[YourResponseType]):
    def get_request_headers(self) -> dict[str, str]:
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

    async def make_request(
        self,
        client: httpx.AsyncClient,
        request: dict[str, Any],
        timeout: float,
    ) -> YourResponseType:
        # Implement your request logic here
        pass

Progress Tracking

The progress bar shows:

Request completion progress
Tokens per second (prompt and completion)
Cache hit/miss statistics
Estimated time remaining
Total elapsed time
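
The figures listed above can all be derived from a few running totals. The sketch below shows the arithmetic only; progress_stats and its parameter names are hypothetical, and the estimate assumes the request rate stays constant.

```python
def progress_stats(done, total, cache_hits, prompt_tokens, completion_tokens, elapsed):
    """Compute throughput, cache hit rate, and a linear ETA from running totals."""
    requests_per_s = done / elapsed if elapsed > 0 else 0.0
    return {
        "prompt_tokens_per_s": prompt_tokens / elapsed if elapsed > 0 else 0.0,
        "completion_tokens_per_s": completion_tokens / elapsed if elapsed > 0 else 0.0,
        "cache_hit_rate": cache_hits / done if done > 0 else 0.0,
        # Remaining requests divided by the observed request rate.
        "eta_s": (total - done) / requests_per_s if requests_per_s > 0 else float("inf"),
    }


# 50 of 200 requests finished after 5 seconds, 10 of them served from cache.
stats = progress_stats(
    done=50, total=200, cache_hits=10,
    prompt_tokens=100_000, completion_tokens=7_500, elapsed=5.0,
)
print(stats)
```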

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

0.1.9 - May 12, 2025
0.1.8 - Apr 14, 2025 (this version)
0.1.7 - Apr 8, 2025
0.1.3 - Apr 6, 2025

