FastLLM

High-performance parallel LLM API request tool with support for multiple providers and caching capabilities.

Features

  • Parallel request processing with configurable concurrency
    • Can process 20,000+ prompt tokens per second and 1,500+ output tokens per second, even for extremely large LLMs such as DeepSeek-V3.
  • Built-in caching support (in-memory and disk-based)
  • Progress tracking with token usage statistics
  • Support for multiple LLM providers (OpenAI, OpenRouter, etc.)
  • OpenAI-style API for request batching
  • Retry mechanism with configurable attempts and delays
  • Request deduplication and response ordering
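
The concurrency, retry, deduplication, and ordering features listed above can be pictured with a minimal standard-library sketch. This is not FastLLM's implementation: `fake_llm_call`, `call_with_retry`, `run_batch`, and their parameters are hypothetical names used only for illustration, and a real provider request would replace the stub.

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a provider request."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def call_with_retry(prompt: str, attempts: int = 3, delay: float = 0.05) -> str:
    """Retry a failing call up to `attempts` times with a fixed delay."""
    for attempt in range(1, attempts + 1):
        try:
            return await fake_llm_call(prompt)
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)


async def run_batch(prompts, concurrency: int = 2):
    """Process prompts in parallel, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(prompt):
        async with semaphore:
            return await call_with_retry(prompt)

    # Deduplicate while preserving first-seen order
    unique = list(dict.fromkeys(prompts))
    results = await asyncio.gather(*(bounded(p) for p in unique))
    by_prompt = dict(zip(unique, results))
    # Return one response per input, in the original input order
    return [by_prompt[p] for p in prompts]


responses = asyncio.run(run_batch(["a", "b", "a", "c"]))
print(responses)  # ['A', 'B', 'A', 'C']
```

The duplicate prompt `"a"` is sent once, yet the result list still has one entry per input in the original order.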

Installation

Use pip:

pip install fastllm-kit

Alternatively, use uv:

uv pip install fastllm-kit

Important: fastllm does not yet support libsqlite3.49.1; please use libsqlite3.49.0 or lower. See this issue for more details. This may affect users of conda environments.
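
To see which SQLite library your Python interpreter is linked against, you can query the standard `sqlite3` module. The snippet below is a small sketch; `MAX_SUPPORTED` and `sqlite_is_supported` are hypothetical names, and the threshold simply mirrors the version stated in the note above.

```python
import sqlite3

# Highest SQLite version the note above describes as working
MAX_SUPPORTED = (3, 49, 0)


def sqlite_is_supported(version_info=None) -> bool:
    """Return True if the SQLite version is at or below MAX_SUPPORTED."""
    if version_info is None:
        # Version of the SQLite library linked into this interpreter
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) <= MAX_SUPPORTED


print(f"SQLite {sqlite3.sqlite_version}: supported={sqlite_is_supported()}")
```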

For development:

# Clone the repository
git clone https://github.com/Rexhaif/fastllm.git
cd fastllm

# Create a virtual environment and install dependencies
uv venv
uv pip install -e ".[dev]"

Dependencies

FastLLM requires Python 3.9 or later and depends on the following packages:

  • httpx (^0.27.2) - For async HTTP requests
  • pydantic (^2.10.6) - For data validation and settings management
  • rich (^13.9.4) - For beautiful terminal output and progress bars
  • diskcache (^5.6.3) - For persistent disk caching
  • asyncio (^3.4.3) - For asynchronous operations
  • anyio (^4.8.0) - For async I/O operations
  • tqdm (^4.67.1) - For progress tracking
  • typing_extensions (^4.12.2) - For enhanced type hints

Development dependencies:

  • ruff (^0.3.7) - For linting and formatting
  • pytest (^8.3.4) - For testing
  • pytest-asyncio (^0.23.8) - For async tests
  • pytest-cov (^4.1.0) - For test coverage
  • black (^24.10.0) - For code formatting
  • coverage (^7.6.10) - For code coverage reporting

Development

The project uses just for task automation and uv for dependency management.

Common tasks:

# Install dependencies
just install

# Run tests
just test

# Format code
just format

# Run linting
just lint

# Clean up cache files
just clean

Quick Start

from fastllm import RequestBatch, RequestManager, OpenAIProvider, InMemoryCache

# Create a provider
provider = OpenAIProvider(
    api_key="your-api-key",
    # Optional: custom API base URL
    api_base="https://api.openai.com/v1",
)

# Create a cache provider (optional)
cache = InMemoryCache()  # or DiskCache(directory="./cache")

# Create a request manager
manager = RequestManager(
    provider=provider,
    concurrency=50,  # Number of concurrent requests
    show_progress=True,  # Show progress bar
    caching_provider=cache,  # Enable caching
)

# Create a batch of requests
request_ids = []  # Store request IDs for later use
with RequestBatch() as batch:
    # Add requests to the batch
    for i in range(10):
        # create() returns the request ID (caching key)
        request_id = batch.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"What is {i} + {i}?"
            }],
            temperature=0.7,
            include_reasoning=True,  # Optional: include model reasoning
        )
        request_ids.append(request_id)

# Process the batch
responses = manager.process_batch(batch)

# Process responses
for request_id, response in zip(request_ids, responses):
    print(f"Request {request_id}: {response.response.choices[0].message.content}")
        
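# You can use request IDs to check cache status.
# cache.exists() is a coroutine, so await it inside an async function
# (a bare top-level `await` only works in Jupyter/IPython).
import asyncio

async def report_cache_status():
    for request_id in request_ids:
        is_cached = await cache.exists(request_id)
        print(f"Request {request_id} is {'cached' if is_cached else 'not cached'}")

asyncio.run(report_cache_status())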

Advanced Usage

Async Support

FastLLM can be used both synchronously and asynchronously, and works seamlessly in regular Python environments, async applications, and Jupyter notebooks:

import asyncio
from fastllm import RequestBatch, RequestManager, OpenAIProvider

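# Works in Jupyter notebooks
provider = OpenAIProvider(api_key="your-api-key")
manager = RequestManager(provider=provider)
with RequestBatch() as batch:
    batch.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}]
    )
responses = manager.process_batch(batch)  # Just works!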

# Works in async applications
async def process_requests():
    provider = OpenAIProvider(api_key="your-api-key")
    manager = RequestManager(provider=provider)
    
    with RequestBatch() as batch:
        batch.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello!"}]
        )
    
    responses = manager.process_batch(batch)
    return responses

# Run from an async entry point (asyncio.run starts a new event loop)
async def main():
    responses = await process_requests()
    print(responses)

asyncio.run(main())
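
A synchronous call cannot simply use `asyncio.run` when an event loop is already running, as in Jupyter or inside an async application. One common way to support both cases is to run the coroutine on a fresh event loop in a worker thread. The sketch below illustrates that pattern only; it is not necessarily how FastLLM does it, and `run_sync`, `fetch_all`, and `inside_loop` are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch_all(n: int):
    """Hypothetical async workload standing in for a batch of requests."""
    await asyncio.sleep(0.01)
    return list(range(n))


def run_sync(coro):
    """Run `coro` to completion from sync code, with or without a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe
        return asyncio.run(coro)
    # A loop is already running: run the coroutine on a new loop in a worker thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


# Called from plain synchronous code
print(run_sync(fetch_all(3)))  # [0, 1, 2]


# Called while an event loop is already running
async def inside_loop():
    return run_sync(fetch_all(2))


print(asyncio.run(inside_loop()))  # [0, 1]
```

Note that in the second case the calling loop is blocked until the worker thread finishes, which is acceptable for a batch job but not for a latency-sensitive server.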

Caching Configuration

FastLLM supports both in-memory and disk-based caching, with request IDs serving as cache keys:

from fastllm import InMemoryCache, DiskCache, RequestBatch

# Create a batch and get request IDs
with RequestBatch() as batch:
    request_id = batch.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(f"Request ID (cache key): {request_id}")

# In-memory cache (faster, but cleared when process ends)
cache = InMemoryCache()

# Disk cache (persistent, with optional TTL and size limits)
cache = DiskCache(
    directory="./cache",
    ttl=3600,  # Cache TTL in seconds
    size_limit=int(2e9)  # 2GB size limit
)

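# cache.exists() and cache.get() are coroutines, so await them inside an async function
import asyncio

async def get_cached(request_id):
    # Check if a response is cached
    if await cache.exists(request_id):
        # Get cached response if available
        return await cache.get(request_id)
    return None

response = asyncio.run(get_cached(request_id))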
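
To make the idea of request IDs as cache keys concrete, here is a standalone sketch. It assumes a key can be derived by hashing the request parameters in a canonical form; the actual scheme FastLLM uses is not described here, and `request_key` and `TinyTTLCache` are hypothetical names that are not part of the library.

```python
import asyncio
import hashlib
import json
import time


def request_key(request: dict) -> str:
    """Derive a deterministic key from request parameters (hypothetical scheme)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TinyTTLCache:
    """Minimal async in-memory cache with an optional TTL in seconds."""

    def __init__(self, ttl=None):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at or None, value)

    async def put(self, key, value):
        expires_at = None if self.ttl is None else time.monotonic() + self.ttl
        self._store[key] = (expires_at, value)

    async def exists(self, key) -> bool:
        entry = self._store.get(key)
        if entry is None:
            return False
        expires_at, _ = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # Drop the expired entry
            return False
        return True

    async def get(self, key):
        return self._store[key][1] if await self.exists(key) else None


async def demo():
    cache = TinyTTLCache(ttl=60)
    key = request_key({
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
    })
    await cache.put(key, "Hi there")
    return key, await cache.get(key)


key, value = asyncio.run(demo())
print(key[:12], value)
```

Because the key depends only on the request content, repeating an identical request maps to the same cache entry regardless of the order in which the parameters were written.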

Custom Providers

Create your own provider by inheriting from the base Provider class:

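from fastllm import Provider
from typing import Any
import httpx

# YourResponseType is a placeholder: substitute the response model your API returns.
class CustomProvider(Provider[YourResponseType]):
    def get_request_headers(self) -> dict[str, str]:
        # Headers sent with every request
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

    async def make_request(
        self,
        client: httpx.AsyncClient,
        request: dict[str, Any],
        timeout: float,
    ) -> YourResponseType:
        # Implement your request logic here: send `request` with `client`
        # and parse the result into YourResponseType
        raise NotImplementedError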

Progress Tracking

The progress bar shows:

  • Request completion progress
  • Tokens per second (prompt and completion)
  • Cache hit/miss statistics
  • Estimated time remaining
  • Total elapsed time
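
The statistics listed above are simple ratios over elapsed time. The sketch below shows one way they could be computed; it is not FastLLM's code, `ThroughputStats` and its fields are hypothetical names, and counting tokens for cached responses is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ThroughputStats:
    """Running counters for a batch of requests."""
    total_requests: int
    completed: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cache_hits: int = 0

    def update(self, prompt_tokens: int, completion_tokens: int, cached: bool = False):
        """Record one finished request."""
        self.completed += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.cache_hits += int(cached)

    def summary(self, elapsed: float) -> dict:
        """Compute rates and a naive ETA after `elapsed` seconds."""
        rate = self.completed / elapsed if elapsed > 0 else 0.0
        remaining = self.total_requests - self.completed
        return {
            "prompt_tok_per_s": self.prompt_tokens / elapsed if elapsed > 0 else 0.0,
            "completion_tok_per_s": self.completion_tokens / elapsed if elapsed > 0 else 0.0,
            "cache_hit_rate": self.cache_hits / self.completed if self.completed else 0.0,
            "eta_s": remaining / rate if rate > 0 else None,
        }


stats = ThroughputStats(total_requests=4)
stats.update(prompt_tokens=100, completion_tokens=20)
stats.update(prompt_tokens=300, completion_tokens=60, cached=True)
print(stats.summary(elapsed=2.0))
# {'prompt_tok_per_s': 200.0, 'completion_tok_per_s': 40.0, 'cache_hit_rate': 0.5, 'eta_s': 2.0}
```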

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

