
ALEA LLM client abstraction library for Python


ALEA LLM Client


This is a simple, two-dependency (httpx, pydantic) LLM client for OpenAI-style APIs, including:

  • OpenAI (GPT-4, GPT-5, o-series)
  • Anthropic (Claude 3.5, Claude 4)
  • Google (Vertex AI, Gemini API)
  • xAI (Grok)
  • vLLM

Supported Patterns

It provides the following patterns for all endpoints:

  • complete and complete_async -> str via ModelResponse
  • chat and chat_async -> str via ModelResponse
  • json and json_async -> dict via JSONModelResponse
  • pydantic and pydantic_async -> pydantic models
  • responses and responses_async -> structured output with tool use, grammar constraints, and reasoning modes
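One reason to pair every method with an `_async` variant is that several requests can be issued concurrently. The sketch below illustrates that idea with a hypothetical `StubModel` that stands in for a real client: it only mimics the `chat` / `chat_async` naming and returns plain strings rather than a `ModelResponse`. The helper `ask_all` is likewise an illustrative name, not part of the library.

```python
import asyncio
from typing import Dict, List


class StubModel:
    """Hypothetical stand-in exposing a sync/async method pair (not the real client)."""

    async def chat_async(self, messages: List[Dict[str, str]]) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"echo: {messages[-1]['content']}"

    def chat(self, messages: List[Dict[str, str]]) -> str:
        # The synchronous variant simply drives the coroutine to completion.
        return asyncio.run(self.chat_async(messages))


async def ask_all(model: StubModel, questions: List[str]) -> List[str]:
    """Send every question concurrently and return the answers in input order."""
    tasks = [
        model.chat_async(messages=[{"role": "user", "content": question}])
        for question in questions
    ]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(ask_all(StubModel(), ["first", "second"])))
```

`asyncio.gather` preserves the order of its arguments, so the answers line up with the questions even when the underlying calls finish out of order.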

Model Registry & Capabilities

Version 0.2.0 introduces a comprehensive model registry with detailed capability tracking for 50+ models:

from alea_llm_client.llms import (
    get_models_with_context_window_gte,
    filter_models,
    compare_models,
    get_model_details
)

# Find models with large context windows
large_context = get_models_with_context_window_gte(1000000)

# Filter by multiple criteria
efficient = filter_models(
    min_context=100000,
    capabilities=["tools", "vision"],
    tiers=["mini", "flash"],  # Can also use ModelTier.MINI, ModelTier.FLASH
    exclude_deprecated=True
)

# Compare specific models
comparison = compare_models(["gpt-5", "claude-sonnet-4-20250514", "gemini-2.5-pro"])

Advanced Features

Grammar Constraints (GPT-5)

from alea_llm_client import OpenAIModel

model = OpenAIModel(model="gpt-5")
response = model.responses(
    input="Answer yes or no: Is 2+2=4?",
    grammar='start: "yes" | "no"',
    grammar_syntax="lark"
)

Thinking Mode (Claude 4+)

from alea_llm_client import AnthropicModel

model = AnthropicModel(model="claude-sonnet-4-20250514")
response = model.chat(
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    thinking={"enabled": True, "budget_tokens": 2000}
)
print(response.thinking)  # Access thinking content

Reasoning Tokens (o-series)

from alea_llm_client import OpenAIModel

model = OpenAIModel(model="o3-mini")
response = model.chat(
    messages=[{"role": "user", "content": "Think through this step by step..."}],
    max_completion_tokens=50000
)
print(f"Used {response.reasoning_tokens} reasoning tokens")

Default Caching

Result caching is enabled by default for all methods.

To disable caching, you can either:

  • set ignore_cache=True for each method call (complete, chat, json, pydantic)
  • set ignore_cache=True as a kwarg at model construction

Cached objects are stored in ~/.alea/cache/{provider}/{endpoint_model_hash}/{call_hash}.json in compressed .json.gz format. You can delete these files to clear the cache.
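As a rough illustration of the layout described above, the sketch below derives a path of the same shape from a provider name, an endpoint/model pair, and the call arguments. The hashing scheme (SHA-256 over canonical JSON), the `.json.gz` suffix, and the helper names `cache_path`, `write_cache`, and `read_cache` are assumptions for illustration; the library's actual key derivation may differ.

```python
import gzip
import hashlib
import json
from pathlib import Path
from typing import Any, Dict, Optional


def _digest(obj: Any) -> str:
    """Hash a JSON-serializable object so that dict key order does not matter."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_path(root: Path, provider: str, endpoint: str, model: str, call: Dict[str, Any]) -> Path:
    """Build {root}/{provider}/{endpoint_model_hash}/{call_hash}.json.gz (hypothetical scheme)."""
    endpoint_model_hash = _digest({"endpoint": endpoint, "model": model})
    call_hash = _digest(call)
    return root / provider / endpoint_model_hash / f"{call_hash}.json.gz"


def write_cache(path: Path, payload: Dict[str, Any]) -> None:
    """Store a response as gzip-compressed JSON, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(payload, handle)


def read_cache(path: Path) -> Optional[Dict[str, Any]]:
    """Return the cached payload, or None on a cache miss."""
    if not path.exists():
        return None
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Under a scheme like this, two calls with identical arguments map to the same file, which is why changing any argument (for example `temperature`) produces a fresh request rather than a cache hit.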

Authentication

Authentication is handled in the following priority order:

  • an api_key provided at model construction
  • a standard environment variable (e.g., ANTHROPIC_API_KEY or OPENAI_API_KEY)
  • a key stored in ~/.alea/keys/{provider} (e.g., openai, anthropic, gemini, grok)
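The priority order above can be sketched as a small lookup function. This is an illustration only: `resolve_api_key` and its parameters are hypothetical names, and the `{PROVIDER}_API_KEY` naming rule is an assumption generalized from the two environment variables mentioned above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(
    provider: str,
    api_key: Optional[str] = None,
    env_var: Optional[str] = None,
    keys_dir: Optional[Path] = None,
) -> Optional[str]:
    """Return the first key found: explicit argument, environment variable, then key file."""
    # 1. A key passed explicitly (e.g. at model construction) always wins.
    if api_key:
        return api_key

    # 2. Fall back to an environment variable such as OPENAI_API_KEY (naming rule assumed).
    env_name = env_var or f"{provider.upper()}_API_KEY"
    value = os.environ.get(env_name)
    if value:
        return value

    # 3. Finally, look for a key file such as ~/.alea/keys/openai.
    key_file = (keys_dir or Path.home() / ".alea" / "keys") / provider
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None

    return None
```

Keeping the explicit argument first makes it easy to override a machine-wide key for a single model instance without touching the environment.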

Streaming

Given the research focus of this library, streaming generation is not supported. However, you can use the underlying httpx objects exposed on .client and .async_client to stream responses yourself.
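For readers who do want to stream through the exposed httpx client, the following is a minimal sketch. It assumes the endpoint speaks the OpenAI-style chat completions streaming format (server-sent `data:` lines ending with `data: [DONE]`); the function name `stream_chat_text` and its parameters are hypothetical, and the URL, payload, and headers must be supplied by the caller.

```python
import json
from typing import Any, Dict, Iterator


def stream_chat_text(
    client: Any, url: str, payload: Dict[str, Any], headers: Dict[str, str]
) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style server-sent-event stream.

    `client` is expected to behave like httpx.Client, e.g. the object on `model.client`.
    """
    body = dict(payload, stream=True)  # ask the server to stream (OpenAI-style flag)
    with client.stream("POST", url, json=body, headers=headers) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line.startswith("data:"):
                continue  # skip blank separator lines and SSE comments
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                break  # end-of-stream sentinel
            chunk = json.loads(data)
            choices = chunk.get("choices") or []
            if not choices:
                continue
            fragment = choices[0].get("delta", {}).get("content")
            if fragment:
                yield fragment
```

Because the function only relies on `client.stream(...)`, `raise_for_status()`, and `iter_lines()`, it can be exercised in tests with a small fake client instead of a live server. Note that responses streamed this way bypass the library's caching and retry handling.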

Installation

pip install alea-llm-client

Examples

Basic JSON Example

from alea_llm_client import VLLMModel

if __name__ == "__main__":
    model = VLLMModel(
        endpoint="http://my.vllm.server:8000",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct"
    )

    messages = [
        {
            "role": "user",
            "content": "Give me a JSON object with keys 'name' and 'age' for a person named Alice who is 30 years old.",
        },
    ]

    print(model.json(messages=messages, system="Respond in JSON.").data)

# Output: {'name': 'Alice', 'age': 30}

Basic Completion Example with KL3M

from alea_llm_client import VLLMModel

if __name__ == "__main__":
    model = VLLMModel(
        model="kl3m-1.7b", ignore_cache=True
    )

    prompt = "My name is "
    print(model.complete(prompt=prompt, temperature=0.5).text)

# Output: Dr. Hermann Kamenzi, and

Pydantic Example

from pydantic import BaseModel
from alea_llm_client import AnthropicModel, format_prompt, format_instructions

class Person(BaseModel):
    name: str
    age: int

if __name__ == "__main__":
    model = AnthropicModel(ignore_cache=True)

    instructions = [
        "Provide one random record based on the SCHEMA below.",
    ]
    prompt = format_prompt(
        {
            "instructions": format_instructions(instructions),
            "schema": Person,
        }
    )

    person = model.pydantic(prompt, system="Respond in JSON.", pydantic_model=Person)
    print(person)

# Output: name='Olivia Chen' age=29

Design

Class Inheritance

classDiagram
    BaseAIModel <|-- OpenAICompatibleModel
    OpenAICompatibleModel <|-- AnthropicModel
    OpenAICompatibleModel <|-- OpenAIModel
    OpenAICompatibleModel <|-- VLLMModel
    OpenAICompatibleModel <|-- GrokModel
    BaseAIModel <|-- GoogleModel

    class BaseAIModel {
        <<abstract>>
    }
    class OpenAICompatibleModel
    class AnthropicModel
    class OpenAIModel
    class VLLMModel
    class GrokModel
    class GoogleModel

Example Call Flow

sequenceDiagram
    participant Client
    participant BaseAIModel
    participant OpenAICompatibleModel
    participant SpecificModel
    participant API

    Client->>BaseAIModel: json()
    BaseAIModel->>BaseAIModel: _retry_wrapper()
    BaseAIModel->>OpenAICompatibleModel: _json()
    OpenAICompatibleModel->>OpenAICompatibleModel: format()
    OpenAICompatibleModel->>OpenAICompatibleModel: _make_request()
    OpenAICompatibleModel->>API: HTTP POST
    API-->>OpenAICompatibleModel: Response
    OpenAICompatibleModel->>OpenAICompatibleModel: _handle_json_response()
    OpenAICompatibleModel-->>BaseAIModel: JSONModelResponse
    BaseAIModel-->>Client: JSONModelResponse
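
The call flow above follows a template-method pattern: the public `json()` method on the base class wraps a provider-specific `_json()` in retry logic. The sketch below mirrors that shape with hypothetical classes (`SketchBaseModel`, `FlakyModel`) and assumed parameters (`retry_limit`, `retry_delay`); it is not the library's implementation, and the real retry policy may differ.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class SketchBaseModel(ABC):
    """Hypothetical base class: public methods delegate to subclass hooks via a retry wrapper."""

    def __init__(self, retry_limit: int = 3, retry_delay: float = 0.0) -> None:
        self.retry_limit = retry_limit
        self.retry_delay = retry_delay

    def _retry_wrapper(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Call func until it succeeds or retry_limit attempts have failed."""
        last_error = None
        for attempt in range(self.retry_limit):
            try:
                return func(*args, **kwargs)
            except Exception as exc:  # a real client would catch narrower error types
                last_error = exc
                time.sleep(self.retry_delay * (attempt + 1))  # linear backoff
        raise RuntimeError(f"gave up after {self.retry_limit} attempts") from last_error

    def json(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        """Public entry point: retries the provider-specific implementation."""
        return self._retry_wrapper(self._json, *args, **kwargs)

    @abstractmethod
    def _json(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        """Provider-specific request, implemented by subclasses."""


class FlakyModel(SketchBaseModel):
    """Stand-in for a provider class that fails twice before succeeding."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def _json(self, messages: List[Dict[str, str]]) -> Dict[str, Any]:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("transient failure")
        return {"name": "Alice", "age": 30}
```

With this structure, callers only ever see the public method, while each provider class supplies its own request formatting and response handling.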

License

The ALEA LLM client is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using the ALEA LLM client library, please open an issue on GitHub.

Learn More

To learn more about ALEA and its software and research projects like KL3M and leeky, visit the ALEA website.
