
Nous LLM

Intelligent No Frills LLM Router - A unified Python interface for multiple Large Language Model providers


Why Nous LLM?

Switch between LLM providers with a single line of code. Build AI applications without vendor lock-in.

# Same interface, different providers
config = ProviderConfig(provider="openai", model="gpt-4o")     # OpenAI
config = ProviderConfig(provider="anthropic", model="claude-3-5-sonnet")  # Anthropic
config = ProviderConfig(provider="gemini", model="gemini-2.5-pro")  # Google

✨ Key Features

  • 🔄 Unified Interface: Single API for multiple LLM providers
  • ⚡ Async Support: Both synchronous and asynchronous interfaces
  • 🛡️ Type Safety: Full typing with Pydantic v2 validation
  • 🔀 Provider Flexibility: Easy switching between providers and models
  • ☁️ Serverless Ready: Optimized for AWS Lambda and Google Cloud Run
  • 🚨 Error Handling: Comprehensive error taxonomy with provider context
  • 🔌 Extensible: Plugin architecture for custom providers

🚀 Quick Start

Install

pip install nous-llm

Use in 3 Lines

from nous_llm import generate, ProviderConfig, Prompt

config = ProviderConfig(provider="openai", model="gpt-4o")
response = generate(config, Prompt(input="What is the capital of France?"))
print(response.text)  # "Paris is the capital of France."

📦 Supported Providers

Provider    Popular Models                        Latest Models
OpenAI      GPT-4o, GPT-4-turbo, GPT-3.5-turbo    GPT-5, o3, o4-mini
Anthropic   Claude 3.5 Sonnet, Claude 3 Haiku     Claude Opus 4.1
Google      Gemini 1.5 Pro, Gemini 1.5 Flash      Gemini 2.5 Pro
xAI         Grok Beta                             Grok 4, Grok 4 Heavy
OpenRouter  Llama 3.3 70B, Mixtral                Llama 4 Maverick
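
To see the unified interface in action, here is a short sketch that sends one prompt to two of these providers; it assumes the matching API keys are set in your environment:

# One prompt, two providers - only the config changes.
from nous_llm import generate, ProviderConfig, Prompt

prompt = Prompt(input="Name one practical use of a hash table.")

for provider, model in [("openai", "gpt-4o"), ("anthropic", "claude-3-5-sonnet-20241022")]:
    config = ProviderConfig(provider=provider, model=model)
    response = generate(config, prompt)
    print(f"{provider}: {response.text}")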

Installation

Quick Install

# Using pip
pip install nous-llm

# Using uv (recommended)
uv add nous-llm

Installation Options

# Install with specific provider support
pip install nous-llm[openai]      # OpenAI only
pip install nous-llm[anthropic]   # Anthropic only
pip install nous-llm[all]         # All providers

# Development installation
pip install nous-llm[dev]         # Includes testing tools

Environment Setup

Set your API keys as environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export XAI_API_KEY="xai-..."
export OPENROUTER_API_KEY="sk-or-..."

Or create a .env file:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
XAI_API_KEY=xai-...
OPENROUTER_API_KEY=sk-or-...
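
If your runtime does not load .env files automatically, you can read the file into the environment yourself; a minimal sketch, assuming the python-dotenv package is installed:

# Load .env into os.environ so the adapters can find your API keys.
from dotenv import load_dotenv

load_dotenv()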

Usage Examples

1. Basic Synchronous Usage

from nous_llm import generate, ProviderConfig, Prompt

# Configure your provider
config = ProviderConfig(
    provider="openai",
    model="gpt-4o",
    api_key="your-api-key"  # or set OPENAI_API_KEY env var
)

# Create a prompt
prompt = Prompt(
    instructions="You are a helpful assistant.",
    input="What is the capital of France?"
)

# Generate response
response = generate(config, prompt)
print(response.text)  # "Paris is the capital of France."

2. Asynchronous Usage

import asyncio
from nous_llm import agenerate, ProviderConfig, Prompt

async def main():
    config = ProviderConfig(
        provider="anthropic",
        model="claude-3-5-sonnet-20241022"
    )
    
    prompt = Prompt(
        instructions="You are a creative writing assistant.",
        input="Write a haiku about coding."
    )
    
    response = await agenerate(config, prompt)
    print(response.text)

asyncio.run(main())

3. Client-Based Approach (Recommended for Multiple Calls)

from nous_llm import LLMClient, ProviderConfig, Prompt

# Create a reusable client
client = LLMClient(ProviderConfig(
    provider="gemini",
    model="gemini-1.5-pro"
))

# Generate multiple responses efficiently
prompts = [
    Prompt(instructions="You are helpful.", input="What is AI?"),
    Prompt(instructions="You are creative.", input="Write a poem."),
]

for prompt in prompts:
    response = client.generate(prompt)
    print(f"{response.provider}: {response.text}")
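
Because prompts are plain data, the same list can be replayed against a different provider simply by creating a second client; a brief sketch:

# Swap providers without touching the prompts themselves.
fallback = LLMClient(ProviderConfig(provider="openai", model="gpt-4o-mini"))

for prompt in prompts:
    print(fallback.generate(prompt).text)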

Advanced Features

4. Provider-Specific Parameters

from nous_llm import generate, ProviderConfig, Prompt, GenParams

# A shared prompt for the examples below
prompt = Prompt(input="Explain quantum entanglement in two sentences.")

# OpenAI GPT-5 with reasoning mode
config = ProviderConfig(provider="openai", model="gpt-5")
params = GenParams(
    max_tokens=1000,
    temperature=0.7,
    extra={"reasoning": True}  # OpenAI-specific
)

# OpenAI O-series reasoning model
config = ProviderConfig(provider="openai", model="o3-mini")
params = GenParams(
    max_tokens=1000,
    temperature=0.7,  # Will be automatically set to 1.0 with a warning
)

# Anthropic with thinking tokens
config = ProviderConfig(provider="anthropic", model="claude-3-5-sonnet-20241022")
params = GenParams(
    extra={"thinking": True}  # Anthropic-specific
)

response = generate(config, prompt, params)

5. Gemini Thinking Functionality

from nous_llm import generate, ProviderConfig, Prompt, GenParams

# Enable thinking mode for enhanced reasoning
config = ProviderConfig(
    provider="gemini", 
    model="gemini-2.5-pro"  # Use thinking-enabled model
)

prompt = Prompt(
    instructions="You are a math tutor. Show your step-by-step reasoning.",
    input="Calculate the area of a circle with radius 7 cm, then find what percentage this is of a square with side length 15 cm."
)

# Configure thinking parameters
params = GenParams(
    max_tokens=1500,
    temperature=0.3,
    extra={
        "include_thoughts": True,      # Show the model's reasoning process
        "thinking_budget": 8000        # Allow up to 8000 tokens for thinking
    }
)

response = generate(config, prompt, params)
print(response.text)

# Output format:
# **Thinking:**
# Let me break this down step by step...
# First, I need to calculate the area of the circle...
# 
# **Response:**
# The area of the circle is approximately 153.94 cm²...

Thinking Parameters:

  • include_thoughts: Boolean to enable/disable thinking output
  • thinking_budget: Integer token budget for the thinking process
  • Works with thinking-enabled models like gemini-2.5-pro

Note for Developers:

Parameter Changes in OpenAI's Latest Models:

  • Token Limits: GPT-5 series and O-series models (o1, o3, o4-mini) use max_completion_tokens instead of max_tokens. The library automatically handles this with intelligent parameter mapping and fallback mechanisms.
  • Temperature: O-series reasoning models (o1, o3, o4-mini) and GPT-5 thinking/reasoning variants require temperature=1.0. The library automatically adjusts this and warns you if a different value is requested.

You can continue using the standard parameters in GenParams - they will be automatically converted to the correct parameter for each model.
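
A minimal sketch of that conversion, reusing one GenParams object across a standard model and a reasoning model:

from nous_llm import generate, ProviderConfig, Prompt, GenParams

prompt = Prompt(input="Summarize HTTP/2 in one sentence.")
params = GenParams(max_tokens=500, temperature=0.7)

for model in ("gpt-4o", "o3-mini"):
    # For o3-mini, max_tokens is mapped to max_completion_tokens and
    # temperature is coerced to 1.0 with a warning, per the notes above.
    response = generate(ProviderConfig(provider="openai", model=model), prompt, params)
    print(f"{model}: {response.text}")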

6. Custom Base URLs & Proxies

# Use OpenRouter as a proxy for OpenAI models
config = ProviderConfig(
    provider="openrouter",
    model="openai/gpt-4o",
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

7. Error Handling

from nous_llm import generate, AuthError, RateLimitError, ProviderError

try:
    response = generate(config, prompt)
except AuthError as e:
    print(f"Authentication failed: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except ProviderError as e:
    print(f"Provider error: {e}")
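
Rate limits are often transient, so a small exponential backoff built on these exceptions can be useful; a sketch:

import time

from nous_llm import generate, RateLimitError

def generate_with_retry(config, prompt, attempts=3):
    # Retry on rate limits, waiting 1s, 2s, 4s, ... between attempts.
    for attempt in range(attempts):
        try:
            return generate(config, prompt)
        except RateLimitError:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)

response = generate_with_retry(config, prompt)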

8. OpenRouter Thinking Functionality

OpenRouter supports thinking/reasoning functionality across multiple model families with different parameter configurations:

from nous_llm import generate, ProviderConfig, Prompt, GenParams

# OpenAI o-series models (effort-based reasoning)
config = ProviderConfig(
    provider="openrouter",
    model="openai/o1-preview",
    api_key="your-openrouter-key"
)

prompt = Prompt(
    instructions="You are a math tutor. Show your reasoning clearly.",
    input="Calculate compound interest on $1000 at 5% for 3 years."
)

# Effort-based reasoning (OpenAI o1/o3/GPT-5 models)
params = GenParams(
    max_tokens=2000,
    temperature=1.0,  # Required for o-series models
    extra={
        "reasoning_effort": "high",      # "low", "medium", "high"
        "reasoning_exclude": False       # Include reasoning in response
    }
)

response = generate(config, prompt, params)
print(response.text)

Different Model Types:

# Anthropic Claude (max_tokens-based reasoning)
config = ProviderConfig(
    provider="openrouter",
    model="anthropic/claude-3-5-sonnet",
    api_key="your-openrouter-key"
)

params = GenParams(
    max_tokens=1500,
    extra={
        "reasoning_max_tokens": 6000,    # Token budget for reasoning
        "reasoning_exclude": False       # Show reasoning process
    }
)

# xAI Grok (effort-based reasoning)
config = ProviderConfig(
    provider="openrouter", 
    model="xai/grok-beta",
    api_key="your-openrouter-key"
)

params = GenParams(
    max_tokens=2000,
    extra={
        "reasoning_effort": "medium",    # Reasoning effort level
        "reasoning_exclude": True        # Hide reasoning, show only final answer
    }
)

# Legacy parameter support (backward compatibility)
params = GenParams(
    max_tokens=1500,
    extra={
        "include_thoughts": True,        # Enable thinking
        "thinking_budget": 4000          # Token budget (maps to appropriate param)
    }
)

Supported Models:

  • OpenAI: o1-preview, o1-mini, o3-mini, gpt-5-turbo (effort-based)
  • Anthropic: claude-3-5-sonnet, claude-3-5-haiku (max_tokens-based)
  • xAI: grok-beta, grok-2 (effort-based)
  • Google: gemini-2.0-flash-thinking-exp (max_tokens-based)

The adapter automatically detects model capabilities and applies the correct reasoning parameters.
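
If you want to mirror that split in your own code, a hypothetical helper (not part of nous_llm) might choose the extras by model family like this:

def reasoning_extra(model: str) -> dict:
    # Illustrative only: effort-based families get reasoning_effort,
    # everything else gets a reasoning token budget.
    if model.startswith(("openai/o", "openai/gpt-5", "xai/")):
        return {"reasoning_effort": "medium", "reasoning_exclude": False}
    return {"reasoning_max_tokens": 4000, "reasoning_exclude": False}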

Dynamic Token Limits

The library now supports dynamic token limits based on actual provider and model capabilities, replacing the previous static 32k limit:

from nous_llm import generate, ProviderConfig, Prompt, GenParams

# High-capacity models now supported
config = ProviderConfig(
    provider="openai",
    model="gpt-oss-120b",  # Supports 131,072 tokens
    api_key="your-api-key"
)

params = GenParams(
    max_tokens=100000,  # No longer limited to 32k
    temperature=0.7
)

response = generate(config, prompt, params)

Model-Specific Limits:

  • OpenAI: 4,096 (GPT-4o Realtime) to 131,072 (GPT-OSS series)
  • Gemini: 2,048 (Gemini 2.0 Flash) to 65,536 (Gemini 2.5 series)
  • xAI: 32,768 tokens (Grok series)
  • Anthropic: 16,384 tokens (Claude series)
  • OpenRouter: Varies by underlying model

The library automatically validates token limits and provides clear error messages:

# This will raise a ValueError with a helpful message
params = GenParams(max_tokens=200000)  # Exceeds model limit
response = generate(config, prompt, params)
# ValueError: max_tokens (200000) exceeds model limit (131072) for openai/gpt-oss-120b

Benefits:

  • ✅ No artificial 32k limit restriction
  • ✅ Accurate, model-specific validation
  • ✅ Support for high-capacity models
  • ✅ Automatic limit detection and caching
  • ✅ Clear error messages when limits are exceeded

Production Integration

FastAPI Web Service

from fastapi import FastAPI, HTTPException
from nous_llm import agenerate, ProviderConfig, Prompt, AuthError

app = FastAPI(title="Nous LLM API")

@app.post("/generate")
async def generate_text(request: dict):
    try:
        config = ProviderConfig(**request["config"])
        prompt = Prompt(**request["prompt"])
        
        response = await agenerate(config, prompt)
        return {
            "text": response.text, 
            "usage": response.usage,
            "provider": response.provider
        }
    except AuthError as e:
        raise HTTPException(status_code=401, detail=str(e))
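
A quick client-side check, assuming the service is running locally on port 8000 and the httpx package is installed:

import httpx

payload = {
    "config": {"provider": "openai", "model": "gpt-4o-mini"},
    "prompt": {"instructions": "You are helpful.", "input": "Hello!"},
}
print(httpx.post("http://localhost:8000/generate", json=payload).json())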

AWS Lambda Function

import json
from nous_llm import LLMClient, ProviderConfig, Prompt

# Global client for connection reuse across invocations
client = LLMClient(ProviderConfig(
    provider="openai",
    model="gpt-4o-mini"
))

def lambda_handler(event, context):
    try:
        prompt = Prompt(
            instructions=event["instructions"],
            input=event["input"]
        )
        
        response = client.generate(prompt)
        
        return {
            "statusCode": 200,
            "body": json.dumps({
                "text": response.text,
                "usage": response.usage.model_dump() if response.usage else None
            })
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)})
        }
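
The handler can be smoke-tested locally without a Lambda runtime:

# Sample event; context is unused by this handler, so None is fine.
event = {"instructions": "You are helpful.", "input": "Ping?"}
print(lambda_handler(event, context=None))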

Development

Project Setup

# Clone the repository
git clone https://github.com/amod-ml/nous-llm.git
cd nous-llm

# Install with development dependencies
uv sync --group dev

# Install pre-commit hooks (includes GPG validation)
./scripts/setup-gpg-hook.sh

Testing & Quality

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=nous_llm

# Format and lint code
uv run ruff format
uv run ruff check

# Type checking
uv run mypy src/nous_llm

Adding a New Provider

  1. Create adapter in src/nous_llm/adapters/
  2. Implement the AdapterProtocol (see the sketch below)
  3. Register in src/nous_llm/core/adapters.py
  4. Add model patterns to src/nous_llm/core/registry.py
  5. Add comprehensive tests in tests/
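
For step 2, a hypothetical skeleton (method names are illustrative; check src/nous_llm/core/adapters.py for the actual AdapterProtocol signature):

class MyProviderAdapter:
    # Illustrative shape only - not the verified protocol.
    def generate(self, config, prompt, params):
        # Call your provider's SDK here, then map its reply into the
        # library's response type (text, usage, provider name).
        raise NotImplementedError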

Examples & Resources

Complete Examples

  • ๐Ÿ“ examples/basic_usage.py - Core functionality demos
  • ๐Ÿ“ examples/fastapi_service.py - REST API service
  • ๐Ÿ“ examples/lambda_example.py - AWS Lambda function

Documentation & Support

🐛 Found an Issue?

We'd love to hear from you! When reporting an issue, please include:

  • Python version
  • Nous LLM version (pip show nous-llm)
  • Minimal code to reproduce the issue
  • Full error traceback

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

🔒 Security Requirements for Contributors

ALL commits to this repository MUST be GPG-signed. This is automatically enforced by a pre-commit hook.

Why GPG Signing?

  • 🔐 Authentication: Every commit is cryptographically verified
  • 🛡️ Integrity: Commits cannot be tampered with after signing
  • 📝 Non-repudiation: Contributors cannot deny authorship of signed commits
  • 🔗 Supply Chain Security: Protection against commit spoofing attacks

Quick Setup for Contributors

New to the project?

# Automated setup - installs hook and guides through GPG configuration
./scripts/setup-gpg-hook.sh

Already have GPG configured?

# Enable GPG signing for this repository
git config commit.gpgsign true
git config user.signingkey YOUR_KEY_ID

Important Notes

  • ❌ Unsigned commits will be automatically rejected
  • ✅ The pre-commit hook validates your GPG setup before every commit
  • 📋 You must add your GPG public key to your GitHub account
  • 🚫 The hook cannot be bypassed with --no-verify

Need Help?

  • 📖 Full Setup Guide: GPG Signing Documentation
  • 🔧 Troubleshooting: Run ./scripts/setup-gpg-hook.sh for diagnostics
  • 🧪 Quick Test: Try making a commit - the hook will guide you if anything's wrong

Development Requirements

  • ✅ Python 3.12+
  • 🔐 All commits must be GPG-signed
  • 🧪 Code must pass all tests and linting
  • 📋 Follow established patterns and conventions

📄 License

This project is licensed under the Mozilla Public License 2.0 - see the LICENSE file for details.


Built with ❤️ for the AI community
🔒 GPG signing ensures the authenticity and integrity of all code contributions
