Inference Proxy is an OpenAI-compatible HTTP proxy server for LLM inference. It can work with the Google, Anthropic, and OpenAI APIs, local PyTorch inference, and more.

Inference Proxy

Inference Proxy is an OpenAI-compatible HTTP proxy server for inference across various Large Language Models (LLMs). It provides a unified interface to different AI providers through a single API endpoint that follows the OpenAI format. Stream responses as you would with OpenAI, authenticate with your own API keys, and keep your existing clients unchanged.

✨ Features

Provider Agnostic: Connect to OpenAI, Anthropic, Google AI, local models, and more using a single API
Unified Interface: Access all models through the standard OpenAI API format
Dynamic Routing: Route requests to different LLM providers based on model name patterns
Stream Support: Full streaming support for real-time responses
API Key Management: Configurable API key validation and access control
Easy Configuration: Simple TOML configuration files for setup

🚀 Getting Started

Installation

pip install inference-proxy

Quick Start

Create a config.toml file:

host = "0.0.0.0"
port = 8000

[connections]
[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"

[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"

[routing]
"gpt*" = "openai.*"
"claude*" = "anthropic.*"
"*" = "openai.gpt-3.5-turbo"

[groups.default]
api_keys = ["YOUR_API_KEY_HERE"]

Start the server:

inference-proxy

Use it with any OpenAI-compatible client:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY_HERE",
    base_url="http://localhost:8000/v1"
)

completion = client.chat.completions.create(
    model="gpt-5",  # This will be routed to OpenAI based on config
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(completion.choices[0].message.content)

Or use the same endpoint with Claude models:

completion = client.chat.completions.create(
    model="claude-opus-4-1-20250805",  # This will be routed to Anthropic based on config
    messages=[{"role": "user", "content": "Hello, world!"}]
)
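The Features list mentions full streaming support. Assuming the proxy forwards OpenAI-style stream chunks (this README does not show a streaming example), a client-side sketch for consuming them might look like this. collect_stream is a hypothetical helper name; it relies only on the chunk.choices[0].delta.content shape that the official openai Python SDK yields when stream=True.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Concatenate the text deltas of an OpenAI-style streaming response.

    Each chunk is expected to expose chunk.choices[0].delta.content, which is
    the shape the official openai SDK yields when stream=True.
    """
    parts: list[str] = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage-only chunk) carry no choices.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        # Role-only and final deltas typically have content=None; skip them.
        if text:
            parts.append(text)
            if echo:
                # Print text as it arrives instead of waiting for the full reply.
                print(text, end="", flush=True)
    if echo:
        print()
    return "".join(parts)
```

To use it with the client above, pass the result of client.chat.completions.create(model="claude-opus-4-1-20250805", messages=[...], stream=True) to collect_stream(). Skipping empty choices and None deltas keeps the helper tolerant of role-only and usage-only chunks.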

📝 Configuration

Inference Proxy is configured through a TOML file that specifies connections, routing rules, and access control.

Basic Structure

host = "0.0.0.0"  # Interface to bind to
port = 8000       # Port to listen on
dev_autoreload = false  # Enable for development

# API key validation function (optional)
check_api_key = "lm_proxy.core.check_api_key"

# LLM Provider Connections
[connections]

[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"

[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"

[connections.google]
api_type = "google_ai_studio"
api_key = "env:GOOGLE_API_KEY"

# Routing rules (model_pattern = "connection.model")
[routing]
"gpt*" = "openai.*"     # Route all GPT models to OpenAI
"claude*" = "anthropic.*"  # Route all Claude models to Anthropic
"gemini*" = "google.*"  # Route all Gemini models to Google
"*" = "openai.gpt-3.5-turbo"  # Default fallback

# Access control groups
[groups.default]
api_keys = [
    "KEY1",
    "KEY2"
]

Environment Variables

You can use environment variables in your configuration file by prefixing a value with "env:" followed by the variable name:

[connections.openai]
api_key = "env:OPENAI_API_KEY"

Load these from a .env file or set them in your environment before starting the server.
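A minimal sketch of how an "env:" value can be resolved, assuming a missing variable should fail loudly. The helper name resolve_env_value and its error behaviour are illustrative assumptions; the proxy's own handling of unset variables is not documented here.

```python
import os
from typing import Mapping, Optional

ENV_PREFIX = "env:"


def resolve_env_value(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace an "env:NAME" config value with environment variable NAME.

    Values without the prefix (URLs, plain strings) are returned unchanged.
    """
    if environ is None:
        environ = os.environ
    if not value.startswith(ENV_PREFIX):
        return value
    name = value[len(ENV_PREFIX):]
    try:
        return environ[name]
    except KeyError:
        # Assumption: an unset variable is a configuration error, not "".
        raise KeyError(f"environment variable {name!r} is not set") from None
```

Passing an explicit mapping instead of os.environ makes the helper easy to test without touching the real environment.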

🔌 API Usage

Inference Proxy implements the OpenAI chat completions API endpoint. You can use any OpenAI-compatible client to interact with it.

Endpoint

POST /v1/chat/completions

Request Format

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "stream": false
}
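Because the endpoint follows the OpenAI format, you do not strictly need the openai package. The sketch below builds the same request with the standard library only. It assumes the client key is sent as a Bearer token in the Authorization header, which is how OpenAI-compatible clients send it; build_chat_request is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /v1/chat/completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the proxy reads the client key from a Bearer token,
            # matching what OpenAI-compatible SDKs send.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To actually send it, open the request with urllib.request.urlopen(req) and decode the body with json.load.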

Response Format

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}

🛠️ Advanced Usage

Custom API Key Validation

You can implement your own API key validation function:

# my_validators.py
def validate_api_key(api_key: str) -> str | None:
    """
    Validate an API key and return the group name if valid.
    
    Args:
        api_key: The API key to validate
        
    Returns:
        The name of the group if valid, None otherwise
    """
    if api_key == "secret-key":
        return "admin"
    elif api_key.startswith("user-"):
        return "users"
    return None

Then reference it in your config:

check_api_key = "my_validators.validate_api_key"

Dynamic Model Routing

The routing section allows flexible pattern matching with wildcards:

[routing]
"gpt-4*" = "openai.gpt-4"           # Route gpt-4 requests to OpenAI GPT-4
"gpt-3.5*" = "openai.gpt-3.5-turbo" # Route gpt-3.5 requests to OpenAI
"claude*" = "anthropic.*"           # Pass model name as-is to Anthropic
"gemini*" = "google.*"              # Pass model name as-is to Google
"custom*" = "local.llama-7b"        # Map any "custom*" to a specific local model
"*" = "openai.gpt-3.5-turbo"        # Default fallback for unmatched models
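One way to picture how these rules resolve is the following sketch. It assumes rules are tried in the order they appear in [routing], that the first matching pattern wins, and that a target model of "*" passes the requested name through; this README does not spell out the proxy's actual matching algorithm. resolve_route is a hypothetical helper using shell-style wildcards from the standard fnmatch module.

```python
from fnmatch import fnmatchcase


def resolve_route(model: str, routing: dict[str, str]) -> tuple[str, str]:
    """Map a requested model name to (connection, upstream_model).

    Assumes first-match-wins in the order rules appear in [routing].
    """
    for pattern, target in routing.items():
        # Case-sensitive shell-style match: "*" is a wildcard, "." is literal.
        if fnmatchcase(model, pattern):
            connection, _, upstream = target.partition(".")
            # "connection.*" forwards the requested model name unchanged.
            return connection, (model if upstream == "*" else upstream)
    raise LookupError(f"no routing rule matches model {model!r}")


ROUTING = {
    "gpt-4*": "openai.gpt-4",
    "gpt-3.5*": "openai.gpt-3.5-turbo",
    "claude*": "anthropic.*",
    "gemini*": "google.*",
    "custom*": "local.llama-7b",
    "*": "openai.gpt-3.5-turbo",
}
```

Under this first-match assumption, the catch-all "*" rule has to come last, otherwise it would shadow every rule after it.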

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

Release history

3.2.2 (Apr 2, 2026)
3.2.1 (Mar 31, 2026)
3.2.0 (Mar 30, 2026)
3.1.0 (Mar 25, 2026)
3.0.2 (Feb 19, 2026)
3.0.1 (Feb 10, 2026)
3.0.0 (Feb 5, 2026)
3.0.0.dev1, pre-release (Jan 20, 2026)
2.1.1 (Nov 20, 2025)
2.1.0 (Nov 2, 2025)
2.0.0 (Oct 26, 2025)
1.1.0 (Oct 15, 2025)
1.0.0 (Oct 15, 2025)
0.4.0 (Oct 14, 2025)
0.3.0 (Oct 9, 2025)
0.2.2 (Oct 8, 2025), this version
0.2.1 (Aug 28, 2025)
0.2.0 (Aug 27, 2025)
0.0.3 (May 24, 2025)
0.0.2 (May 24, 2025)
0.0.1 (May 24, 2025)

Download files


Source Distribution

inference_proxy-0.2.2.tar.gz (9.2 kB)

Uploaded Oct 8, 2025 Source

Built Distribution


inference_proxy-0.2.2-py3-none-any.whl (11.4 kB)

Uploaded Oct 8, 2025 Python 3

File details

Details for the file inference_proxy-0.2.2.tar.gz.

File metadata

Download URL: inference_proxy-0.2.2.tar.gz
Upload date: Oct 8, 2025
Size: 9.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for inference_proxy-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`ac4beb18d04d3ac7661b8d2604b159168306ddde50a384d20a494fb956e92d33`
MD5	`852294574d256558c4980f0211021037`
BLAKE2b-256	`eba979eedfddd3e3f072fcbba770591bec70f78b8386dd2884e1b7dd2d1045a3`


File details

Details for the file inference_proxy-0.2.2-py3-none-any.whl.

File metadata

Download URL: inference_proxy-0.2.2-py3-none-any.whl
Upload date: Oct 8, 2025
Size: 11.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for inference_proxy-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d48d7f1624147a2e8f66e9cf7a8c8858e7c04b7ba7fdfada148893688e2c401`
MD5	`d97bbc38d257734150668e5c9fb83fc4`
BLAKE2b-256	`69b69d30fc80469bddb7361d545720b2bad9264419dce5e4fba890095f330f70`

