Python library for efficient and convenient AI inference replay in testing, debugging and development

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Project description

InferenceGate

Python library for efficient and convenient AI inference replay in testing, debugging and development, saving costs and time on repeated prompts.

Installation

pip install inference-gate

Features

Record-and-Replay Mode: Record new requests to cache, replay from cache when available
Replay-Only Mode: Only serve cached responses (for unit tests and CI)
Web UI Dashboard: Optional web-based dashboard for browsing cache entries, viewing statistics, and inspecting request/response details
Supports OpenAI Chat Completions API and Responses API
Supports streaming responses
Preserves prompt, temperature, model, and other metadata
YAML configuration file for persistent settings
CLI tools for easy management

Quick Start

1. Initialize Configuration (Optional)

inference-gate config init

This creates a configuration file at ~/.InferenceGate/config.yaml (%USERPROFILE%\.InferenceGate\config.yaml on Windows).

2. Test Your Upstream API Connection

inference-gate test-upstream --api-key $OPENAI_API_KEY

3. Start the Proxy

inference-gate start --api-key $OPENAI_API_KEY

4. Test the Running Proxy

inference-gate test-gate

5. Point Your Client to the Proxy

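from openai import OpenAI

# Point the client at the local InferenceGate proxy instead of the upstream API
client = OpenAI(
    api_key="any-key",  # Not needed in replay mode
    base_url="http://localhost:8080/v1",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)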
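
Clients other than the official SDK should work the same way, since the proxy is addressed through an ordinary base URL. The following is a minimal sketch using only the standard library. It assumes the proxy from step 3 listens on localhost:8080 and serves the Chat Completions route at /v1/chat/completions (the path the OpenAI client appends to base_url). `build_chat_request` is a hypothetical helper name, not part of InferenceGate.

```python
import json
import urllib.request

PROXY_BASE_URL = "http://localhost:8080/v1"  # same base_url as the client above


def build_chat_request(prompt: str, model: str = "gpt-4") -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request aimed at the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer any-key",  # placeholder key, as above
        },
        method="POST",
    )


request = build_chat_request("Hello!")
# With the proxy running, the request could be sent like this:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Sending is left commented out because it needs a running proxy.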

CLI Commands

Server Commands

`start` - Record-and-Replay Mode (Default)

Replays cached inferences when available. On cache miss, forwards to upstream, records the response, and stores it for future replays.
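
The hit-or-miss flow described above can be illustrated with a toy in-memory cache. This is a sketch of the general record-and-replay technique, not InferenceGate's actual implementation: the keying scheme (SHA-256 of the canonical JSON request body), the `RecordReplayCache` class, and the `fake_upstream` function are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Callable, Dict


def request_key(payload: dict) -> str:
    """Derive a stable cache key from a request body (assumed keying scheme)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RecordReplayCache:
    """Toy in-memory stand-in for an on-disk inference cache."""

    def __init__(self, upstream: Callable[[dict], dict]) -> None:
        self._upstream = upstream
        self._store: Dict[str, dict] = {}
        self.live_calls = 0  # number of requests forwarded upstream

    def handle(self, payload: dict) -> dict:
        key = request_key(payload)
        if key in self._store:
            return self._store[key]        # cache hit: replay
        self.live_calls += 1               # cache miss: forward upstream
        response = self._upstream(payload)
        self._store[key] = response        # record for future replays
        return response


def fake_upstream(payload: dict) -> dict:
    """Hypothetical upstream that echoes the last user message."""
    return {"content": "echo: " + payload["messages"][-1]["content"]}


gate = RecordReplayCache(fake_upstream)
req = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}
first = gate.handle(req)   # miss: calls fake_upstream and records
second = gate.handle(req)  # hit: served from the cache
```

Serializing with sorted keys means two requests that differ only in key order map to the same cache entry.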

inference-gate start [OPTIONS]

Options:

Option	Description	Default
`--port, -p`	Server port	8080
`--host, -h`	Server host	127.0.0.1
`--cache-dir, -c`	Cache directory	.inference_cache
`--upstream, -u`	Upstream API URL	https://api.openai.com
`--api-key, -k`	OpenAI API key	$OPENAI_API_KEY
`--max-live-requests`	Global limit on live upstream requests	(infinite)
`--web-ui`	Enable web UI dashboard	false
`--web-ui-port`	Web UI server port	8081
`--verbose, -v`	Enable verbose logging	false
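
The table does not say whether `--max-live-requests` caps the total number of upstream requests or the number in flight at once. As one possible reading, the sketch below treats it as a total budget: once the budget is used up, further cache misses are refused. `LiveRequestBudget` is a hypothetical class written for illustration and is not taken from InferenceGate.

```python
import threading
from typing import Optional


class LiveRequestBudget:
    """Counts live upstream requests against an optional global limit."""

    def __init__(self, max_live_requests: Optional[int] = None) -> None:
        self._limit = max_live_requests  # None means no limit
        self._used = 0
        self._lock = threading.Lock()    # safe to share across threads

    def try_acquire(self) -> bool:
        """Return True if one more live request is allowed, else False."""
        with self._lock:
            if self._limit is not None and self._used >= self._limit:
                return False
            self._used += 1
            return True


budget = LiveRequestBudget(max_live_requests=2)
outcomes = [budget.try_acquire() for _ in range(3)]

unlimited = LiveRequestBudget()  # mirrors the "(infinite)" default
unlimited_outcomes = [unlimited.try_acquire() for _ in range(5)]
```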

`replay` - Replay-Only Mode

Only returns cached responses. Returns an error if a matching inference is not found in the cache. Useful for unit tests and CI pipelines.

inference-gate replay [OPTIONS]

Options:

Option	Description	Default
`--port, -p`	Server port	8080
`--host, -h`	Server host	127.0.0.1
`--cache-dir, -c`	Cache directory	.inference_cache
`--web-ui`	Enable web UI dashboard	false
`--web-ui-port`	Web UI server port	8081
`--verbose, -v`	Enable verbose logging	false
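
For CI it can be convenient to launch the replay server from the test harness. The sketch below only assembles the command line from the options in the table above. `replay_command` is a hypothetical helper and the cache path `tests/fixtures/inference_cache` is an assumed location for a committed cache.

```python
import shlex
from typing import List


def replay_command(
    cache_dir: str = ".inference_cache",
    port: int = 8080,
    host: str = "127.0.0.1",
) -> List[str]:
    """Build the argument list for `inference-gate replay`."""
    return [
        "inference-gate", "replay",
        "--host", host,
        "--port", str(port),
        "--cache-dir", cache_dir,
    ]


cmd = replay_command(cache_dir="tests/fixtures/inference_cache", port=8090)
printable = shlex.join(cmd)  # shell-safe string, handy for logging
# A test fixture could start the server with subprocess.Popen(cmd)
# and terminate it once the tests finish.
```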

Test Commands

`test-gate` - Test a Running InferenceGate Instance

Sends a test prompt to a running InferenceGate proxy. It uses the host and port from the configuration file, so no API key or extra options are needed.

inference-gate test-gate [OPTIONS]

Options:

Option	Description	Default
`--host, -h`	Host of the running instance	127.0.0.1
`--port, -p`	Port of the running instance	8080
`--model, -m`	Model to use	gpt-4o-mini
`--prompt`	Custom test prompt	(built-in test prompt)
`--verbose, -v`	Enable verbose logging	false

`test-upstream` - Test Upstream API Directly

Sends a test prompt directly to the upstream API (bypassing InferenceGate) to verify the API key and endpoint.

inference-gate test-upstream [OPTIONS]

Options:

Option	Description	Default
`--upstream, -u`	Upstream API URL	https://api.openai.com
`--api-key, -k`	OpenAI API key	$OPENAI_API_KEY
`--model, -m`	Model to use	gpt-4o-mini
`--prompt`	Custom test prompt	(built-in test prompt)
`--verbose, -v`	Enable verbose logging	false

Cache Management

`cache list` - List Cached Entries

inference-gate cache list [--cache-dir PATH]

`cache info` - Show Cache Statistics

inference-gate cache info [--cache-dir PATH]

`cache clear` - Clear All Cached Entries

inference-gate cache clear [--cache-dir PATH] [--yes]

Web UI Dashboard

InferenceGate includes an optional web-based dashboard for browsing cached inference entries, viewing statistics, and inspecting request/response details.

Enabling the Web UI

Add the --web-ui flag when starting InferenceGate:

# Record-and-replay mode with web UI
inference-gate start --api-key $OPENAI_API_KEY --web-ui

# Replay-only mode with web UI
inference-gate replay --web-ui

The web UI will be available at http://localhost:8081 by default. You can customize the port with --web-ui-port:

inference-gate start --web-ui --web-ui-port 3000

Features

Dashboard: View cache statistics, current mode, and configuration at a glance
Cache List: Browse all cached entries in a sortable, filterable table
Entry Details: Inspect full request and response details including headers, body, and metadata
Search: Filter cache entries by ID, model, path, or method
Streaming Support: View streaming response chunks for SSE endpoints

Screenshots

Dashboard Page

[Screenshot: Dashboard]

Cache List Page

[Screenshot: Cache List]

Entry Detail Page

[Screenshot: Entry Detail]

Building the Frontend (Development Only)

The web UI frontend is pre-built and included in the package. You only need to build it if you're developing or modifying the frontend:

cd webui-frontend
npm install
npm run build
# Output goes to src/inference_gate/webui/static/

Requirements:

Node.js 16+ and npm (only for frontend development)
No runtime dependencies - the built static files are served by the Python backend

Configuration Management

`config show` - Show Current Configuration

inference-gate config show

`config init` - Initialize Configuration File

inference-gate config init [--force]

`config path` - Show Configuration File Path

inference-gate config path

Configuration File

InferenceGate uses a YAML configuration file to store default settings. The file is located at:

Windows: %USERPROFILE%\.InferenceGate\config.yaml
macOS/Linux: ~/.InferenceGate/config.yaml
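
Both locations amount to a `.InferenceGate` directory under the user's home directory. If a script needs the same path, it could be computed as in this sketch, where `default_config_path` is a hypothetical helper (`inference-gate config path` remains the authoritative way to find the file).

```python
from pathlib import Path


def default_config_path() -> Path:
    """Return the documented default location of config.yaml."""
    # Path.home() normally resolves to %USERPROFILE% on Windows
    # and to ~ on macOS/Linux.
    return Path.home() / ".InferenceGate" / "config.yaml"


config_path = default_config_path()
```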

You can specify a custom path using the --config global option:

inference-gate --config /path/to/config.yaml start

Configuration Options

# Server settings
host: "127.0.0.1"
port: 8080
max_live_requests: null  # Optional global limit on live upstream requests

# Upstream API settings
upstream: "https://api.openai.com"
# api_key is not stored in the config file for security
# Use OPENAI_API_KEY environment variable instead

# Storage settings
cache_dir: ".inference_cache"

# Logging settings
verbose: false

# Test command settings
test_model: "gpt-4o-mini"
test_prompt: "This is a test prompt. Reply with **ONLY** \"OK.\" to confirm that everything is ok. DO NOT output anything else."

Configuration Priority

Settings are loaded in the following order (later overrides earlier):

1. Built-in defaults
2. Configuration file
3. Environment variables (OPENAI_API_KEY)
4. Command-line options
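
The layering above can be sketched with `collections.ChainMap`, which looks keys up in the first mapping that contains them. This is an illustration of the stated precedence only. `resolve_settings`, the `api_key` setting name, and the convention that unset CLI options arrive as `None` are assumptions, not InferenceGate internals.

```python
import os
from collections import ChainMap
from typing import Any, Dict, Mapping

# A subset of the documented defaults.
DEFAULTS: Dict[str, Any] = {
    "host": "127.0.0.1",
    "port": 8080,
    "upstream": "https://api.openai.com",
    "cache_dir": ".inference_cache",
    "verbose": False,
    "api_key": None,
}


def resolve_settings(
    file_cfg: Mapping[str, Any],
    cli_opts: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Merge the four layers; later layers override earlier ones."""
    env_cfg: Dict[str, Any] = {}
    if environ.get("OPENAI_API_KEY"):
        env_cfg["api_key"] = environ["OPENAI_API_KEY"]
    # Drop CLI options left unset so they do not mask lower layers.
    cli_cfg = {k: v for k, v in cli_opts.items() if v is not None}
    # ChainMap searches left to right, so the highest priority goes first.
    return dict(ChainMap(cli_cfg, env_cfg, dict(file_cfg), DEFAULTS))


settings = resolve_settings(
    file_cfg={"port": 9000, "verbose": True},
    cli_opts={"port": 9100, "verbose": None},
    environ={"OPENAI_API_KEY": "sk-test"},
)
```

Here the command-line port wins over the file's port, while `verbose` falls through to the file because the CLI left it unset.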

Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key (used in record/test modes)

Development

Install development dependencies:

pip install -e ".[dev]"

Run tests:

pytest

Run linting:

ruff check src/ tests/

License

MIT License


Release history

This version

0.1.0

May 17, 2026

Download files


Source Distribution

inference_gate-0.1.0.tar.gz (243.2 kB)

Uploaded May 17, 2026 Source

Built Distribution


inference_gate-0.1.0-py3-none-any.whl (206.3 kB)

Uploaded May 17, 2026 Python 3

File details

Details for the file inference_gate-0.1.0.tar.gz.

File metadata

Download URL: inference_gate-0.1.0.tar.gz
Upload date: May 17, 2026
Size: 243.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for inference_gate-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ede4c9c8f8654e8841394f6b26793ad575ee34f9a6a67ba1768556aff707da0f`
MD5	`c8a3ae614452c716191c17b1bbcb9e44`
BLAKE2b-256	`d0475dece7d2b95e001db9a7981992b863df3bffa55cf2ac8ab1cc5209ec5434`
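
A downloaded archive can be checked against the SHA256 digest above with the standard library. The sketch below is one way to do it. `sha256_of_stream` is a hypothetical helper, and the in-memory stream stands in for a real file so the example runs without a download.

```python
import hashlib
import io
from typing import BinaryIO

# SHA256 digest listed above for inference_gate-0.1.0.tar.gz
EXPECTED_SHA256 = "ede4c9c8f8654e8841394f6b26793ad575ee34f9a6a67ba1768556aff707da0f"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Demonstration on an in-memory stream.
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))

# For the real archive:
#   with open("inference_gate-0.1.0.tar.gz", "rb") as f:
#       assert sha256_of_stream(f) == EXPECTED_SHA256
```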


File details

Details for the file inference_gate-0.1.0-py3-none-any.whl.

File metadata

Download URL: inference_gate-0.1.0-py3-none-any.whl
Upload date: May 17, 2026
Size: 206.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for inference_gate-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4ac37aeeb5bb9e0d2cead0799706b076446a40ba78b35541f4844aaf430c957`
MD5	`e4d53a4e43f590a5317e80dcb439b078`
BLAKE2b-256	`2c19a2543aabfa421e0e2bd0a7f88ef414db299a298b231b167a61b2c19438e0`

