Simple AI Gateway
A lightweight AI API Gateway built with Python and FastAPI. It follows the OpenAI-compatible request format and can be configured to either echo back prompts or forward them to a real AI inference backend.
🚀 Quick Start
1. Prerequisites
Ensure you have uv installed. uv is a fast Python package and project manager that can replace pip and venv.
# If you don't have uv yet (macOS)
brew install uv
2. Installation & Environment Setup
uv will automatically manage your virtual environment and dependencies based on pyproject.toml.
# Clone the repository
git clone <your-repo-url>
cd simple-ai-gateway/src/simple_ai_gateway
# Sync dependencies and create a virtual environment automatically
uv sync
3. Configuration
The gateway uses a config.yaml file for routing. Ensure this file exists in the same directory as main.py.
Sample config.yaml:
default_backend: local
backends:
  local:
    type: local
    url: http://127.0.0.1:8081
  modal:
    type: modal
    url: https://YOUR_MODAL_URL
  modal_vllm:
    type: vllm
    url: https://YOUR_MODAL_VLLM_URL
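A quick sanity check on the configuration can catch mistakes before the server starts. The following is a minimal sketch, not part of the gateway itself: `SAMPLE_CONFIG` and `validate_config` are hypothetical names, and the config is written as a Python dict so the example needs only the standard library (reading the real file would require a YAML parser).

```python
from urllib.parse import urlparse

# In-memory mirror of the sample config.yaml (hypothetical; the gateway
# itself reads the YAML file).
SAMPLE_CONFIG = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
        "modal_vllm": {"type": "vllm", "url": "https://YOUR_MODAL_VLLM_URL"},
    },
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    backends = config.get("backends") or {}
    default = config.get("default_backend")
    if default not in backends:
        problems.append(f"default_backend {default!r} is not defined under backends")
    for name, spec in backends.items():
        if "type" not in spec:
            problems.append(f"backend {name!r} has no type")
        url = spec.get("url", "")
        parsed = urlparse(url)
        # A URL such as "https:/host" parses with an empty netloc and is flagged.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"backend {name!r} has a malformed url: {url!r}")
    return problems
```

Calling `validate_config(SAMPLE_CONFIG)` returns an empty list; a backend whose URL is missing a slash after the scheme is reported by name.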
4. Run the Server
Start the server.
uv run main.py
5. Testing the Gateway
Start the server on port 8080:
uv run uvicorn main:app --host 0.0.0.0 --port 8080
Once the server is running at http://localhost:8080, you can verify it using the following methods:
Method 1: Basic Echo Test (via cURL)
Test if the gateway correctly extracts your message and echoes it back:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Request-ID: my-custom-id-123" \
-d '{
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
What to look for:
- The response should contain "content": "Echo: Hello, world!".
- The "id" field should match "my-custom-id-123".
Method 2: Auto-ID Generation Test
If you don't provide an X-Request-ID header, the gateway will generate a unique UUID for you:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hi, baby!"}]
}'
What to look for: A valid UUID in the "id" field (e.g., 550e8400-e29b-...).
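The ID behaviour checked in Methods 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the gateway's actual code: `resolve_request_id` and `echo_response` are hypothetical helpers, and echoing the last user message is an assumption about how the message is extracted.

```python
import uuid


def resolve_request_id(headers: dict) -> str:
    """Use the caller's X-Request-ID when present, otherwise mint a UUID4."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-request-id") or str(uuid.uuid4())


def echo_response(payload: dict, headers: dict) -> dict:
    """Build an echo reply from the last user message (assumed behaviour)."""
    user_messages = [
        m for m in payload.get("messages", []) if m.get("role") == "user"
    ]
    content = user_messages[-1]["content"] if user_messages else ""
    return {
        "id": resolve_request_id(headers),
        "choices": [
            {
                "message": {"role": "assistant", "content": f"Echo: {content}"},
                "finish_reason": "stop",
            }
        ],
    }
```

With the header from Method 1 the returned `id` is `my-custom-id-123`; with no header it is a freshly generated UUID string.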
Method 3: Streaming Test
Test the Server-Sent Events (SSE) streaming functionality. Use the -N flag to disable buffering and see the "typewriter" effect:
curl -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"stream": true,
"messages": [{"role": "user", "content": "This is a streaming test."}]
}'
What to look for: The response should arrive in chunks, each prefixed with data: {...}, rather than all at once.
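For readers unfamiliar with SSE framing, here is a minimal sketch of how a server might emit such chunks. The function name `sse_chunks` and the word-by-word splitting are assumptions made for illustration; the chunk fields follow the example responses shown in the Routing Verification section.

```python
import json
from typing import Iterator


def sse_chunks(text: str, request_id: str) -> Iterator[str]:
    """Yield SSE-framed chat.completion.chunk events, one per word."""

    def frame(delta: dict, finish_reason) -> str:
        chunk = {
            "id": request_id,
            "object": "chat.completion.chunk",
            "choices": [
                {"index": 0, "delta": delta, "finish_reason": finish_reason}
            ],
        }
        # Each SSE event is a "data: ..." line followed by a blank line.
        return f"data: {json.dumps(chunk)}\n\n"

    for i, word in enumerate(text.split()):
        delta = {"content": word + " "}
        if i == 0:
            delta["role"] = "assistant"  # only the first chunk carries the role
        yield frame(delta, None)

    yield frame({}, "stop")      # empty delta signals completion
    yield "data: [DONE]\n\n"     # terminator sentinel
```

In a FastAPI app, a generator like this would typically be wrapped in a `StreamingResponse` with the `text/event-stream` media type.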
Method 4: Rate Limiting Test
The gateway is configured to allow 5 requests per minute per IP. You can test this by running a quick loop:
for i in {1..6}; do
curl -s -o /dev/null -w "Request $i: %{http_code}\n" -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "ping"}]}';
done
What to look for: The first 5 requests should return 200, and the 6th request should return 429 (Too Many Requests).
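A sliding-window limiter with this behaviour can be sketched in a few lines. This is an illustrative implementation, assuming an in-memory store keyed by client IP; the class name `SlidingWindowLimiter` and the injectable `clock` parameter are hypothetical and not taken from the gateway's source.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client."""

    def __init__(
        self,
        limit: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Because state lives in process memory, counts reset on restart and are not shared between multiple worker processes.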
Method 5: Interactive API Docs
FastAPI automatically generates a Swagger UI. You can test the API directly from your browser: http://localhost:8080/docs
6. Routing Verification
Method 1: Local Route (Echo)
Verify that specifying the local model triggers the local echo backend:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "local", "messages": [{"role": "user", "content": "Hello local"}]}'
Expected Response:
{
"id": "...",
"choices": [{"message": {"role": "assistant", "content": "Echo: Hello local"}, "finish_reason": "stop"}],
"usage": {"total_tokens": 17}
}
Method 2: Remote Route - Non-Streaming
Verify forwarding to a remote inference backend (e.g., TinyLlama on Modal).
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"modal","stream":false,"messages":[{"role":"user","content":"What is the capital city in US"}]}'
Expected Response (the content will vary depending on the model deployed on your backend, e.g., TinyLlama):
{
"id": "cffcf1de-30d6-4a1c-b06b-b56af8ef7d46",
"choices": [
{
"message": {
"role": "assistant",
"content": " Yes, the capital city of the United States is Washington D.C."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 62,
"total_tokens": 62
}
}
Method 3: Remote Route - Streaming
Verify the gateway's ability to handle Server-Sent Events (SSE). Use the -N flag to disable buffering and observe the real-time token generation.
curl -s -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"modal","stream":true,"messages":[{"role":"user","content":"What is the capital city in US"}]}'
Example Response (Chunks):
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Boston, "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Massachusetts "}, "finish_reason": null}]}
...
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "The "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Star "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Spangled "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Banner "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
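On the client side, a stream in this format can be reassembled into the full reply. The sketch below assumes the lines have already been read from the HTTP response; `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of an SSE chat stream until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The final chunk has an empty delta, so default to "".
            parts.append(choice.get("delta", {}).get("content", ""))
    return "".join(parts)
```

Feeding it the first two chunks of the example above followed by `data: [DONE]` yields the string `Boston, Massachusetts ` (with the trailing space from the second delta).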
Method 4: Fallback Logic (Missing Model)
Verify that an unknown model correctly falls back to the default_backend (local):
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "unknown-model", "messages": [{"role": "user", "content": "Where am I?"}]}'
What to look for: The response content should be prefixed with "Echo: " when default_backend is set to local.
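The routing and fallback rule exercised in this section can be sketched as a single lookup. This is an assumption about the logic, not the gateway's actual code: `resolve_backend` is a hypothetical name, and the config dict mirrors the structure of the sample config.yaml.

```python
from typing import Optional


def resolve_backend(model: Optional[str], config: dict) -> tuple[str, dict]:
    """Pick the backend named by `model`, else fall back to default_backend."""
    backends = config["backends"]
    # A missing or unknown model name routes to the configured default.
    name = model if model in backends else config["default_backend"]
    return name, backends[name]


config = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
    },
}
```

Here `resolve_backend("modal", config)` selects the modal entry, while `resolve_backend("unknown-model", config)` and `resolve_backend(None, config)` both select local.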
7. Features
- Interface Driven: Clean generate() contract for all backends.
- Dynamic Routing: Route requests based on the model field in the payload.
- Config-Driven: Add or update backends in config.yaml with zero code changes.
- Streaming: Supports SSE-based streaming responses.
- Rate Limiting: Built-in memory-based sliding window protection.
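To illustrate the interface-driven design, here is a minimal sketch of what a shared generate() contract could look like. The signature, the `Backend` and `EchoBackend` class names, and the `BACKEND_TYPES` registry are all hypothetical; the gateway's real contract may differ (for example, it may be asynchronous).

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common contract: every backend turns chat messages into a reply."""

    @abstractmethod
    def generate(self, messages: list[dict]) -> str:
        ...


class EchoBackend(Backend):
    """Local backend that echoes the last user message (assumed behaviour)."""

    def generate(self, messages: list[dict]) -> str:
        last_user = next(
            (m["content"] for m in reversed(messages) if m.get("role") == "user"),
            "",
        )
        return f"Echo: {last_user}"


# Hypothetical registry mapping the config's `type` field to a class.
BACKEND_TYPES: dict[str, type] = {"local": EchoBackend}


def build_backend(spec: dict) -> Backend:
    """Instantiate the backend class registered for spec['type']."""
    try:
        backend_cls = BACKEND_TYPES[spec["type"]]
    except KeyError:
        raise ValueError(f"unsupported backend type: {spec.get('type')!r}") from None
    return backend_cls()
```

With a registry like this, supporting a new backend type means adding one class and one registry entry, while routing code only ever calls `generate()`.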