Simple AI Gateway

A lightweight AI API Gateway built with Python and FastAPI. It follows the OpenAI-compatible request format and can be configured to either echo back prompts or forward them to a real AI inference backend.

🚀 Quick Start

1. Prerequisites

Ensure you have uv installed. uv is an extremely fast Python package manager that replaces pip and venv.

# If you don't have uv yet (macOS)
brew install uv

2. Installation & Environment Setup

uv will automatically manage your virtual environment and dependencies based on pyproject.toml.

# Clone the repository
git clone <your-repo-url>
cd simple-ai-gateway/src/simple_ai_gateway

# Sync dependencies and create a virtual environment automatically
uv sync

3. Configuration

The gateway uses a config.yaml file for routing. Ensure this file exists in the same directory as main.py.

Sample config.yaml:

default_backend: local

backends:
  local:
    type: local
    url: http://127.0.0.1:8081
  modal:
    type: modal
    url: https://YOUR_MODAL_URL
  modal_vllm:
    type: vllm
    url: https://YOUR_MODAL_VLLM_URL
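
A malformed URL or a default_backend that points at a missing entry is easy to overlook in a file like this. The following is a minimal sketch of a sanity check, not part of the gateway itself: validate_config is a hypothetical helper, and the dict stands in for whatever a YAML loader would return for the sample above.

```python
from urllib.parse import urlparse


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    backends = config.get("backends") or {}
    default = config.get("default_backend")
    if default not in backends:
        problems.append(f"default_backend {default!r} is not defined under backends")
    for name, backend in backends.items():
        url = backend.get("url", "")
        parsed = urlparse(url)
        # A usable URL needs an http(s) scheme and a host; "https:/host" has no host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"backend {name!r} has an invalid url: {url!r}")
    return problems


# Stand-in for the parsed sample config.yaml (assumption: parsed into plain dicts).
sample = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
        "modal_vllm": {"type": "vllm", "url": "https://YOUR_MODAL_VLLM_URL"},
    },
}
print(validate_config(sample))  # []
```

Running a check like this at startup would surface configuration mistakes before the first request is routed.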

4. Run the Server

Start the server.

uv run main.py

5. Testing the Gateway

Start the server on port 8080:

uv run uvicorn main:app --host 0.0.0.0 --port 8080

Once the server is running at http://localhost:8080, you can verify it using the following methods:

Method 1: Basic Echo Test (via cURL)

Test if the gateway correctly extracts your message and echoes it back:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Request-ID: my-custom-id-123" \
-d '{
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ]
}'

What to look for:

The response should contain "content": "Echo: Hello, world!".
The "id" field should match "my-custom-id-123".
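
For orientation, the echo behaviour checked above could be sketched roughly as follows. This is an illustration under assumptions, not the gateway's actual code: last_user_message and build_echo_response are hypothetical names, and picking the most recent user message is a guess at how the prompt is extracted.

```python
def last_user_message(messages: list[dict]) -> str:
    """Return the content of the most recent user message, or "" if there is none."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def build_echo_response(payload: dict, request_id: str) -> dict:
    """Build an OpenAI-style response whose content echoes the user's prompt."""
    content = "Echo: " + last_user_message(payload.get("messages", []))
    return {
        "id": request_id,
        "choices": [
            {"message": {"role": "assistant", "content": content}, "finish_reason": "stop"}
        ],
    }


payload = {"messages": [{"role": "user", "content": "Hello, world!"}]}
response = build_echo_response(payload, "my-custom-id-123")
print(response["choices"][0]["message"]["content"])  # Echo: Hello, world!
```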

Method 2: Auto-ID Generation Test

If you don't provide an X-Request-ID header, the gateway will generate a unique UUID for you:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "messages": [{"role": "user", "content": "Hi, baby!"}]
}'

What to look for: A valid UUID in the "id" field (e.g., 550e8400-e29b-...).
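
The ID behaviour from Methods 1 and 2 can be pictured with a small sketch. resolve_request_id is a hypothetical helper, and the use of a version 4 UUID is an assumption; the gateway may generate its IDs differently.

```python
import uuid


def resolve_request_id(headers: dict[str, str]) -> str:
    """Reuse the caller's X-Request-ID if present, otherwise generate a UUID."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-request-id") or str(uuid.uuid4())


print(resolve_request_id({"X-Request-ID": "my-custom-id-123"}))  # my-custom-id-123
print(resolve_request_id({}))  # a freshly generated UUID
```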

Method 3: Streaming Test

Test the Server-Sent Events (SSE) streaming functionality. Use the -N flag to disable buffering and see the "typewriter" effect:

curl -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "stream": true,
  "messages": [{"role": "user", "content": "This is a streaming test."}]
}'

What to look for: The response should arrive incrementally as a series of lines, each prefixed with data: and carrying a JSON chunk, rather than all at once.
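
To make the wire format concrete, here is a minimal sketch of a generator that frames text as SSE events in the OpenAI-style chunk shape. sse_chunks is a hypothetical name, and splitting on whitespace is an assumption made for illustration; a real backend would emit whatever tokens the model produces.

```python
import json
from typing import Iterator


def sse_chunks(request_id: str, text: str) -> Iterator[str]:
    """Yield one SSE event per word, then a stop chunk and the [DONE] sentinel."""

    def event(delta: dict, finish_reason) -> str:
        chunk = {
            "id": request_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        # Each SSE event is a "data: " line followed by a blank line.
        return f"data: {json.dumps(chunk)}\n\n"

    for index, word in enumerate(text.split()):
        delta = {"content": word + " "}
        if index == 0:
            delta["role"] = "assistant"  # the role is only sent on the first chunk
        yield event(delta, None)
    yield event({}, "stop")
    yield "data: [DONE]\n\n"


for line in sse_chunks("demo-id", "Echo: This is a streaming test."):
    print(line, end="")
```

In a FastAPI application, a generator like this is typically wrapped in a StreamingResponse with the media type text/event-stream.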

Method 4: Rate Limiting Test

The gateway is configured to allow 5 requests per minute per IP. You can test this by running a quick loop:

for i in {1..6}; do 
  curl -s -o /dev/null -w "Request $i: %{http_code}\n" -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}]}'; 
done

What to look for: The first 5 requests should return 200, and the 6th request should return 429 (Too Many Requests).
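
The 200/429 pattern above is what a sliding-window limiter produces. The sketch below is one way to implement the idea in memory; SlidingWindowLimiter is a hypothetical class, and the gateway's own implementation may differ in its details.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window_seconds`."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would respond with 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Six requests one second apart from the same IP, using explicit timestamps.
codes = [200 if limiter.allow("127.0.0.1", now=float(i)) else 429 for i in range(6)]
print(codes)  # [200, 200, 200, 200, 200, 429]
```

Because the state lives in process memory, counters reset on restart and are not shared across multiple worker processes.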

Method 5: Interactive API Docs

FastAPI automatically generates a Swagger UI. You can test the API directly from your browser: http://localhost:8080/docs

6. Routing Verification

Method 1: Local Route (Echo)

Verify that specifying the local model triggers the local echo backend:

curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "local", "messages": [{"role": "user", "content": "Hello local"}]}'

Expected Response:

{
  "id": "...",
  "choices": [{"message": {"role": "assistant", "content": "Echo: Hello local"}, "finish_reason": "stop"}],
  "usage": {"total_tokens": 17}
}

Method 2: Remote Route - Non-Streaming

Verify forwarding to a remote inference backend (e.g., TinyLlama on Modal).

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"modal","stream":false,"messages":[{"role":"user","content":"What is the capital city in US"}]}'

Expected Response (the content will vary depending on the model deployed on your backend, e.g., TinyLlama):

{
  "id": "cffcf1de-30d6-4a1c-b06b-b56af8ef7d46",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": " Yes, the capital city of the United States is Washington D.C."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 62,
    "total_tokens": 62
  }
}

Method 3: Remote Route - Streaming

Verify the gateway's ability to handle Server-Sent Events (SSE). Use the -N flag to disable buffering and observe the real-time token generation.

curl -s -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"modal","stream":true,"messages":[{"role":"user","content":"What is the capital city in US"}]}'

Example Response (Chunks):

data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Boston, "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Massachusetts "}, "finish_reason": null}]}
...
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "The "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Star "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Spangled "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Banner "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
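
On the client side, chunks like these can be stitched back into a single string. The following is a small sketch with a hypothetical helper, collect_stream_text, which takes any iterable of response lines; how those lines are read from the network is left out.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the delta contents of an SSE chat-completion stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)


stream = [
    'data: {"id": "x", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello "}, "finish_reason": null}]}',
    "",
    'data: {"id": "x", "choices": [{"index": 0, "delta": {"content": "there"}, "finish_reason": null}]}',
    'data: {"id": "x", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(stream))  # Hello there
```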

Method 4: Fallback Logic (Missing Model)

Verify that an unknown model correctly falls back to the default_backend (local):

curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "unknown-model", "messages": [{"role": "user", "content": "Where am I?"}]}'

Expected: The response content is prefixed with Echo: if default_backend is set to local.
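
The routing and fallback behaviour verified in this section amounts to a dictionary lookup with a default. A minimal sketch, assuming the config has been parsed into a dict and using a hypothetical resolve_backend helper:

```python
from typing import Optional


def resolve_backend(config: dict, model: Optional[str]) -> tuple[str, dict]:
    """Pick the backend named by `model`, falling back to default_backend."""
    backends = config["backends"]
    name = model if model in backends else config["default_backend"]
    return name, backends[name]


config = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
    },
}
print(resolve_backend(config, "modal")[0])          # modal
print(resolve_backend(config, "unknown-model")[0])  # local
print(resolve_backend(config, None)[0])             # local
```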

7. Features

Interface-Driven: Clean generate() contract for all backends.
Dynamic Routing: Route requests based on the model field in the payload.
Config-Driven: Add or update backends in config.yaml with zero code changes.
Streaming: Supports SSE-based streaming responses.
Rate Limiting: Built-in, in-memory sliding-window rate limiting.
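
As an illustration of the interface-driven and config-driven points, a shared generate() contract might look like the sketch below. The class names, the synchronous signature, and the BACKEND_TYPES registry are all assumptions made for this example; the project's real interface may be asynchronous or take different arguments.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical base class: every backend turns chat messages into reply text."""

    @abstractmethod
    def generate(self, messages: list[dict]) -> str:
        ...


class EchoBackend(Backend):
    """Stand-in for the local backend: repeats the latest user message."""

    def generate(self, messages: list[dict]) -> str:
        user_turns = [m.get("content", "") for m in messages if m.get("role") == "user"]
        return "Echo: " + (user_turns[-1] if user_turns else "")


# Maps the `type` value from config.yaml to a class. Only "local" is sketched here.
BACKEND_TYPES: dict[str, type[Backend]] = {"local": EchoBackend}


def make_backend(backend_config: dict) -> Backend:
    """Instantiate the backend class registered for this config entry's type."""
    return BACKEND_TYPES[backend_config["type"]]()


backend = make_backend({"type": "local", "url": "http://127.0.0.1:8081"})
print(backend.generate([{"role": "user", "content": "Hello local"}]))  # Echo: Hello local
```

With a registry like this, supporting a new backend type means adding one class and one dictionary entry, while the request handler keeps calling generate() unchanged.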

Project details

Release history

0.1.3 (this version): Mar 6, 2026
0.1.1: Mar 4, 2026
0.1.0: Feb 27, 2026

Download files

Source Distribution

simple_ai_gateway-0.1.3.tar.gz (62.3 kB)

Uploaded Mar 6, 2026 Source

Built Distribution

simple_ai_gateway-0.1.3-py3-none-any.whl (10.9 kB)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file simple_ai_gateway-0.1.3.tar.gz.

File metadata

Download URL: simple_ai_gateway-0.1.3.tar.gz
Upload date: Mar 6, 2026
Size: 62.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simple_ai_gateway-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`da8f800a6dda764e67cbf799932cdbfb9f4010ad35e2b75afb174961be1907a6`
MD5	`724bb64881926d030b799f9443860280`
BLAKE2b-256	`f7b250506acb491f351db454f7d6c90f9ad84d59682a3a3317de0adf9504bfef`

Provenance

The following attestation bundles were made for simple_ai_gateway-0.1.3.tar.gz:

Publisher: release.yml on miaomiaoxu99/simple-ai-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: simple_ai_gateway-0.1.3.tar.gz
- Subject digest: da8f800a6dda764e67cbf799932cdbfb9f4010ad35e2b75afb174961be1907a6
- Sigstore transparency entry: 1052019016
- Sigstore integration time: Mar 6, 2026
Source repository:
- Permalink: miaomiaoxu99/simple-ai-gateway@25ac4a4e2a206e2295bb8e654ad970c78b7137f0
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/miaomiaoxu99
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@25ac4a4e2a206e2295bb8e654ad970c78b7137f0
- Trigger Event: release

File details

Details for the file simple_ai_gateway-0.1.3-py3-none-any.whl.

File metadata

Download URL: simple_ai_gateway-0.1.3-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 10.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simple_ai_gateway-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`252240f8d9bcdf708aae6525e4008fc1ca6e6a8033eda54f2d3457a506b1fde7`
MD5	`c6358e4fe1143be40ec4616764666df4`
BLAKE2b-256	`4e6bf2f6a0a9407a48561260e313196abdc298f76d06fce0b59e31a7a2aec220`

Provenance

The following attestation bundles were made for simple_ai_gateway-0.1.3-py3-none-any.whl:

Publisher: release.yml on miaomiaoxu99/simple-ai-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: simple_ai_gateway-0.1.3-py3-none-any.whl
- Subject digest: 252240f8d9bcdf708aae6525e4008fc1ca6e6a8033eda54f2d3457a506b1fde7
- Sigstore transparency entry: 1052019115
- Sigstore integration time: Mar 6, 2026
Source repository:
- Permalink: miaomiaoxu99/simple-ai-gateway@25ac4a4e2a206e2295bb8e654ad970c78b7137f0
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/miaomiaoxu99
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@25ac4a4e2a206e2295bb8e654ad970c78b7137f0
- Trigger Event: release
