Simple AI Gateway

A lightweight AI API Gateway built with Python and FastAPI. It follows the OpenAI-compatible request format and can be configured to either echo back prompts or forward them to a real AI inference backend.

🚀 Quick Start

1. Prerequisites

Ensure you have uv installed. uv is an extremely fast Python package manager that replaces pip and venv.

# If you don't have uv yet (macOS)
brew install uv

2. Installation & Environment Setup

uv will automatically manage your virtual environment and dependencies based on pyproject.toml.

# Clone the repository
git clone <your-repo-url>
cd simple-ai-gateway/src/simple_ai_gateway

# Sync dependencies and create a virtual environment automatically
uv sync

3. Configuration

The gateway uses a config.yaml file for routing. Ensure this file exists in the same directory as main.py.

Sample config.yaml:

default_backend: local

backends:
  local:
    type: local
    url: http://127.0.0.1:8081
  modal:
    type: modal
    url: https://YOUR_MODAL_URL
  modal_vllm:
    type: vllm
    url: https://YOUR_MODAL_VLLM_URL
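
A malformed URL in config.yaml (for example, a missing slash after the scheme) is easy to miss. The following is a minimal sketch of a sanity check, not part of the gateway itself: validate_config is a hypothetical helper, and it assumes the YAML has already been parsed into a plain dict (for instance with PyYAML's yaml.safe_load).

```python
from urllib.parse import urlparse


# Hypothetical helper (not part of the gateway): checks the shape of a parsed config.
def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    backends = config.get("backends") or {}
    default = config.get("default_backend")
    if default not in backends:
        problems.append(f"default_backend {default!r} is not defined under backends")
    for name, spec in backends.items():
        if "type" not in spec:
            problems.append(f"backend {name!r} has no type")
        parsed = urlparse(spec.get("url", ""))
        # A usable URL needs an http(s) scheme and a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"backend {name!r} has a malformed url: {spec.get('url')!r}")
    return problems


# Mirrors part of the sample config above, as a dict.
sample = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
    },
}
print(validate_config(sample))  # → []
```

Running a check like this at startup would surface a broken entry before the first request is routed to it.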

4. Run the Server

Start the server with uv:

uv run main.py

5. Testing the Gateway

Start the server on port 8080:

uv run uvicorn main:app --host 0.0.0.0 --port 8080

Once the server is running at http://localhost:8080, you can verify it using the following methods:

Method 1: Basic Echo Test (via cURL)

Test if the gateway correctly extracts your message and echoes it back:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Request-ID: my-custom-id-123" \
-d '{
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ]
}'

What to look for:

  • The response should contain "content": "Echo: Hello, world!".
  • The "id" field should match "my-custom-id-123".

Method 2: Auto-ID Generation Test

If you don't provide an X-Request-ID header, the gateway will generate a unique UUID for you:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "messages": [{"role": "user", "content": "Hi, baby!"}]
}'

What to look for: A valid UUID in the "id" field (e.g., 550e8400-e29b-...).
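
The ID behaviour described in Methods 1 and 2 could be sketched as follows. This is an illustration under assumptions, not the gateway's actual code: resolve_request_id is a hypothetical name, and the headers are modelled as a plain mapping rather than a FastAPI request object.

```python
import uuid
from typing import Mapping


# Hypothetical helper: reuse the caller's X-Request-ID if present, else mint a UUID.
def resolve_request_id(headers: Mapping[str, str]) -> str:
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    supplied = lowered.get("x-request-id", "").strip()
    return supplied or str(uuid.uuid4())


print(resolve_request_id({"X-Request-ID": "my-custom-id-123"}))  # → my-custom-id-123
print(resolve_request_id({}))  # a freshly generated UUID, different on every call
```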

Method 3: Streaming Test

Test the Server-Sent Events (SSE) streaming functionality. Use the -N flag to disable buffering and see the "typewriter" effect:

curl -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "stream": true,
  "messages": [{"role": "user", "content": "This is a streaming test."}]
}'

What to look for: The response should arrive in chunks, each line prefixed with data: (for example, data: {...}), rather than all at once.
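
For readers unfamiliar with SSE framing, here is a rough sketch of how such chunks might be produced. The chunk fields mirror the example output in the Routing Verification section; the function name sse_chunks and the word-by-word splitting are assumptions made for illustration, not the gateway's actual implementation.

```python
import json
from typing import Iterator


# Hypothetical generator: frames a text as OpenAI-style SSE chunks, one word per event.
def sse_chunks(request_id: str, text: str) -> Iterator[str]:
    def frame(delta: dict, finish_reason):
        chunk = {
            "id": request_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        # Each SSE event is a "data:" line followed by a blank line.
        return f"data: {json.dumps(chunk)}\n\n"

    for index, word in enumerate(text.split()):
        delta = {"content": word + " "}
        if index == 0:
            delta["role"] = "assistant"  # the role appears only in the first chunk
        yield frame(delta, None)
    yield frame({}, "stop")      # final chunk carries the finish reason
    yield "data: [DONE]\n\n"     # sentinel that tells the client the stream is over


for event in sse_chunks("demo-id", "Echo: This is a streaming test."):
    print(event, end="")
```

In a FastAPI app, a generator like this would typically be wrapped in a StreamingResponse with media type text/event-stream.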

Method 4: Rate Limiting Test

The gateway is configured to allow 5 requests per minute per IP. You can test this by running a quick loop:

for i in {1..6}; do 
  curl -s -o /dev/null -w "Request $i: %{http_code}\n" -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}]}'; 
done

What to look for: The first 5 requests should return 200, and the 6th request should return 429 (Too Many Requests).
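
The limit above matches what an in-memory sliding window would produce. The class below is a minimal sketch of that technique, assuming a per-IP deque of timestamps; SlidingWindowLimiter and its parameters are hypothetical names, and the gateway's real implementation may differ.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding window: at most `limit` hits per `window_seconds` per client."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock                      # injectable so tests can fake time
        self._hits = defaultdict(deque)         # client IP -> timestamps of accepted hits

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False                        # caller would answer with HTTP 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("127.0.0.1") for _ in range(6)]
print(results)  # → [True, True, True, True, True, False]
```

Because the state lives in process memory, a limiter like this resets on restart and is not shared across multiple worker processes.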

Method 5: Interactive API Docs

FastAPI automatically generates a Swagger UI. You can test the API directly from your browser: http://localhost:8080/docs

6. Routing Verification

Method 1: Local Route (Echo)

Verify that specifying the local model triggers the local echo backend:

curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "local", "messages": [{"role": "user", "content": "Hello local"}]}'

Expected Response:

{
  "id": "...",
  "choices": [{"message": {"role": "assistant", "content": "Echo: Hello local"}, "finish_reason": "stop"}],
  "usage": {"total_tokens": 17}
}

Method 2: Remote Route - Non-Streaming

Verify forwarding to a remote inference backend (e.g., TinyLlama on Modal).

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"modal","stream":false,"messages":[{"role":"user","content":"What is the capital city in US"}]}'

Expected Response (the content will vary depending on the model deployed on your backend, e.g., TinyLlama):

{
  "id": "cffcf1de-30d6-4a1c-b06b-b56af8ef7d46",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": " Yes, the capital city of the United States is Washington D.C."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 62,
    "total_tokens": 62
  }
}

Method 3: Remote Route - Streaming

Verify the gateway's ability to handle Server-Sent Events (SSE). Use the -N flag to disable buffering and observe the real-time token generation.

curl -s -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"modal","stream":true,"messages":[{"role":"user","content":"What is the capital city in US"}]}'

Example Response (Chunks):

data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Boston, "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Massachusetts "}, "finish_reason": null}]}
...
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "The "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Star "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Spangled "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Banner "}, "finish_reason": null}]}
data: {"id": "170a33e4-db0d-4803-983e-09dcccc048cd", "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
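
On the client side, a stream in this shape can be reassembled into the full reply. The following is a small sketch, assuming the lines have already been read from the HTTP response; collect_stream is a hypothetical helper and the sample lines are made up for the demonstration.

```python
import json
from typing import Iterable


# Hypothetical client helper: joins the delta contents of an SSE chat stream.
def collect_stream(lines: Iterable[str]) -> str:
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                             # skip blank lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break                                # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))   # the final chunk has an empty delta
    return "".join(parts)


sample_lines = [
    'data: {"id": "x", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello, "}, "finish_reason": null}]}',
    'data: {"id": "x", "choices": [{"index": 0, "delta": {"content": "world"}, "finish_reason": null}]}',
    'data: {"id": "x", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(collect_stream(sample_lines))  # → Hello, world
```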

Method 4: Fallback Logic (Missing Model)

Verify that an unknown model correctly falls back to the default_backend (local):

curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "unknown-model", "messages": [{"role": "user", "content": "Where am I?"}]}'

What to look for: The response content should be prefixed with Echo: when default_backend is set to local.
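
The routing and fallback behaviour verified in this section can be summarised in a few lines. This is a sketch under assumptions: resolve_backend is a hypothetical function, and the config is shown as an already-parsed dict in the shape of the sample config.yaml.

```python
from typing import Optional


# Hypothetical resolver: pick the backend named by "model", else the default.
def resolve_backend(model: Optional[str], config: dict) -> tuple[str, dict]:
    backends = config["backends"]
    # Unknown or missing model names fall back to default_backend.
    name = model if model in backends else config["default_backend"]
    return name, backends[name]


config = {
    "default_backend": "local",
    "backends": {
        "local": {"type": "local", "url": "http://127.0.0.1:8081"},
        "modal": {"type": "modal", "url": "https://YOUR_MODAL_URL"},
    },
}
print(resolve_backend("modal", config)[0])          # → modal
print(resolve_backend("unknown-model", config)[0])  # → local
print(resolve_backend(None, config)[0])             # → local
```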

7. Features

  • Interface Driven: Clean generate() contract for all backends.
  • Dynamic Routing: Route requests based on the model field in the payload.
  • Config-Driven: Add or update backends in config.yaml with zero code changes.
  • Streaming: Supports SSE-based streaming responses.
  • Rate Limiting: Built-in memory-based sliding window protection.
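
To make the "Interface Driven" point concrete, here is one way a shared generate() contract could look. The class names and the exact signature are assumptions for illustration only; the real interface may be async or take different arguments.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical base class: every backend exposes the same generate() method."""

    @abstractmethod
    def generate(self, messages: list[dict]) -> str:
        """Return the assistant reply for an OpenAI-style message list."""


class EchoBackend(Backend):
    """Assumed local backend: echoes the most recent user message."""

    def generate(self, messages: list[dict]) -> str:
        last_user = next(
            (m["content"] for m in reversed(messages) if m.get("role") == "user"),
            "",
        )
        return f"Echo: {last_user}"


reply = EchoBackend().generate([{"role": "user", "content": "Hello, world!"}])
print(reply)  # → Echo: Hello, world!
```

With a contract like this, the routing layer only needs to look up a Backend instance and call generate(), regardless of whether the backend is local or remote.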
