
forge-cloud

OpenAI-compatible reasoning-aware inference proxy for Qwen3.6.

Point your OpenAI client at forge-cloud instead of directly at vLLM/SGLang/Ollama. The proxy routes thinking mode based on query complexity, swaps sampling parameters to match the mode, normalizes backend flags, and tags responses with routing metadata.

What it does

  1. Receives a standard /v1/chat/completions request
  2. Classifies query complexity (simple/moderate/complex)
  3. Decides thinking mode (think vs no_think) with correct sampling params
  4. Normalizes the enable_thinking flag for the target backend (vLLM nested, DashScope top-level, llama.cpp server-side); a sketch of this step follows below
  5. Forwards to the user's configured backend
  6. Tags the response with routing metadata and estimated token split (thinking vs response)

The proxy does not run inference. It configures and monitors it.
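
Step 4 is where backends diverge. Here is a minimal sketch of that normalization, assuming a hypothetical normalize_thinking_flag helper; the payload shapes are illustrative, not forge-cloud's actual internals:

def normalize_thinking_flag(payload: dict, backend_type: str, enable: bool) -> dict:
    """Place the thinking switch where each backend expects it."""
    if backend_type == "vllm":
        # vLLM-style: nested under chat_template_kwargs
        payload.setdefault("chat_template_kwargs", {})["enable_thinking"] = enable
    elif backend_type == "dashscope":
        # DashScope-style: a top-level request field
        payload["enable_thinking"] = enable
    elif backend_type == "llamacpp":
        # llama.cpp decides thinking server-side, so nothing to inject
        pass
    return payload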

Install

pip install forge-cloud

Quick start

# Set admin key and backend URL
export FORGE_ADMIN_KEY=my-secret
export FORGE_BACKEND_URL=http://localhost:8000
export FORGE_BACKEND_TYPE=vllm

# Start the proxy
forge-cloud

Create an API key:

curl -X POST http://localhost:8741/v1/keys \
  -H "Authorization: Bearer my-secret" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app"}'
# Returns: {"key": "fk-...", "name": "my-app", "tier": "free", ...}

Use it like any OpenAI endpoint:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8741/v1",
    api_key="fk-..."  # key from above
)

response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",
    messages=[{"role": "user", "content": "refactor this module"}],
)
print(response.choices[0].message.content)

Response metadata

Every response includes a forge field with routing metadata and estimated token counts:

{
  "id": "chatcmpl-test",
  "choices": ["..."],
  "usage": {"...": 0},
  "forge": {
    "thinking_mode": "think",
    "complexity": "complex",
    "backend": "vllm",
    "sampling_profile": "thinking",
    "thinking_tokens": 450,
    "response_tokens": 120
  }
}
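
The typed OpenAI client may drop fields it does not model, so the simplest way to inspect the forge block is a raw HTTP call. A sketch using httpx (the key and model values are placeholders):

import httpx

resp = httpx.post(
    "http://localhost:8741/v1/chat/completions",
    headers={"Authorization": "Bearer fk-..."},
    json={
        "model": "Qwen/Qwen3.6-35B-A3B",
        "messages": [{"role": "user", "content": "refactor this module"}],
    },
    timeout=120.0,
)
forge = resp.json()["forge"]
print(forge["thinking_mode"], forge["complexity"], forge["thinking_tokens"])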

Endpoints

Method  Path                  Description
POST    /v1/chat/completions  Proxied chat completion with forge routing
POST    /v1/keys              Create API key (admin auth required)
GET     /health               Proxy health check
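
/health makes a convenient liveness probe. A minimal check (the response body is not documented here, so only the status code is asserted):

import httpx

assert httpx.get("http://localhost:8741/health").status_code == 200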

Configuration

All settings are environment variables with the FORGE_ prefix:

Variable                Default                Description
FORGE_HOST              0.0.0.0                Bind address
FORGE_PORT              8741                   Port
FORGE_BACKEND_URL       http://localhost:8000  Default backend URL
FORGE_BACKEND_TYPE      vllm                   Backend type: vllm, sglang, dashscope, llamacpp
FORGE_FREE_DAILY_LIMIT  1000                   Free tier requests per day
FORGE_ADMIN_KEY         (empty)                Admin key for creating API keys
FORGE_DB_PATH           forge.db               SQLite database path
FORGE_REQUEST_TIMEOUT   120.0                  Backend request timeout (seconds)

Per-key backend override

Each API key can have its own backend URL and type:

curl -X POST http://localhost:8741/v1/keys \
  -H "Authorization: Bearer my-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "sglang-user",
    "tier": "paid",
    "backend_url": "http://sglang-server:30000",
    "backend_type": "sglang"
  }'

Tiers

  • Free: 1,000 requests/day, single backend target
  • Paid: no rate limit, per-key backend routing

Streaming

Streaming is supported. Set stream: true in the request and the proxy forwards the SSE stream from the backend.
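
With the client from the quick start, the standard OpenAI streaming pattern applies unchanged:

stream = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",
    messages=[{"role": "user", "content": "summarize this module"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta (e.g. the final usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)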

Dependencies

  • qwen-think -- thinking session manager (routing, budget, sampling)
  • FastAPI + uvicorn
  • httpx -- async HTTP client for backend forwarding
  • aiosqlite -- async SQLite for API keys and usage tracking

License

Apache-2.0
