Small CLI playground for Qwen via an OpenAI-compatible LLM Forge gateway.
LLM Forge Playground (Qwen via OpenAI-compatible gateway)
A small interactive Python CLI for talking to a Qwen model behind an OpenAI-compatible LLM Forge gateway. Responses stream into the console by default. You can change the model, temperature, streaming mode, and other settings on the fly, with no restart needed.
- Dynamic config: Change any setting in-session with `set <key> <value>`.
- Streaming by default: Tokens print as they arrive.
- Discoverable: Type `?` or `help` for all commands.
- Scenarios: Run predefined tests anytime with `scenario 1` … `scenario 4`.
Quickstart
Using uv (recommended):
uv tool install llm-forge-playground
# or in a project:
uv add llm-forge-playground
llm-forge-cli
Using pipenv:
pipenv install llm-forge-playground
pipenv run llm-forge-cli
Using pip:
pip install llm-forge-playground
llm-forge-cli
Set your gateway token (and optional base URL), then type a message. Use ? for in-session commands.
export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"
export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1" # optional
llm-forge-cli
Environment configuration
Set these before starting the CLI:
- `LLM_FORGE_BASE_URL`: Gateway base URL (default: `http://localhost:8000/v1`).
- `LLM_FORGE_ACCESS_TOKEN`: JWT / bearer token (recommended).

Or use basic-auth login (the client obtains a token once):

- `LLM_FORGE_AUTH_URL`, `LLM_FORGE_USERNAME`, `LLM_FORGE_PASSWORD`
Example:
export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1"
export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"
If neither a token nor auth credentials are set, the program exits with a clear error.
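The lookup order described above can be sketched as follows. This is a minimal illustration, not the package's actual code: `GatewaySettings` and `load_gateway_settings` are hypothetical names, and the real logic lives in the project's `AppConfig`, which may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "http://localhost:8000/v1"  # default stated in this README


@dataclass
class GatewaySettings:  # hypothetical container, not the package's AppConfig
    base_url: str
    access_token: Optional[str] = None
    auth_url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def load_gateway_settings(env: Mapping[str, str] = os.environ) -> GatewaySettings:
    """Prefer a bearer token; otherwise require all three basic-auth variables."""
    base_url = env.get("LLM_FORGE_BASE_URL", DEFAULT_BASE_URL)

    token = env.get("LLM_FORGE_ACCESS_TOKEN")
    if token:
        return GatewaySettings(base_url=base_url, access_token=token)

    auth_url = env.get("LLM_FORGE_AUTH_URL")
    username = env.get("LLM_FORGE_USERNAME")
    password = env.get("LLM_FORGE_PASSWORD")
    if auth_url and username and password:
        return GatewaySettings(
            base_url=base_url,
            auth_url=auth_url,
            username=username,
            password=password,
        )

    # Neither a token nor complete credentials: stop with a readable message.
    raise SystemExit(
        "Set LLM_FORGE_ACCESS_TOKEN, or LLM_FORGE_AUTH_URL, "
        "LLM_FORGE_USERNAME and LLM_FORGE_PASSWORD."
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to try out without touching the real environment.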
Installation (from source)
git clone https://github.com/the-aic-project/openai-playground.git
cd openai-playground
uv sync # or: pipenv install -e . / pip install -e .
llm-forge-cli
Running the CLI
llm-forge-cli
or:
python -m llm_forge_playground.cli
You enter an interactive REPL. The prompt shows [stream] or [sync] so you know the current mode.
- Type a message — It’s sent to the model; the reply streams (or prints when sync).
- Change settings anytime — No restart. Use the commands below.
In-session commands
| Input | Description |
|---|---|
| `?` or `help` | List all commands and set keys |
| `config` | Show current model, temp, stream, thinking, max_tokens, etc. |
| `set <key> <value>` | Change a setting (e.g. `set stream off`, `set temp 0.3`, `set model other/model`) |
| `clear` | Clear conversation history; start a new thread |
| `scenario 1` … `scenario 4` | Run a predefined test (see below) |
| `exit`, `quit`, `q` | Exit |
The leading `/` is optional: `set stream on` and `/set stream on` both work.
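One way to dispatch REPL input along these lines is sketched below. The function name `parse_line` and the `COMMANDS` set are illustrative assumptions; the CLI's real parser may differ, for example in how it treats a chat message that happens to start with a command word.

```python
from typing import Optional, Tuple

EXIT_WORDS = {"exit", "quit", "q"}
COMMANDS = {"?", "help", "config", "set", "clear", "scenario"} | EXIT_WORDS


def parse_line(line: str) -> Tuple[Optional[str], str]:
    """Return (command, argument_text), or (None, message) for chat input."""
    text = line.strip()
    # A leading slash is optional, so strip it before matching.
    body = text[1:] if text.startswith("/") else text
    parts = body.split(maxsplit=1)
    if parts and parts[0].lower() in COMMANDS:
        argument = parts[1] if len(parts) > 1 else ""
        return (parts[0].lower(), argument)
    # Anything else is treated as a message for the model.
    return (None, text)
```

In this sketch any line whose first word matches a command is treated as that command, so a message such as "help me debug this" would be read as `help`.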
Set keys and aliases
- model (alias: m): Model name.
- temp (alias: t): Temperature, e.g. `0.7`.
- top_p: Top-p sampling.
- max_tokens (alias: mt): Max tokens to generate.
- top_k: Extra body `top_k`; use `none` to clear.
- thinking: `on`/`off` (client-side flag; may have no effect on vLLM Qwen).
- stream (alias: s): `on`/`off` (stream responses or wait for full reply).
Examples:
set stream off
set temp 0.2
set model Qwen/Qwen3.5-397B-A17B-FP8
set thinking on
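The alias and value handling for `set` could look roughly like this. It is a sketch under assumptions: `ALIASES`, `PARSERS`, and `apply_set` are hypothetical names, and the settings are kept in a plain dict rather than the project's config object.

```python
from typing import Any, Callable, Dict, Optional

# Aliases listed in this README.
ALIASES = {"m": "model", "t": "temp", "mt": "max_tokens", "s": "stream"}


def _on_off(value: str) -> bool:
    lowered = value.lower()
    if lowered not in ("on", "off"):
        raise ValueError(f"expected on/off, got {value!r}")
    return lowered == "on"


def _top_k(value: str) -> Optional[int]:
    # "none" clears the extra-body top_k value.
    return None if value.lower() == "none" else int(value)


PARSERS: Dict[str, Callable[[str], Any]] = {
    "model": str,
    "temp": float,
    "top_p": float,
    "max_tokens": int,
    "top_k": _top_k,
    "thinking": _on_off,
    "stream": _on_off,
}


def apply_set(settings: Dict[str, Any], key: str, value: str) -> None:
    """Resolve an alias, convert the value, and store it in `settings`."""
    name = ALIASES.get(key, key)
    if name not in PARSERS:
        raise KeyError(f"unknown key: {key}")
    settings[name] = PARSERS[name](value)
```

For example, `apply_set(settings, "t", "0.2")` stores `0.2` under `temp`, and `apply_set(settings, "top_k", "none")` stores `None`.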
Predefined scenarios (run anytime)
- scenario 1 — Short explanation, no thinking, no streaming.
- scenario 2 — Same question with thinking (non-streaming).
- scenario 3 — Same question, streaming, no thinking.
- scenario 4 — Longer reasoning task, thinking, higher max_tokens.
Each scenario sends one user message and appends the reply to the current conversation.
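The four scenarios differ mainly in which settings they switch on or off. The sketch below models only those differences; the names are hypothetical, the `max_tokens` value for scenario 4 is a placeholder, and the actual prompts and values are defined by the package, not here.

```python
from typing import Any, Dict

# Setting overrides per scenario, taken from the list above.
SCENARIO_OVERRIDES: Dict[int, Dict[str, Any]] = {
    1: {"thinking": False, "stream": False},
    2: {"thinking": True, "stream": False},
    3: {"thinking": False, "stream": True},
    # The README only says "higher max_tokens"; 2048 is a placeholder.
    4: {"thinking": True, "max_tokens": 2048},
}


def settings_for_scenario(current: Dict[str, Any], number: int) -> Dict[str, Any]:
    """Return a copy of `current` with the scenario's overrides applied."""
    if number not in SCENARIO_OVERRIDES:
        raise ValueError(f"unknown scenario: {number}")
    return {**current, **SCENARIO_OVERRIDES[number]}
```

Returning a copy leaves the session's own settings untouched; whether the real CLI restores settings after a scenario is not stated in this README.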
Scripting: run one scenario and exit
llm-forge-cli --scenario 2
Runs scenario 2 and exits (no REPL).
How thinking and streaming work
- Thinking: The client sends `extra_body["chat_template_kwargs"]["enable_thinking"]`. On many vLLM Qwen gateways, whether reasoning tokens are returned is decided server-side (parser/config), not by this flag.
- Streaming: When `stream` is on, the CLI uses `chat.completions.create(stream=True, ...)` and prints each content delta as it arrives; the full reply is still stored in conversation history.
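The two mechanisms above can be sketched with a pair of small helpers. `build_request_kwargs` and `collect_stream` are hypothetical names, not functions from this package; the sketch assumes chunks shaped like those of the OpenAI Python client, where each chunk exposes `choices[0].delta.content`.

```python
from typing import Any, Callable, Dict, Iterable, List


def build_request_kwargs(
    model: str,
    messages: List[Dict[str, str]],
    stream: bool,
    enable_thinking: bool,
) -> Dict[str, Any]:
    """Keyword arguments for chat.completions.create, including the thinking flag."""
    return {
        "model": model,
        "messages": messages,
        "stream": stream,
        # Passed through to the gateway; the server may ignore it.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


def collect_stream(chunks: Iterable[Any], echo: Callable[..., Any] = print) -> str:
    """Echo each content delta as it arrives and return the joined reply."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:
            echo(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```

With an OpenAI client in hand, usage would be along the lines of `reply = collect_stream(client.chat.completions.create(**build_request_kwargs(...)))`, after which `reply` can be appended to the conversation history as an assistant message.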
Publishing to PyPI (maintainers)
The Release workflow (tag → release) uses GitHub's built-in `GITHUB_TOKEN`, so there is no secret to add. The Publish to PyPI workflow runs when a release is published; to enable it, add a PyPI API token as a GitHub secret:
1. Create a PyPI token
   - Log in at pypi.org, go to Account settings → API tokens.
   - Click Add API token. Name it (e.g. `github-actions`) and set scope to the project (e.g. Project: llm-forge-playground) or "Entire account" if you prefer.
   - Copy the token; it starts with `pypi-` and is shown only once.
2. Add the token to GitHub
   - In your repo: Settings → Secrets and variables → Actions.
   - Click New repository secret. Name: `PYPI_API_TOKEN`, Value: paste the `pypi-...` token. Save.
3. Publish
   - Option A: Push a version tag; the Release workflow creates a GitHub Release (with generated notes), which triggers Publish to PyPI. Run `git tag v0.1.0`, then `git push origin v0.1.0`.
   - Option B: Create a release from the GitHub Releases UI (e.g. "Draft new release" → choose tag or create one → "Publish release").
   - Option C: Run the Publish to PyPI workflow manually from the Actions tab.
Example code (project abstractions)
from llm_forge_playground.config import AppConfig
from llm_forge_playground.client import build_openai_client, chat_once
config = AppConfig.from_env(
    stream=True,  # default
    enable_thinking=False,
)
client = build_openai_client(config)
messages = [{"role": "user", "content": "Explain what a knowledge graph is in two sentences."}]
reply = chat_once(client, config, messages)
You can change config.model, config.temperature, etc. between calls; no need to recreate the client unless you change base URL or auth.