Small CLI playground for Qwen via an OpenAI-compatible LLM Forge gateway.

LLM Forge Playground (Qwen via OpenAI-compatible gateway)

Requires Python 3.9+. License: MIT.

A small interactive Python CLI for talking to a Qwen model behind an OpenAI-compatible LLM Forge gateway. Responses stream into the console by default. You can change the model, temperature, streaming mode, and other settings on the fly, with no restart needed.

  • Dynamic config — Change any setting in-session with set <key> <value>.
  • Streaming by default — Tokens print as they arrive.
  • Discoverable — Type ? or help for all commands.
  • Scenarios — Run predefined tests anytime with scenario 1 through scenario 4.

Quickstart

Using uv (recommended):

uv tool install llm-forge-playground
llm-forge-cli
# or in a project:
uv add llm-forge-playground
uv run llm-forge-cli

Using pipenv:

pipenv install llm-forge-playground
pipenv run llm-forge-cli

Using pip:

pip install llm-forge-playground
llm-forge-cli

Set your gateway token (and optional base URL), then type a message. Use ? for in-session commands.

export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"
export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1"  # optional
llm-forge-cli

Environment configuration

Set these before starting the CLI:

  • LLM_FORGE_BASE_URL: Gateway base URL (default: http://localhost:8000/v1).
  • LLM_FORGE_ACCESS_TOKEN: JWT / bearer token (recommended).

Or use basic-auth login (client obtains a token once):

  • LLM_FORGE_AUTH_URL, LLM_FORGE_USERNAME, LLM_FORGE_PASSWORD

Example:

export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1"
export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"

If neither a token nor auth credentials are set, the program exits with a clear error.
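The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name resolve_auth, the returned dictionary shape, and the choice to prefer the token when both options are set are assumptions. Only the environment variable names and the default base URL come from this README.

```python
import os

DEFAULT_BASE_URL = "http://localhost:8000/v1"  # default stated in this README

LOGIN_KEYS = ("LLM_FORGE_AUTH_URL", "LLM_FORGE_USERNAME", "LLM_FORGE_PASSWORD")


def resolve_auth(env=None):
    """Hypothetical helper: choose an auth mode from environment variables."""
    env = os.environ if env is None else env
    base_url = env.get("LLM_FORGE_BASE_URL") or DEFAULT_BASE_URL

    token = env.get("LLM_FORGE_ACCESS_TOKEN")
    if token:
        # Assumption: a ready-made bearer token wins over login credentials.
        return {"base_url": base_url, "mode": "token", "token": token}

    if all(env.get(key) for key in LOGIN_KEYS):
        # Basic-auth login: the client would exchange these for a token once.
        creds = {key: env[key] for key in LOGIN_KEYS}
        return {"base_url": base_url, "mode": "login", **creds}

    # Neither option is configured: stop with a clear message.
    raise SystemExit(
        "Set LLM_FORGE_ACCESS_TOKEN, or LLM_FORGE_AUTH_URL, "
        "LLM_FORGE_USERNAME and LLM_FORGE_PASSWORD."
    )
```

Passing a plain dictionary instead of os.environ makes the helper easy to try out without touching your shell environment.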


Installation (from source)

git clone https://github.com/the-aic-project/openai-playground.git
cd openai-playground
uv sync   # or: pipenv install -e .  /  pip install -e .
llm-forge-cli

Running the CLI

llm-forge-cli

or:

python -m llm_forge_playground.cli

You enter an interactive REPL. The prompt shows [stream] or [sync] so you know the current mode.

  • Type a message — It’s sent to the model; the reply streams (or prints when sync).
  • Change settings anytime — No restart. Use the commands below.

In-session commands

Input                            Description
? or help                        List all commands and set keys
config                           Show current model, temp, stream, thinking, max_tokens, etc.
set <key> <value>                Change a setting (e.g. set stream off, set temp 0.3, set model other/model)
clear                            Clear conversation history; start a new thread
scenario 1 through scenario 4    Run a predefined test (see below)
exit, quit, q                    Exit

A leading / is optional: set stream on and /set stream on both work.

Set keys and aliases

  • model (alias: m) — Model name.
  • temp (alias: t) — Temperature, e.g. 0.7.
  • top_p — Top-p sampling.
  • max_tokens (alias: mt) — Max tokens to generate.
  • top_k — Extra body top_k; use none to clear.
  • thinking — on / off (client-side flag; may have no effect on vLLM Qwen).
  • stream (alias: s) — on / off (stream responses or wait for full reply).

Examples:

set stream off
set temp 0.2
set model Qwen/Qwen3.5-397B-A17B-FP8
set thinking on
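To make the key, alias, and value rules concrete, here is a minimal sketch of how a set command could be parsed. It is not the CLI's real parser: parse_set, parse_on_off, ALIASES, and PARSERS are hypothetical names, and the value types (float for temp and top_p, int for max_tokens and top_k) are assumptions. The keys, aliases, the optional leading /, and the none value for top_k come from the lists above.

```python
ALIASES = {"m": "model", "t": "temp", "mt": "max_tokens", "s": "stream"}


def parse_on_off(value):
    """Map 'on'/'off' (any case) to True/False."""
    lowered = value.lower()
    if lowered not in ("on", "off"):
        raise ValueError(f"expected on or off, got {value!r}")
    return lowered == "on"


# One converter per canonical key (types are assumptions for this sketch).
PARSERS = {
    "model": str,
    "temp": float,
    "top_p": float,
    "max_tokens": int,
    "top_k": lambda v: None if v.lower() == "none" else int(v),
    "thinking": parse_on_off,
    "stream": parse_on_off,
}


def parse_set(line):
    """Parse 'set <key> <value>' (leading '/' optional) into (key, value)."""
    parts = line.strip().removeprefix("/").split(maxsplit=2)
    if len(parts) != 3 or parts[0] != "set":
        raise ValueError("usage: set <key> <value>")
    key = ALIASES.get(parts[1], parts[1])  # resolve alias to canonical key
    if key not in PARSERS:
        raise ValueError(f"unknown key: {parts[1]}")
    return key, PARSERS[key](parts[2])
```

For example, parse_set("/set t 0.2") resolves the alias t and yields the pair ("temp", 0.2). Note that str.removeprefix needs Python 3.9 or newer, which matches the package's stated minimum.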

Predefined scenarios (run anytime)

  • scenario 1 — Short explanation, no thinking, no streaming.
  • scenario 2 — Same question with thinking (non-streaming).
  • scenario 3 — Same question, streaming, no thinking.
  • scenario 4 — Longer reasoning task, thinking, higher max_tokens.

Each scenario sends one user message and appends the reply to the current conversation.

Scripting: run one scenario and exit

llm-forge-cli --scenario 2

Runs scenario 2 and exits (no REPL).


How thinking and streaming work

  • Thinking — The client sends extra_body["chat_template_kwargs"]["enable_thinking"]. On many vLLM Qwen gateways, whether reasoning tokens are returned is decided server-side (parser/config), not by this flag.
  • Streaming — When stream is on, the CLI uses chat.completions.create(stream=True, ...) and prints each content delta as it arrives; the full reply is still stored in conversation history.
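The two points above can be illustrated with a short sketch. It assumes client is an openai.OpenAI instance (or anything exposing the same chat.completions.create interface); build_request and stream_reply are hypothetical helper names, not functions from this package. The extra_body payload and stream=True follow the description above.

```python
def build_request(model, messages, stream=True, enable_thinking=False):
    """Assemble keyword arguments for chat.completions.create."""
    return {
        "model": model,
        "messages": messages,
        "stream": stream,
        # extra_body is merged into the JSON request by the openai client;
        # the gateway may still decide server-side whether reasoning is returned.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


def stream_reply(client, model, messages, enable_thinking=False):
    """Print content deltas as they arrive and return the full reply text."""
    request = build_request(model, messages, stream=True,
                            enable_thinking=enable_thinking)
    parts = []
    for chunk in client.chat.completions.create(**request):
        if not chunk.choices:
            continue  # some servers emit chunks with no choices
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or empty deltas carry no text
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

To keep the conversation going, append the returned text to the history, for instance messages.append({"role": "assistant", "content": reply}), before sending the next user message.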

Publishing to PyPI (maintainers)

The Release workflow (tag → release) uses GitHub’s built-in GITHUB_TOKEN—no secret to add. The Publish to PyPI workflow runs when a release is published; to enable it, add a PyPI API token as a GitHub secret:

  1. Create a PyPI token

    • Log in at pypi.org, go to Account settings → API tokens.
    • Click Add API token. Name it (e.g. github-actions) and set scope to the project (e.g. Project: llm-forge-playground) or “Entire account” if you prefer.
    • Copy the token; it starts with pypi- and is shown only once.
  2. Add the token to GitHub

    • In your repo: Settings → Secrets and variables → Actions.
    • Click New repository secret. Name: PYPI_API_TOKEN, Value: paste the pypi-... token. Save.
  3. Publish

    • Option A: Push a version tag; the Release workflow creates a GitHub Release (with generated notes), which triggers Publish to PyPI:
      git tag v0.1.0
      git push origin v0.1.0
      
    • Option B: Create a release from the GitHub Releases UI (e.g. “Draft new release” → choose tag or create one → “Publish release”).
    • Option C: Run the Publish to PyPI workflow manually from the Actions tab.

Example code (project abstractions)

from llm_forge_playground.config import AppConfig
from llm_forge_playground.client import build_openai_client, chat_once

config = AppConfig.from_env(
    stream=True,           # default
    enable_thinking=False,
)
client = build_openai_client(config)

messages = [{"role": "user", "content": "Explain what a knowledge graph is in two sentences."}]
reply = chat_once(client, config, messages)

You can change config.model, config.temperature, etc. between calls; no need to recreate the client unless you change base URL or auth.
