
Project description

qwen-think

The Thinking Session Manager for Qwen3.6 — SDK + Router + Bug Fixes

The canonical Python library for managing Qwen3.6's thinking state across sessions, backends, and frameworks.

Colab Notebook — end-to-end test harness (unit tests + live GPU tests)

What It Solves

  1. Backend Normalization — Qwen3.6's enable_thinking flag has three different invocation patterns across backends. This library normalizes them into a single API.

  2. Sampling Parameter Swap — Qwen3.6 requires different sampling parameters for thinking vs. non-thinking mode. A router that flips enable_thinking without also swapping params produces silently degraded output.

  3. Context Budget Guard — Qwen3.6 advises maintaining at least 128K tokens of context to preserve thinking capabilities. This library tracks and guards against silent degradation.

Installation

pip install qwen-think
# With OpenAI client support:
pip install qwen-think[openai]

Quick Start

from openai import OpenAI
from qwen_think import ThinkingSession

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")
session = ThinkingSession(client, backend="vllm", budget=200_000)

# Auto-routes: detects complexity, sets mode, swaps sampling
response = session.chat("refactor this module", preserve=True)

The Three Backend Patterns

| Backend | Flag Format | Notes |
| --- | --- | --- |
| vLLM / SGLang | `extra_body={"chat_template_kwargs": {"enable_thinking": False}}` | Nested |
| DashScope | `extra_body={"enable_thinking": False}` | Top-level |
| llama.cpp | `--chat-template-kwargs '{"enable_thinking": false}'` | Server-side only |
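As an illustration of what normalizing these three patterns involves (a hypothetical sketch, not the library's actual internals; `build_thinking_kwargs` is an invented name):

```python
# Hypothetical sketch of backend normalization -- not qwen-think's internals.
def build_thinking_kwargs(backend: str, enable_thinking: bool) -> dict:
    """Return the request kwargs that carry the enable_thinking flag
    for a given backend, following the patterns in the table above."""
    if backend in ("vllm", "sglang"):
        # vLLM / SGLang nest the flag under chat_template_kwargs
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}}}
    if backend == "dashscope":
        # DashScope expects the flag at the top level of extra_body
        return {"extra_body": {"enable_thinking": enable_thinking}}
    if backend == "llamacpp":
        # llama.cpp sets the flag server-side via --chat-template-kwargs,
        # so there is nothing to add to the per-request payload
        return {}
    raise ValueError(f"unknown backend: {backend}")
```

Callers then pass one boolean and one backend name; the divergent payload shapes stay behind the helper.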

Bug Fixes Included

vLLM Semantic Router (#858)

When use_reasoning: false is configured, the router removes the field instead of explicitly setting enable_thinking: false. Since Qwen3.6 thinks by default, removing the field has no effect.

Fix: This library always explicitly sets the boolean — never omits it.

Ray Serve (#52979)

enable_thinking: false in the HTTP body doesn't propagate to the model, and thinking output continues appearing.

Fix: The normalized payload always includes the explicit flag.
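Both fixes share one pattern: always materialize the boolean in the outgoing payload instead of deleting the key. A minimal sketch of that pattern (hypothetical helper, not the router's or this library's actual code):

```python
# Hypothetical sketch: never omit enable_thinking, always set it explicitly.
def normalize_reasoning_flag(config: dict) -> dict:
    """Map a router-level use_reasoning setting onto an explicit
    enable_thinking boolean. Since Qwen3.6 thinks by default, an
    absent flag leaves thinking ON, so False must be written out."""
    # Treat a missing setting as True, matching the model's default.
    use_reasoning = config.get("use_reasoning", True)
    return {"chat_template_kwargs": {"enable_thinking": bool(use_reasoning)}}
```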

Sampling Parameters

Qwen3.6 requires different sampling params depending on thinking mode:

Thinking mode:

temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0

Instruct / non-thinking mode:

temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
presence_penalty=1.5

A router that flips enable_thinking without atomically swapping these params produces silently degraded output — not incorrect, just suboptimal in ways that don't surface as errors.
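The safe pattern is to derive the flag and its sampling params from a single decision point. A sketch under that assumption (illustrative code, not the library's implementation):

```python
# Hypothetical sketch: flip the mode and the sampling params together,
# so they can never get out of sync.
THINKING_PARAMS = dict(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                       presence_penalty=1.5, repetition_penalty=1.0)
INSTRUCT_PARAMS = dict(temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
                       presence_penalty=1.5)

def request_kwargs(enable_thinking: bool) -> dict:
    """Return the flag *and* its matching sampling params in one step."""
    params = THINKING_PARAMS if enable_thinking else INSTRUCT_PARAMS
    return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
            **params}
```

Because a single boolean selects both halves of the payload, there is no code path that flips one without the other.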

Context Budget

Qwen3.6 advises maintaining a context length of at least 128K tokens to preserve thinking capabilities. The BudgetManager tracks usage and:

  • Warns when approaching the threshold
  • Auto-compresses older messages when running low
  • Refuses to continue when below the minimum

session = ThinkingSession(client, budget=200_000, min_context=128_000)
status = session.budget_status
# BudgetStatus(total_tokens=200000, used_tokens=45000, available_tokens=155000,
#              action=BudgetAction.OK, message="Context usage: 22.5%...")
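The shape of that status check can be sketched as follows; the threshold multipliers here are illustrative assumptions, not the library's actual cutoffs:

```python
# Hypothetical sketch of a budget check against the 128K thinking floor.
from dataclasses import dataclass
from enum import Enum

class BudgetAction(Enum):
    OK = "ok"
    WARN = "warn"          # approaching the threshold
    COMPRESS = "compress"  # running low: compress older messages
    REFUSE = "refuse"      # below the minimum: refuse to continue

@dataclass
class BudgetStatus:
    total_tokens: int
    used_tokens: int
    available_tokens: int
    action: BudgetAction

def check_budget(total: int, used: int, min_context: int = 128_000) -> BudgetStatus:
    """Classify remaining context; multipliers are illustrative."""
    available = total - used
    if available < min_context:
        action = BudgetAction.REFUSE
    elif available < int(min_context * 1.1):
        action = BudgetAction.COMPRESS
    elif available < int(min_context * 1.2):
        action = BudgetAction.WARN
    else:
        action = BudgetAction.OK
    return BudgetStatus(total, used, available, action)
```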

Thinking Preservation

Qwen3.6 introduces preserve_thinking — a feature that retains thinking context across conversation history, improving reasoning quality for iterative development.

session = ThinkingSession(client, preserve_thinking=True)
# Thinking content is cached and included in subsequent turns

Complexity Router

The router classifies query complexity and selects the appropriate mode:

| Complexity | Mode | Preserved | Use Case |
| --- | --- | --- | --- |
| SIMPLE | NO_THINK | No | "What is X?" |
| MODERATE | THINK | No | Multi-sentence reasoning |
| COMPLEX | THINK + preserve | Yes | Coding, debugging, refactoring |

from qwen_think import ComplexityRouter

router = ComplexityRouter()
decision = router.route("implement a REST API with authentication")
# RouterDecision(complexity=COMPLEX, mode=THINK, preserve_thinking=True)
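The router's actual rules aren't documented here, but a classifier of this shape could be as simple as a keyword-and-length heuristic (purely illustrative, invented keywords and threshold):

```python
# Hypothetical heuristic classifier -- an illustration of the interface,
# not the real router's rules.
CODE_KEYWORDS = ("implement", "refactor", "debug", "fix", "write code")

def classify(query: str) -> str:
    q = query.lower()
    if any(k in q for k in CODE_KEYWORDS):
        return "COMPLEX"    # coding tasks: THINK + preserve
    if len(q.split()) > 12:
        return "MODERATE"   # longer reasoning: THINK
    return "SIMPLE"         # short factual lookups: NO_THINK
```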

API Reference

ThinkingSession

session = ThinkingSession(
    client,                # OpenAI-compatible client
    backend="vllm",        # or "sglang", "dashscope", "llamacpp"
    model="Qwen/Qwen3.6-35B-A3B",
    budget=200_000,        # Total context budget
    min_context=128_000,   # 128K minimum for thinking
    preserve_thinking=True,
    auto_route=True,       # Auto-classify complexity
)

Manual Mode Control

# Force a specific mode
session.chat("quick answer", mode=ThinkingMode.NO_THINK)

# Let the router decide
session.chat("refactor this module")

# Check current state
session.thinking_mode  # → ThinkingMode.THINK
session.budget_status  # → BudgetStatus(...)

License

Apache-2.0

Download files


Source Distribution

qwen_think-0.1.1.tar.gz (18.7 kB)


Built Distribution


qwen_think-0.1.1-py3-none-any.whl (19.7 kB)


File details

Details for the file qwen_think-0.1.1.tar.gz.

File metadata

  • Download URL: qwen_think-0.1.1.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d56b92b86e0ecf74c2d797009c11a821b96ed96b4a5b4eed76d8e1e30c24df63 |
| MD5 | 3d8182880982a605f117364e587e9f7a |
| BLAKE2b-256 | d01e0f67341c509244305d8c03b207b0f0d32e5263289b992af02db807b8be20 |


Provenance

The following attestation bundles were made for qwen_think-0.1.1.tar.gz:

Publisher: publish.yml on ArkaD171717/qwen-think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file qwen_think-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: qwen_think-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ef0909e81b80f6b91acf5b6f23c8ce29f856f25340d8222dd937d59e5572457c |
| MD5 | cc6cd9e0aa909b5ce8a010310b9aebb2 |
| BLAKE2b-256 | 633f3892a8e50f0dfd81271294768f1c2fd05f5be1bb97813fd7d72172d28ca7 |


Provenance

The following attestation bundles were made for qwen_think-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ArkaD171717/qwen-think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
