Skip to main content

The canonical Python library for managing Qwen3.6's thinking state across sessions, backends, and frameworks.

Project description

qwen-think

The Thinking Session Manager for Qwen3.6 — SDK + Router + Bug Fixes

The canonical Python library for managing Qwen3.6's thinking state across sessions, backends, and frameworks.

Colab Notebook — end-to-end test harness (unit tests + live GPU tests)

What It Solves

Three problems nobody has solved yet:

  1. Backend Normalization — Qwen3.6's enable_thinking flag has three different invocation patterns across backends. This library normalizes them into a single API.

  2. Atomic Sampling Swap — Qwen3.6 requires different sampling parameters for thinking vs. non-thinking mode. A router that flips enable_thinking without also swapping params produces silently degraded output.

  3. Context Budget Guard — Qwen3.6 advises maintaining at least 128K tokens of context to preserve thinking capabilities. This library tracks and guards against silent degradation.

Installation

pip install qwen-think
# With OpenAI client support:
pip install qwen-think[openai]

Quick Start

from openai import OpenAI
from qwen_think import ThinkingSession

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")
session = ThinkingSession(client, backend="vllm", budget=200_000)

# Auto-routes: detects complexity, sets mode, swaps sampling
response = session.chat("refactor this module", preserve=True)

The Three Backend Patterns

Backend Flag Format Notes
vLLM / SGLang extra_body={"chat_template_kwargs": {"enable_thinking": False}} Nested
DashScope extra_body={"enable_thinking": False} Top-level
llama.cpp --chat-template-kwargs '{"enable_thinking": false}' Server-side only

Bug Fixes Included

vLLM Semantic Router (#858)

When use_reasoning: false is configured, the router removes the field instead of explicitly setting enable_thinking: false. Since Qwen3.6 thinks by default, removing the field has no effect.

Fix: This library always explicitly sets the boolean — never omits it.

Ray Serve (#52979)

enable_thinking: false in the HTTP body doesn't propagate to the model, and thinking output continues appearing.

Fix: The normalized payload always includes the explicit flag.

Sampling Parameters

Qwen3.6 requires different sampling params depending on thinking mode:

Thinking mode:

temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0

Instruct / non-thinking mode:

temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
presence_penalty=1.5

A router that flips enable_thinking without atomically swapping these params produces silently degraded output — not incorrect, just suboptimal in ways that don't surface as errors.

Context Budget

Qwen3.6 advises maintaining a context length of at least 128K tokens to preserve thinking capabilities. The BudgetManager tracks usage and:

  • Warns when approaching the threshold
  • Auto-compresses older messages when running low
  • Refuses to continue when below the minimum
session = ThinkingSession(client, budget=200_000, min_context=128_000)
status = session.budget_status
# BudgetStatus(total_tokens=200000, used_tokens=45000, available_tokens=155000,
#              action=BudgetAction.OK, message="Context usage: 22.5%...")

Thinking Preservation

Qwen3.6 introduces preserve_thinking — a feature that retains thinking context across conversation history, improving reasoning quality for iterative development.

session = ThinkingSession(client, preserve_thinking=True)
# Thinking content is cached and included in subsequent turns

Complexity Router

The router classifies query complexity and selects the appropriate mode:

Complexity Mode Preserved Use Case
SIMPLE NO_THINK No "What is X?"
MODERATE THINK No Multi-sentence reasoning
COMPLEX THINK + preserve Yes Coding, debugging, refactoring
AGENTIC THINK + preserve + budget Yes Multi-step workflows
from qwen_think import ComplexityRouter

router = ComplexityRouter()
decision = router.route("implement a REST API with authentication")
# RouterDecision(complexity=COMPLEX, mode=THINK, preserve_thinking=True, ...)

API Reference

ThinkingSession

session = ThinkingSession(
    client,                # OpenAI-compatible client
    backend="vllm",       # or "sglang", "dashscope", "llamacpp"
    model="Qwen/Qwen3.6-35B-A3B",
    budget=200_000,        # Total context budget
    min_context=128_000,   # 128K minimum for thinking
    preserve_thinking=True,
    auto_route=True,       # Auto-classify complexity
)

Manual Mode Control

# Force a specific mode
session.chat("quick answer", mode=ThinkingMode.NO_THINK)

# Let the router decide
session.chat("refactor this module")

# Check current state
session.thinking_mode  # → ThinkingMode.THINK
session.budget_status  # → BudgetStatus(...)

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qwen_think-0.1.0.tar.gz (17.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

qwen_think-0.1.0-py3-none-any.whl (20.0 kB view details)

Uploaded Python 3

File details

Details for the file qwen_think-0.1.0.tar.gz.

File metadata

  • Download URL: qwen_think-0.1.0.tar.gz
  • Upload date:
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3691ea350ae085713b7b8e96022e2b94e34e193099c52c2303bba03d889a5676
MD5 f79013a24605df85df0b0bb8131e97e2
BLAKE2b-256 32d82e66e1aa627c456264776dc7cb2fde812733868da8b526d5984a3f594954

See more details on using hashes here.

Provenance

The following attestation bundles were made for qwen_think-0.1.0.tar.gz:

Publisher: publish.yml on ArkaD171717/qwen-think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file qwen_think-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: qwen_think-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7a98fde78c14b40f8dec71c999e6c94667cba73a05d177ca0e3b2922db02d6d3
MD5 1c659cf2f994616ba90a825a83ce4754
BLAKE2b-256 613b93235c1ac4b4be28b49b48cc3739e6f72f4ac16b68dea79911e6d213a852

See more details on using hashes here.

Provenance

The following attestation bundles were made for qwen_think-0.1.0-py3-none-any.whl:

Publisher: publish.yml on ArkaD171717/qwen-think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page