
Thinking session manager for Qwen3.6: backend normalization, sampling parameter swap, and 128K context budget guard.


Qwen3-Think

A thinking session manager for Qwen3.6 (or any Qwen3-family model that uses enable_thinking): SDK, router, and bug fixes.

Python library for managing Qwen3.6's thinking state across sessions, backends, and frameworks.

What It Solves

  1. Backend Normalization -- Qwen3.6's enable_thinking flag has three different invocation patterns across backends. This library normalizes them into a single API.

  2. Sampling Parameter Swap -- Qwen3.6 requires different sampling parameters for thinking vs. non-thinking mode. A router that flips enable_thinking without also swapping params produces silently degraded output.

  3. Context Budget Guard -- Qwen3.6 advises maintaining at least 128K tokens of context to preserve thinking capabilities. This library tracks and guards against silent degradation.

Installation

pip install qwen-think
# With OpenAI client support:
pip install qwen-think[openai]

Quick Start

from openai import OpenAI
from qwen_think import ThinkingSession

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")
session = ThinkingSession(client, backend="vllm", budget=200_000)

# Auto-routes: detects complexity, sets mode, swaps sampling
response = session.chat("refactor this module", preserve=True)

The Three Backend Patterns

Backend         Flag format                                                        Notes
vLLM / SGLang   extra_body={"chat_template_kwargs": {"enable_thinking": False}}    Nested
DashScope       extra_body={"enable_thinking": False}                              Top-level
llama.cpp       --chat-template-kwargs '{"enable_thinking": false}'                Server-side only
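The normalization can be sketched as a single function that maps one boolean onto each backend's expected payload shape. This is an illustrative sketch, not the library's actual internals; the function name is invented.

```python
# Hypothetical sketch of backend normalization: one boolean in,
# backend-specific request kwargs out.

def thinking_kwargs(backend: str, enable: bool) -> dict:
    """Return request kwargs carrying the enable_thinking flag for a backend."""
    if backend in ("vllm", "sglang"):
        # vLLM / SGLang expect the flag nested under chat_template_kwargs.
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enable}}}
    if backend == "dashscope":
        # DashScope takes the flag at the top level of extra_body.
        return {"extra_body": {"enable_thinking": enable}}
    if backend == "llamacpp":
        # llama.cpp only honors the flag server-side (--chat-template-kwargs),
        # so there is nothing to add per-request.
        return {}
    raise ValueError(f"unknown backend: {backend}")
```

Callers then splat the result into the client call, e.g. `client.chat.completions.create(..., **thinking_kwargs("vllm", False))`.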

Bug Fixes Included

vLLM Semantic Router (#858)

When use_reasoning: false is configured, the router removes the field instead of explicitly setting enable_thinking: false. Since Qwen3.6 thinks by default, removing the field leaves thinking enabled.

Fix: This library always explicitly sets the boolean -- never omits it.
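The fix can be illustrated with a small sketch (the function name and config key handling are hypothetical): always emit an explicit boolean, never drop the key, because an absent key falls back to the model's think-by-default behavior.

```python
# Illustrative sketch of the fix: pin enable_thinking to an explicit boolean.
# Buggy routers instead delete the key when use_reasoning is false, which
# silently re-enables thinking on a think-by-default model.

def normalize_reasoning(config: dict) -> dict:
    """Return chat_template_kwargs with enable_thinking always set explicitly."""
    return {"enable_thinking": bool(config.get("use_reasoning", True))}
```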

Ray Serve (#52979)

enable_thinking: false in the HTTP body doesn't propagate to the model, and thinking output continues appearing.

Fix: The normalized payload always includes the explicit flag.

Sampling Parameters

Qwen3.6 requires different sampling params depending on thinking mode:

Thinking mode:

temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0

Instruct / non-thinking mode:

temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
presence_penalty=1.5

A router that flips enable_thinking without atomically swapping these params produces silently degraded output -- not incorrect, just suboptimal in ways that don't surface as errors.
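The atomic swap can be sketched as a single function that selects the flag and its matching params together, so they can never drift apart. The helper name is invented; the values are the recommended params listed above.

```python
# Sketch of an atomic mode + sampling swap (hypothetical helper, not the
# library's API). One code path returns both the flag and its params.

THINKING_PARAMS = dict(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
                       presence_penalty=1.5, repetition_penalty=1.0)
INSTRUCT_PARAMS = dict(temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
                       presence_penalty=1.5)

def request_kwargs(enable_thinking: bool) -> dict:
    """Couple the flag and the sampling params in a single return value."""
    params = THINKING_PARAMS if enable_thinking else INSTRUCT_PARAMS
    return {"enable_thinking": enable_thinking, **params}
```

Because the flag and params come from the same call, a router cannot flip one without the other.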

Context Budget

Qwen3.6 advises maintaining a context length of at least 128K tokens to preserve thinking capabilities. The BudgetManager tracks usage and:

  • Warns when approaching the threshold
  • Auto-compresses older messages when running low
  • Refuses to continue when below the minimum

session = ThinkingSession(client, budget=200_000, min_context=128_000)
status = session.budget_status
# BudgetStatus(total_tokens=200000, used_tokens=10000, available_tokens=190000,
#              min_context=128000, action=BudgetAction.OK,
#              message="Available: 190,000 of 200,000 tokens.")
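The guard's decision logic might look like the following sketch. The BudgetAction names are borrowed from the status object above; the warn/compress thresholds are invented for illustration.

```python
# Minimal sketch of the budget guard decision. Thresholds (110% / 125% of the
# minimum) are assumptions, not the library's documented values.
from enum import Enum

class BudgetAction(Enum):
    OK = "ok"
    WARN = "warn"
    COMPRESS = "compress"
    REFUSE = "refuse"

def budget_action(available: int, min_context: int = 128_000) -> BudgetAction:
    if available < min_context:
        return BudgetAction.REFUSE        # below the minimum needed for thinking
    if available < int(min_context * 1.10):
        return BudgetAction.COMPRESS      # running low: compact older messages
    if available < int(min_context * 1.25):
        return BudgetAction.WARN          # approaching the threshold
    return BudgetAction.OK
```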

Thinking Preservation

Qwen3.6 introduces preserve_thinking -- a feature that retains thinking context across conversation history, improving reasoning quality for iterative development.

session = ThinkingSession(client, preserve_thinking=True)
# Thinking content is cached and included in subsequent turns

Complexity Router

The router classifies query complexity and selects the appropriate mode:

Complexity   Mode               Preserved   Use case
SIMPLE       NO_THINK           No          "What is X?"
MODERATE     THINK              No          Multi-sentence reasoning
COMPLEX      THINK + preserve   Yes         Coding, debugging, refactoring

from qwen_think import ComplexityRouter

router = ComplexityRouter()
decision = router.route("implement a REST API with authentication")
# RouterDecision(complexity=COMPLEX, mode=THINK, preserve_thinking=True)
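As a toy illustration of how such a classifier might work (the real heuristics are not documented here; the keywords and length threshold below are invented):

```python
# Invented heuristic, for illustration only: coding vocabulary => COMPLEX,
# long or open-ended text => MODERATE, short questions => SIMPLE.

def classify(query: str) -> str:
    coding_terms = ("implement", "refactor", "debug", "fix", "api", "code")
    if any(t in query.lower() for t in coding_terms):
        return "COMPLEX"      # coding tasks: THINK + preserve
    if len(query.split()) > 12 or "?" not in query:
        return "MODERATE"     # multi-sentence reasoning: THINK
    return "SIMPLE"           # short factual question: NO_THINK
```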

API Reference

ThinkingSession

session = ThinkingSession(
    client,                # OpenAI-compatible client
    backend="vllm",       # or "sglang", "dashscope", "llamacpp"
    model="Qwen/Qwen3.6-35B-A3B",
    budget=200_000,        # Total context budget
    min_context=128_000,   # 128K minimum for thinking
    preserve_thinking=True,
    auto_route=True,       # Auto-classify complexity
)

Manual Mode Control

# Force a specific mode
session.chat("quick answer", mode=ThinkingMode.NO_THINK)

# Let the router decide
session.chat("refactor this module")

# Check current state
session.thinking_mode  # -> ThinkingMode.THINK
session.budget_status  # -> BudgetStatus(...)

License

Apache-2.0
