Thinking session manager for Qwen3.6: backend normalization, sampling parameter swap, and 128K context budget guard.

Qwen3-Think

The Thinking Session Manager for Qwen3.6 (or any Qwen3+ model that uses enable_thinking) -- SDK + Router + Bug Fixes

Python library for managing Qwen3.6's thinking state across sessions, backends, and frameworks.

What It Solves

Backend Normalization -- Qwen3.6's enable_thinking flag has three different invocation patterns across backends. This library normalizes them into a single API.
Sampling Parameter Swap -- Qwen3.6 requires different sampling parameters for thinking vs. non-thinking mode. A router that flips enable_thinking without also swapping params produces silently degraded output.
Context Budget Guard -- Qwen3.6 advises maintaining at least 128K tokens of context to preserve thinking capabilities. This library tracks token usage against that minimum and warns, compresses, or refuses before output quality silently degrades.

Installation

pip install qwen-think
# With OpenAI client support (the quotes keep shells like zsh from expanding the brackets):
pip install "qwen-think[openai]"

Quick Start

from openai import OpenAI
from qwen_think import ThinkingSession

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")
session = ThinkingSession(client, backend="vllm", budget=200_000)

# Auto-routes: detects complexity, sets mode, swaps sampling
response = session.chat("refactor this module", preserve=True)

The Three Backend Patterns

Backend	Flag Format	Notes
vLLM / SGLang	`extra_body={"chat_template_kwargs": {"enable_thinking": False}}`	Nested
DashScope	`extra_body={"enable_thinking": False}`	Top-level
llama.cpp	`--chat-template-kwargs '{"enable_thinking": false}'`	Server-side only
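To make the three rows concrete, here is a minimal sketch of how a single boolean could be mapped onto each backend's request shape. The helper names `thinking_request_kwargs` and `llamacpp_server_flag` are hypothetical and are not part of qwen-think's documented API. The per-backend shapes are taken directly from the table.

```python
import json
import shlex
from typing import Any


def thinking_request_kwargs(backend: str, enable_thinking: bool) -> dict[str, Any]:
    """Hypothetical helper: per-request kwargs for chat.completions.create()."""
    name = backend.lower()
    if name in ("vllm", "sglang"):
        # Nested: forwarded to the chat template as a template kwarg.
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}}}
    if name == "dashscope":
        # Top-level field inside extra_body.
        return {"extra_body": {"enable_thinking": enable_thinking}}
    if name == "llamacpp":
        # Server-side only (per the table): nothing can be added per request.
        return {}
    raise ValueError(f"unknown backend: {backend!r}")


def llamacpp_server_flag(enable_thinking: bool) -> str:
    """Hypothetical helper: the llama.cpp server launch flag, shell-quoted."""
    payload = json.dumps({"enable_thinking": enable_thinking})  # Python False -> JSON false
    return "--chat-template-kwargs " + shlex.quote(payload)
```

With the OpenAI Python client, the returned dict can be unpacked into `client.chat.completions.create(model=..., messages=..., **thinking_request_kwargs("vllm", False))`, because that method accepts `extra_body`. For llama.cpp, `llamacpp_server_flag(False)` reproduces the launch flag shown in the table.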

Bug Fixes Included

vLLM Semantic Router (#858)

When use_reasoning: false is configured, the router removes the enable_thinking field instead of explicitly setting enable_thinking: false. Because Qwen3.6 thinks by default, a missing field leaves thinking on.

Fix: This library always explicitly sets the boolean -- never omits it.
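The difference between omitting the flag and sending it explicitly can be reproduced in a few lines. This is a hypothetical sketch of the failure mode as described above, not the Semantic Router's actual code. The function names and the flat `chat_template_kwargs` payload shape are illustrative assumptions.

```python
from typing import Any

# Qwen3.6 thinks by default, so an absent flag means thinking stays ON.
MODEL_DEFAULT_THINKING = True


def drop_flag_when_off(payload: dict[str, Any], use_reasoning: bool) -> dict[str, Any]:
    """The failure mode: omit the flag instead of sending False."""
    kwargs = dict(payload.get("chat_template_kwargs", {}))
    if use_reasoning:
        kwargs["enable_thinking"] = True
    else:
        kwargs.pop("enable_thinking", None)
    return {**payload, "chat_template_kwargs": kwargs}


def set_flag_explicitly(payload: dict[str, Any], use_reasoning: bool) -> dict[str, Any]:
    """The fix: always send an explicit boolean, never omit it."""
    kwargs = dict(payload.get("chat_template_kwargs", {}))
    kwargs["enable_thinking"] = bool(use_reasoning)
    return {**payload, "chat_template_kwargs": kwargs}


def effective_thinking(payload: dict[str, Any]) -> bool:
    """What the model ends up doing: an absent flag falls back to the default."""
    return payload.get("chat_template_kwargs", {}).get("enable_thinking", MODEL_DEFAULT_THINKING)
```

With `use_reasoning=False`, `effective_thinking(drop_flag_when_off({}, False))` is still `True`. The explicit version yields `False`.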

Ray Serve (#52979)

Setting enable_thinking: false in the HTTP request body does not propagate to the model, so thinking output keeps appearing.

Fix: The normalized payload always includes the explicit flag.

Sampling Parameters

Qwen3.6 requires different sampling params depending on thinking mode:

Thinking mode:

temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0

Instruct / non-thinking mode:

temperature=0.7, top_p=0.80, top_k=20, min_p=0.0,
presence_penalty=1.5

A router that flips enable_thinking without atomically swapping these params produces silently degraded output -- not incorrect, just suboptimal in ways that don't surface as errors.
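One way to make the swap atomic is to derive the flag and its sampling preset from the same boolean in a single call. The sketch below copies the values from the two lists above. The names `SamplingPreset` and `mode_overrides` are hypothetical. It also assumes a vLLM-style OpenAI-compatible server, where `top_k`, `min_p`, and `repetition_penalty` (not part of the OpenAI schema) travel in `extra_body`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SamplingPreset:
    temperature: float
    top_p: float
    top_k: int
    min_p: float
    presence_penalty: float
    repetition_penalty: Optional[float] = None  # listed only for thinking mode


# Values copied from the two lists above.
THINKING = SamplingPreset(0.6, 0.95, 20, 0.0, 1.5, repetition_penalty=1.0)
INSTRUCT = SamplingPreset(0.7, 0.80, 20, 0.0, 1.5)


def mode_overrides(enable_thinking: bool) -> dict[str, Any]:
    """Build the flag and its preset from ONE boolean so they cannot drift apart."""
    p = THINKING if enable_thinking else INSTRUCT
    # Non-OpenAI fields go in extra_body (vLLM-style server assumed).
    extra: dict[str, Any] = {
        "top_k": p.top_k,
        "min_p": p.min_p,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    if p.repetition_penalty is not None:
        extra["repetition_penalty"] = p.repetition_penalty
    return {
        "temperature": p.temperature,
        "top_p": p.top_p,
        "presence_penalty": p.presence_penalty,
        "extra_body": extra,
    }
```

Because the flag and the preset come out of the same return value, a caller cannot change one without the other.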

Context Budget

Qwen3.6 advises maintaining a context length of at least 128K tokens to preserve thinking capabilities. The BudgetManager tracks usage and:

Warns when approaching the threshold
Auto-compresses older messages when running low
Refuses to continue when below the minimum

session = ThinkingSession(client, budget=200_000, min_context=128_000)
status = session.budget_status
# BudgetStatus(total_tokens=200000, used_tokens=10000, available_tokens=190000,
#              min_context=128000, action=BudgetAction.OK,
#              message="Available: 190,000 of 200,000 tokens.")
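The README does not say where the warn and compress thresholds sit. The following is a minimal sketch of how the three behaviors listed above could be derived from the numbers in `BudgetStatus`. The `Action` enum, `classify_budget`, and the 32K/8K margins above the minimum are hypothetical choices, not qwen-think's implementation.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    WARN = "warn"
    COMPRESS = "compress"
    REFUSE = "refuse"


def classify_budget(
    total_tokens: int,
    used_tokens: int,
    min_context: int = 128_000,
    warn_margin: int = 32_000,     # hypothetical: warn within 32K of the minimum
    compress_margin: int = 8_000,  # hypothetical: compress within 8K of the minimum
) -> tuple[Action, str]:
    """Map current usage to OK / WARN / COMPRESS / REFUSE."""
    if total_tokens < min_context:
        raise ValueError("budget is smaller than the minimum context")
    available = total_tokens - used_tokens
    if available < min_context:
        action = Action.REFUSE
    elif available < min_context + compress_margin:
        action = Action.COMPRESS
    elif available < min_context + warn_margin:
        action = Action.WARN
    else:
        action = Action.OK
    return action, f"Available: {available:,} of {total_tokens:,} tokens."
```

With the example numbers, `classify_budget(200_000, 10_000)` returns `(Action.OK, "Available: 190,000 of 200,000 tokens.")`, the same message as the status shown above.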

Thinking Preservation

Qwen3.6 introduces preserve_thinking -- a feature that retains thinking context across conversation history, improving reasoning quality for iterative development.

session = ThinkingSession(client, preserve_thinking=True)
# Thinking content is cached and included in subsequent turns
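How qwen-think actually feeds cached thinking back to the model is not described here. The following is a minimal sketch of the idea: earlier assistant turns normally go back with only their final answer, and with preservation on, the cached reasoning is re-attached. The `build_messages` helper, the per-turn `"reasoning"` key, and the `reasoning_content` message field are all assumptions for illustration.

```python
from typing import Any


def build_messages(turns: list[dict[str, Any]], preserve_thinking: bool) -> list[dict[str, Any]]:
    """Rebuild chat history, keeping cached reasoning only when preserve_thinking is on.

    Each turn is {"role": ..., "content": ..., "reasoning": optional str}.
    The "reasoning_content" field name below is an assumption.
    """
    messages = []
    for turn in turns:
        msg: dict[str, Any] = {"role": turn["role"], "content": turn["content"]}
        reasoning = turn.get("reasoning")
        if preserve_thinking and turn["role"] == "assistant" and reasoning:
            msg["reasoning_content"] = reasoning  # carried into the next request
        messages.append(msg)
    return messages
```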

Complexity Router

The router classifies query complexity and selects the appropriate mode:

Complexity	Mode	Preserved	Use Case
SIMPLE	NO_THINK	No	"What is X?"
MODERATE	THINK	No	Multi-sentence reasoning
COMPLEX	THINK + preserve	Yes	Coding, debugging, refactoring

from qwen_think import ComplexityRouter

router = ComplexityRouter()
decision = router.route("implement a REST API with authentication")
# RouterDecision(complexity=COMPLEX, mode=THINK, preserve_thinking=True)
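The README does not describe how ComplexityRouter classifies a query. The following is a hypothetical keyword-and-length heuristic that reproduces the table's three tiers and its mode/preservation policy. The keyword list, the 20-distinct-word cutoff, and the names `route`, `Decision`, `Complexity`, and `Mode` are illustrative assumptions, not the library's classifier.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


class Mode(Enum):
    NO_THINK = "no_think"
    THINK = "think"


@dataclass(frozen=True)
class Decision:
    complexity: Complexity
    mode: Mode
    preserve_thinking: bool


# Hypothetical keywords signalling coding / debugging / refactoring work.
CODE_WORDS = {"implement", "refactor", "debug", "bug", "fix", "api", "function",
              "class", "module", "code", "traceback", "test"}

# Complexity -> (mode, preserve), as in the table above.
POLICY = {
    Complexity.SIMPLE: (Mode.NO_THINK, False),
    Complexity.MODERATE: (Mode.THINK, False),
    Complexity.COMPLEX: (Mode.THINK, True),
}


def route(query: str) -> Decision:
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))  # distinct words
    sentences = [s for s in re.split(r"[.!?]+", query) if s.strip()]
    if words & CODE_WORDS:
        complexity = Complexity.COMPLEX
    elif len(sentences) > 1 or len(words) > 20:
        complexity = Complexity.MODERATE
    else:
        complexity = Complexity.SIMPLE
    mode, preserve = POLICY[complexity]
    return Decision(complexity, mode, preserve)
```

With this heuristic, `route("implement a REST API with authentication")` yields `Decision(Complexity.COMPLEX, Mode.THINK, True)`, and `route("What is X?")` yields the SIMPLE / NO_THINK tier.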

API Reference

ThinkingSession

session = ThinkingSession(
    client,                # OpenAI-compatible client
    backend="vllm",        # or "sglang", "dashscope", "llamacpp"
    model="Qwen/Qwen3.6-35B-A3B",
    budget=200_000,        # Total context budget
    min_context=128_000,   # 128K minimum for thinking
    preserve_thinking=True,
    auto_route=True,       # Auto-classify complexity
)

Manual Mode Control

# Force a specific mode
session.chat("quick answer", mode=ThinkingMode.NO_THINK)

# Let the router decide
session.chat("refactor this module")

# Check current state
session.thinking_mode  # -> ThinkingMode.THINK
session.budget_status  # -> BudgetStatus(...)

License

Apache-2.0

Project details

Release history

0.1.2 (this version) -- Apr 30, 2026
0.1.1 -- Apr 28, 2026
0.1.0 -- Apr 28, 2026

Download files

Source Distribution

qwen_think-0.1.2.tar.gz (25.2 kB)

Uploaded Apr 30, 2026 Source

Built Distribution

qwen_think-0.1.2-py3-none-any.whl (23.9 kB)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file qwen_think-0.1.2.tar.gz.

File metadata

Download URL: qwen_think-0.1.2.tar.gz
Upload date: Apr 30, 2026
Size: 25.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`d97decd2bd5d02ee9ca0725ba5f050b8d19fa6c82d2aedc9195f24d4c43b2f89`
MD5	`e77b0901f01ae6207ccf60b21bf8e48c`
BLAKE2b-256	`55d6cc8cb4c55a42aa6d5b460f314a2324d9634ed663b4d19bfb4972f727603c`

Provenance

The following attestation bundles were made for qwen_think-0.1.2.tar.gz:

Publisher: publish.yml on ArkaD171717/Qwen3-Think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen_think-0.1.2.tar.gz
- Subject digest: d97decd2bd5d02ee9ca0725ba5f050b8d19fa6c82d2aedc9195f24d4c43b2f89
- Sigstore transparency entry: 1406423010
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: ArkaD171717/Qwen3-Think@801d10d996b1c0e20011dbef5433599cade5e87d
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/ArkaD171717
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@801d10d996b1c0e20011dbef5433599cade5e87d
- Trigger Event: release

File details

Details for the file qwen_think-0.1.2-py3-none-any.whl.

File metadata

Download URL: qwen_think-0.1.2-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen_think-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fb471265c3cdbe69d4dae67788a5c8b109aff97c5a5d2ef2854d7f42b59b6569`
MD5	`84c47d361aa3fb338f8e4eef78a84f3a`
BLAKE2b-256	`2474f6062a14dda1aec650b3d16c1fc1e81a25756e7d5d724725426af6faa7de`

Provenance

The following attestation bundles were made for qwen_think-0.1.2-py3-none-any.whl:

Publisher: publish.yml on ArkaD171717/Qwen3-Think

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen_think-0.1.2-py3-none-any.whl
- Subject digest: fb471265c3cdbe69d4dae67788a5c8b109aff97c5a5d2ef2854d7f42b59b6569
- Sigstore transparency entry: 1406423072
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: ArkaD171717/Qwen3-Think@801d10d996b1c0e20011dbef5433599cade5e87d
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/ArkaD171717
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@801d10d996b1c0e20011dbef5433599cade5e87d
- Trigger Event: release
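The SHA256 digests listed above can be checked locally after downloading either file. The following is a minimal sketch using only the standard library. The helper names are illustrative, and the digests are copied from the file details above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above (release 0.1.2).
EXPECTED_SHA256 = {
    "qwen_think-0.1.2.tar.gz": "d97decd2bd5d02ee9ca0725ba5f050b8d19fa6c82d2aedc9195f24d4c43b2f89",
    "qwen_think-0.1.2-py3-none-any.whl": "fb471265c3cdbe69d4dae67788a5c8b109aff97c5a5d2ef2854d7f42b59b6569",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True only if the file's name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

Alternatively, pip's hash-checking mode can enforce the same digests at install time: add `--hash=sha256:<digest>` to the `qwen-think==0.1.2` line of a requirements file and run `pip install --require-hashes -r requirements.txt`.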
