MTP speculative decoding tuner for Qwen3.6: vLLM/SGLang config generation, crossover analysis, and bug detection.

Qwen3.6-MTP

MTP speculative decoding tuner for Qwen3.6. Generates vLLM/SGLang configs, finds throughput crossover points, and catches known bugs.

What It Does

Configuration advisor: Recommends MTP on/off with parameters via a decision tree over use case, objective, and GPU
Backend configs: Generates vLLM (method: mtp) and SGLang (NEXTN algorithm) serve commands
Crossover analysis: Finds the batch size where MTP flips from net-positive to net-negative throughput
Bug detection: Detects and blocks known-broken configurations (TurboQuant + MTP, prefix cache degradation)
Benchmark sweep: Generates latency/throughput matrices across batch size, speculative tokens, and prefix cache settings
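A benchmark sweep of this kind is a Cartesian product over the three axes listed above. The sketch below does not use the package; `SweepPoint`, `build_sweep`, and the example values are hypothetical, and `spec_tokens=0` is assumed to mean "MTP disabled" so each grid includes its own baseline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SweepPoint:
    """One benchmark configuration (hypothetical structure, not the package's API)."""
    batch_size: int
    spec_tokens: int      # 0 = MTP disabled, used as the baseline
    prefix_cache: bool


def build_sweep(batch_sizes, spec_tokens, prefix_cache_options=(True, False)):
    """Return every combination of batch size, speculative tokens, and prefix-cache setting."""
    return [
        SweepPoint(b, k, pc)
        for b, k, pc in product(batch_sizes, spec_tokens, prefix_cache_options)
    ]


points = build_sweep(batch_sizes=[1, 4, 16, 64], spec_tokens=[0, 1, 2, 3])
print(len(points))  # 4 batch sizes * 4 spec-token settings * 2 cache settings = 32
```

Each point would then be served and measured once; keeping the baseline in the same grid makes per-point MTP deltas straightforward to compute.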

Installation

pip install qwen3.6-mtp

Quick Start

from qwen3_6_mtp import recommend, UseCase, Objective, Quantization

rec = recommend(
    use_case=UseCase.SINGLE_USER,
    objective=Objective.MINIMIZE_LATENCY,
    gpu_id="rtx-4090",
    quantization=Quantization.INT4,
)

print(rec.enable)           # True
print(rec.expected_gain)    # ~25-35% latency reduction (projected)
print(rec.vllm_command)     # Full vllm serve command with MTP flags
print(rec.sglang_command)   # Equivalent SGLang command
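The advisor is described as a decision tree over use case, objective, and GPU. The toy sketch below only illustrates that shape using rules taken from the Key Findings section (hard-block TurboQuant, enable for single-user latency, otherwise compare against a GPU-specific crossover batch size). It is not the package's actual logic; `toy_recommend`, its string arguments, and the choice of `k=1` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advice:
    enable: bool
    spec_tokens: int
    reason: str


def toy_recommend(use_case: str, objective: str, expected_batch: int,
                  crossover_batch: Optional[int], turboquant: bool = False) -> Advice:
    """Hypothetical decision tree; crossover_batch is the measured flip point for this GPU."""
    # Hard block first: TurboQuant + MTP is a known-broken combination.
    if turboquant:
        return Advice(False, 0, "TurboQuant + MTP produces degenerate token loops")
    # Single-user latency: batch ~1, the regime where MTP helps most.
    if use_case == "single_user" and objective == "minimize_latency":
        return Advice(True, 1, "low batch, latency-bound decode")
    # Otherwise decide from the crossover measured on this GPU.
    # None means no crossover was found in the measured range.
    if crossover_batch is None or expected_batch < crossover_batch:
        return Advice(True, 1, "expected batch is below the measured crossover")
    return Advice(False, 0, "at or above the crossover, MTP is net-negative")


print(toy_recommend("multi_user", "maximize_throughput", 64, crossover_batch=16))
```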

Crossover Analysis

from qwen3_6_mtp import quick_crossover

for s in quick_crossover(gpu_id="rtx-3090"):
    print(f"MTP-{s.spec_tokens}: crossover at batch {s.crossover_batch_size}, "
          f"best gain +{s.max_positive_delta_pct}%")

Backend Config Generation

from qwen3_6_mtp import vllm_mtp_command, sglang_mtp_command

vllm = vllm_mtp_command(model="Qwen/Qwen3.6-27B", num_speculative_tokens=2)
print(vllm.command)

sglang = sglang_mtp_command(model="Qwen/Qwen3.6-27B", num_speculative_tokens=2)
print(sglang.command)
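For reference, here is a minimal sketch of what such serve commands can look like, built without the package. It assumes vLLM's `--speculative-config` JSON flag with the `mtp` method mentioned above, and SGLang's `--speculative-algorithm NEXTN` with a single-chain draft (`--speculative-eagle-topk 1` and draft tokens = steps + 1, an assumption of this sketch). The helper names are hypothetical, and flag spellings should be checked against `--help` for your installed versions.

```python
import json
import shlex


def vllm_serve_cmd(model: str, k: int, prefix_caching: bool = False) -> str:
    """Hypothetical builder for a vLLM serve command with MTP enabled."""
    spec = json.dumps({"method": "mtp", "num_speculative_tokens": k})
    args = ["vllm", "serve", model, "--speculative-config", spec]
    if not prefix_caching:
        # Matches the setting used for the decode-speedup figure in Key Findings.
        args.append("--no-enable-prefix-caching")
    return shlex.join(args)  # quotes the JSON argument for the shell


def sglang_serve_cmd(model: str, k: int) -> str:
    """Hypothetical builder for an SGLang launch command using the NEXTN algorithm."""
    args = [
        "python", "-m", "sglang.launch_server", "--model-path", model,
        "--speculative-algorithm", "NEXTN",
        "--speculative-num-steps", str(k),
        "--speculative-eagle-topk", "1",                # single draft chain
        "--speculative-num-draft-tokens", str(k + 1),  # assumed: steps + 1 for a chain
    ]
    return shlex.join(args)


print(vllm_serve_cmd("Qwen/Qwen3.6-27B", 2))
print(sglang_serve_cmd("Qwen/Qwen3.6-27B", 2))
```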

Bug Detection

from qwen3_6_mtp import check_turboquant_conflict, check_prefix_cache_degradation

bug = check_turboquant_conflict(enable_turboquant=True, num_spec_tokens=2)
if bug:
    print(f"BLOCKED: {bug.title} ({bug.upstream_issue})")

Key Findings

Finding	Detail
MTP decode speedup	+27.5% decode speedup (TPOT) at k=1 on RTX 3090, measured with `--no-enable-prefix-caching`
Prefix cache degradation	L457 bug drops the prefix cache hit rate from ~92% to ~71% when MTP is enabled (vLLM #38182, OPEN)
TurboQuant conflict	TurboQuant (TQ) + MTP produces degenerate token loops (vLLM #40831, CLOSED)
Crossover point	MTP's throughput gain shrinks as batch size grows; the batch size at which it turns net-negative depends on the number of speculative tokens and the prefix cache setting (see `quick_crossover()`)
Sampling independence	MTP is algorithmically lossless and does not constrain sampling parameters
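The "lossless" claim rests on the standard speculative-sampling acceptance rule: accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p - q). The sketch below computes the exact output distribution of one such step with `fractions.Fraction` to show it equals the target distribution p for any p, which is why temperature or top-p (applied when forming p) are unaffected. It assumes the serving backend's MTP verification follows this rule; the function and the example distributions are illustrative.

```python
from fractions import Fraction


def spec_sample_distribution(p, q):
    """Exact output distribution of one speculative step under standard rejection sampling.

    p: target-model probabilities, q: draft (MTP head) probabilities over the same vocabulary.
    """
    # P(draft x and accept) = q(x) * min(1, p(x)/q(x)) = min(p(x), q(x))
    accept = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1 - sum(accept)
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = [max(pi - qi, 0) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0:
        return accept  # p == q: every draft is accepted
    return [a + reject_mass * r / total for a, r in zip(accept, residual)]


p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # target (after any sampling transforms)
q = [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)]   # draft head
print(spec_sample_distribution(p, q) == p)  # True
```

The draft quality q only changes how often tokens are accepted (and therefore the speedup), not the distribution of the output.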

Supported Models

Model	Architecture	MTP Layers	Context
Qwen3.6-27B	Dense (GDN + Gated Attention)	1	262K
Qwen3.6-35B-A3B	MoE (GDN + Gated Attention)	1	262K

License

Apache 2.0

Project details

Release history

0.1.1 (this version): Apr 30, 2026

0.1.0: Apr 29, 2026

Download files

Source Distribution

qwen3_6_mtp-0.1.1.tar.gz (19.7 kB, source, uploaded Apr 30, 2026)

Built Distribution

qwen3_6_mtp-0.1.1-py3-none-any.whl (20.2 kB, Python 3, uploaded Apr 30, 2026)

File details

Details for the file qwen3_6_mtp-0.1.1.tar.gz.

File metadata

Download URL: qwen3_6_mtp-0.1.1.tar.gz
Upload date: Apr 30, 2026
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen3_6_mtp-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`cddd38b55c17809cea8a3d113584c01fd4dc7fda2debf3f4c330f01cebd78893`
MD5	`b9b1f7c00d6a9f5cb46df33d082d174a`
BLAKE2b-256	`043593186e5e35a745beba289955ffdb715d6064b0dc3456dc40cfe9a149ee26`

Provenance

The following attestation bundles were made for qwen3_6_mtp-0.1.1.tar.gz:

Publisher: publish.yml on ArkaD171717/Qwen3.6-MTP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen3_6_mtp-0.1.1.tar.gz
- Subject digest: cddd38b55c17809cea8a3d113584c01fd4dc7fda2debf3f4c330f01cebd78893
- Sigstore transparency entry: 1406384523
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: ArkaD171717/Qwen3.6-MTP@0ea473ade416ad6015dc6cb304df798327d9331c
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ArkaD171717
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0ea473ade416ad6015dc6cb304df798327d9331c
- Trigger Event: push

File details

Details for the file qwen3_6_mtp-0.1.1-py3-none-any.whl.

File metadata

Download URL: qwen3_6_mtp-0.1.1-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 20.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qwen3_6_mtp-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d482b098ae5b61c91a265e661bc48826ad4583afaa0789b092c3d493819b3d4d`
MD5	`d204347b5141c1f274250a88afa97a52`
BLAKE2b-256	`5e19aaf223ec14288394b63fb931df6b66ff3207f3e7b5e3d319d96dc20868b8`

Provenance

The following attestation bundles were made for qwen3_6_mtp-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ArkaD171717/Qwen3.6-MTP

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen3_6_mtp-0.1.1-py3-none-any.whl
- Subject digest: d482b098ae5b61c91a265e661bc48826ad4583afaa0789b092c3d493819b3d4d
- Sigstore transparency entry: 1406384563
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: ArkaD171717/Qwen3.6-MTP@0ea473ade416ad6015dc6cb304df798327d9331c
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ArkaD171717
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0ea473ade416ad6015dc6cb304df798327d9331c
- Trigger Event: push
