Inference control plane for reasoning-aware open-source models
forge-infer
Metapackage bundling qwen-think and qwen3.6-mtp under a shared namespace.
pip install forge-infer pulls in qwen-think and qwen3.6-mtp as dependencies and re-exports their key APIs under a single forge namespace. This is packaging and narrative, not new code.
Why this exists
qwen-think (thinking-mode session control) and qwen3.6-mtp (MTP speculative decoding) are two focused packages that belong together. forge-infer gives them a shared identity so you can recommend, install, and document them as a unit instead of scattering links across READMEs.
Install
pip install forge-infer
This installs both qwen-think and qwen3.6-mtp automatically.
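As a quick sanity check after installing, the standard library can report which of the three distributions are present. This is a minimal sketch: the distribution names come from the install text above, and the helper `installed_versions` is illustrative, not part of forge-infer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for dist in distributions:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


# Distribution names as given in the Install section.
report = installed_versions(["forge-infer", "qwen-think", "qwen3.6-mtp"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

If either dependency shows as not installed, the metapackage install probably did not complete.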
Quick start
Thinking sessions (qwen-think)
Control when and how Qwen3.6 "thinks" -- budget tokens, toggle thinking on/off mid-conversation, route by complexity.
from forge.session import ThinkingSession
session = ThinkingSession(model="Qwen/Qwen3.6-27B")
response = session.chat("Explain merge sort", thinking=True)
print(response)
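The "route by complexity" idea can be illustrated without loading a model. The sketch below is hypothetical: `needs_thinking`, `routed_chat`, and `EchoSession` are names invented for illustration and are not the qwen-think router. The only assumption carried over from the quick start is a `chat(prompt, thinking=...)` method.

```python
# Keywords that, in this toy heuristic, suggest a prompt benefits from thinking mode.
REASONING_HINTS = ("prove", "derive", "debug", "step by step", "optimize")


def needs_thinking(prompt: str, min_words: int = 40) -> bool:
    """Crude complexity check: long prompts or reasoning keywords enable thinking."""
    text = prompt.lower()
    return len(text.split()) >= min_words or any(hint in text for hint in REASONING_HINTS)


class EchoSession:
    """Stand-in object exposing the same chat(prompt, thinking=...) shape as the quick start."""

    def chat(self, prompt: str, thinking: bool = False) -> str:
        mode = "thinking" if thinking else "direct"
        return f"[{mode}] {prompt}"


def routed_chat(session, prompt: str) -> str:
    """Decide per message whether to spend thinking tokens, then delegate to the session."""
    return session.chat(prompt, thinking=needs_thinking(prompt))


session = EchoSession()
print(routed_chat(session, "What is 2 + 2?"))
print(routed_chat(session, "Prove that merge sort runs in O(n log n)."))
```

Swapping `EchoSession` for a real session object should work as long as it accepts the same `thinking` keyword. A production router would likely use a stronger signal than keywords.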
MTP speculative decoding (qwen3.6-mtp)
Tune multi-token prediction for throughput, find crossover points, generate backend configs.
from forge.mtp import recommend, quick_crossover, vllm_mtp_command, sglang_mtp_command
from forge.mtp import UseCase, Objective
# Get a recommendation for your hardware
rec = recommend(use_case=UseCase.SINGLE_USER, objective=Objective.MINIMIZE_LATENCY, gpu_id="rtx-4090")
print(rec.enable, rec.expected_gain)
# Find the batch size at which MTP's gain turns from positive to negative
for s in quick_crossover(gpu_id="rtx-3090"):
    print(f"MTP-{s.spec_tokens}: crossover at batch {s.crossover_batch_size}")
# Generate serve commands
print(vllm_mtp_command(model="Qwen/Qwen3.6-27B", num_speculative_tokens=2).command)
print(sglang_mtp_command(model="Qwen/Qwen3.6-27B", num_speculative_tokens=2).command)
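To build intuition for why a crossover batch size exists at all, here is a toy model. It is not the method qwen3.6-mtp uses. It assumes each drafted token is accepted independently with a fixed probability, and that a decode step costs one unit until the batch saturates compute, after which cost grows linearly. All function names and the `saturation_batch` parameter are hypothetical.

```python
def expected_tokens_per_step(accept_rate: float, spec_tokens: int) -> float:
    """Expected tokens emitted per verification step: 1 + a + a^2 + ... + a^k.

    Counts the accepted draft prefix plus the one token the verifier always yields.
    """
    if accept_rate >= 1.0:
        return float(spec_tokens + 1)
    return (1.0 - accept_rate ** (spec_tokens + 1)) / (1.0 - accept_rate)


def step_cost(batch_size: int, tokens_per_seq: int, saturation_batch: int) -> float:
    """Toy roofline: flat cost while memory-bound, linear once compute-bound."""
    return max(1.0, batch_size * tokens_per_seq / saturation_batch)


def speedup(batch_size: int, accept_rate: float, spec_tokens: int, saturation_batch: int) -> float:
    """Throughput with speculation divided by throughput without it."""
    baseline = step_cost(batch_size, 1, saturation_batch)
    speculative = step_cost(batch_size, spec_tokens + 1, saturation_batch)
    return expected_tokens_per_step(accept_rate, spec_tokens) * baseline / speculative


def crossover_batch(accept_rate: float, spec_tokens: int, saturation_batch: int, max_batch: int = 1024):
    """Smallest batch size where speculation becomes a net slowdown, or None if never."""
    for batch_size in range(1, max_batch + 1):
        if speedup(batch_size, accept_rate, spec_tokens, saturation_batch) < 1.0:
            return batch_size
    return None


# Illustrative inputs only: 70% acceptance, compute saturating at batch 64.
for k in (1, 2, 3):
    print(f"MTP-{k}: toy crossover at batch {crossover_batch(0.7, k, 64)}")
```

In this model, small batches get the extra verified tokens almost for free, while large batches pay for every speculated token. That is why a tuner reports a crossover per speculative-token count. Real crossover points depend on hardware, kernels, and measured acceptance rates.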
Architecture
How the packages relate:
+---------------------------------------------+
|             forge (metapackage)             |
+------------------+--------------------------+
| forge.session    | forge.mtp                |
| (qwen-think)     | (qwen3.6-mtp)            |
|                  |                          |
| Thinking-mode    | MTP speculative decode   |
| session control  | tuning & backend config  |
+------------------+--------------------------+
|            Qwen3.6 model family             |
+---------------------------------------------+
- forge.session -- Re-exports ThinkingSession from qwen-think.
- forge.mtp -- Re-exports recommend, quick_crossover, vllm_mtp_command, sglang_mtp_command, UseCase, Objective from qwen3.6-mtp.
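The re-export pattern described in the list above can be sketched generically. Because the import names of the underlying packages are not stated here, this example uses the standard-library `math` module as a stand-in source, and `build_reexport_module` is a hypothetical helper. A real metapackage would more likely use plain `from ... import ...` lines; the dynamic form is used only so the example runs without the Qwen packages installed.

```python
import importlib
import types


def build_reexport_module(name: str, source: str, names: list[str]) -> types.ModuleType:
    """Create a module object that exposes selected attributes of another module."""
    src = importlib.import_module(source)
    module = types.ModuleType(name)
    for attr in names:
        # The re-exported name is the very same object, not a copy or wrapper.
        setattr(module, attr, getattr(src, attr))
    module.__all__ = list(names)
    return module


# Stand-in for something like forge.session re-exporting ThinkingSession.
demo = build_reexport_module("demo_namespace.mathlike", "math", ["sqrt", "pi"])
print(demo.__all__, demo.sqrt(16.0))
```

Because the re-exported objects are identical to the originals, behavior and bug fixes come entirely from the underlying package. This matches the "no new functionality" stance of the metapackage.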
Individual packages
| Package | What it does |
|---|---|
| qwen-think | Thinking-mode session management |
| qwen3.6-mtp | MTP speculative decoding tuner |
What this package does NOT do
- No new functionality -- strictly re-exports from the underlying packages
- No CLI -- the libraries are Python-first
- No model generalization -- wraps Qwen3.6-specific versions as-is
License
Apache 2.0
Download files
File details
Details for the file forge_infer-0.2.1.tar.gz.
File metadata
- Download URL: forge_infer-0.2.1.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53bb43ffc6238526922deac2fe102fe8eda280793c10d32fbfa8b9b4eb50b3d1 |
| MD5 | b686a1ea85e13c5f721ba748024e028d |
| BLAKE2b-256 | 665ea65b1dc64b95a7deb6d23dd54d9309c68309f409e3915aa78f03c817163b |
Provenance
The following attestation bundles were made for forge_infer-0.2.1.tar.gz:
Publisher: publish.yml on ArkaD171717/FORGE-Infer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: forge_infer-0.2.1.tar.gz
- Subject digest: 53bb43ffc6238526922deac2fe102fe8eda280793c10d32fbfa8b9b4eb50b3d1
- Sigstore transparency entry: 1417200514
- Sigstore integration time:
- Permalink: ArkaD171717/FORGE-Infer@354834a4749248d195600041fad49af9ebea353f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/ArkaD171717
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@354834a4749248d195600041fad49af9ebea353f
- Trigger Event: release
File details
Details for the file forge_infer-0.2.1-py3-none-any.whl.
File metadata
- Download URL: forge_infer-0.2.1-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 513243e5fdffc9360e1ae756748589fa168115291f05aa62f08d521cdd84e848 |
| MD5 | d3a65a2f1ae353acb8c496484830a2f3 |
| BLAKE2b-256 | dceb310403d6fc517c333b9b87d87e638aa569235f142f19ad66099509d9c304 |
Provenance
The following attestation bundles were made for forge_infer-0.2.1-py3-none-any.whl:
Publisher: publish.yml on ArkaD171717/FORGE-Infer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: forge_infer-0.2.1-py3-none-any.whl
- Subject digest: 513243e5fdffc9360e1ae756748589fa168115291f05aa62f08d521cdd84e848
- Sigstore transparency entry: 1417200526
- Sigstore integration time:
- Permalink: ArkaD171717/FORGE-Infer@354834a4749248d195600041fad49af9ebea353f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/ArkaD171717
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@354834a4749248d195600041fad49af9ebea353f
- Trigger Event: release