Entropy-guided dynamic expert selection for MoE models. Reduce inference costs by 30-50%. Now with observability & monitoring.

Adaptive-K SDK

Entropy-guided dynamic expert selection for Mixture-of-Experts (MoE) models.
Reported compute savings range from 24.7% to 52.5%, depending on the model (see Proven Results).

🚀 Quick Start

pip install adaptive-k-routing

from adaptive_k import AdaptiveKRouter

# Load pre-calibrated router
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")

# Route tokens (router_logits is your MoE gate output, shape (batch_size, num_experts))
indices, weights, metrics = router.route(router_logits, return_metrics=True)

print(f"Compute savings: {metrics.compute_savings:.1%}")
# Output: Compute savings: 47.2%

With Observability Support

pip install "adaptive-k-routing[observability]"

📊 Proven Results

Model	Savings	Quality Retained
Mixtral 8x7B	52.5%	99.8%
Qwen-MoE	32.4%	99.9%
OLMoE-1B-7B	24.7%	99.7%

💡 How It Works

Adaptive-K dynamically selects the number of experts (K) based on routing entropy:

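Low entropy (confident)  → K=1 → 87.5% compute saved
Medium entropy           → K=2 → 75% compute saved
High entropy (uncertain) → K=4 → Full routing

The savings figures equal 1 − K/8, the share of experts skipped in an 8-expert layer; by the same measure, K=4 still skips 50%.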

The key insight: when the router is confident, fewer experts are needed.
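The mapping from entropy to K can be sketched in a few lines of NumPy. This is an illustration only, not the SDK's implementation: the helper names (`softmax`, `routing_entropy`, `select_k`) are hypothetical, the thresholds and K values are taken from the Configuration example, and entropy is assumed to be measured in nats.

```python
import numpy as np

K_VALUES = np.array([1, 2, 4])            # from the Configuration example
ENTROPY_THRESHOLDS = np.array([0.6, 1.2])  # assumed to be in nats


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def routing_entropy(probs):
    # Shannon entropy per token; the clip avoids log(0).
    return -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=-1)


def select_k(entropy):
    # side="right" gives: H < 0.6 -> K=1, 0.6 <= H < 1.2 -> K=2, else K=4.
    bucket = np.searchsorted(ENTROPY_THRESHOLDS, entropy, side="right")
    return K_VALUES[bucket]


router_logits = np.array([
    [10.0, 0, 0, 0, 0, 0, 0, 0],  # one dominant expert
    [5.0, 5.0, 0, 0, 0, 0, 0, 0],  # two experts tied
    [0.0, 0, 0, 0, 0, 0, 0, 0],    # uniform: maximum entropy ln(8)
])
entropy = routing_entropy(softmax(router_logits))
k_per_token = select_k(entropy)
print(k_per_token)  # [1 2 4]
```

The three rows have entropies of roughly 0.004, 0.81, and 2.08 nats, so they land in the K=1, K=2, and K=4 buckets respectively.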

📖 Usage

Basic Routing

from adaptive_k import AdaptiveKRouter

router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")

# Router logits from your model's MoE gate, shape (batch_size, num_experts)
router_logits = model.router(hidden_states)

# Adaptive-K routing
expert_indices, expert_weights, _ = router.route(router_logits)

# Run only the selected experts (execute_experts is a placeholder for your own dispatch code)
output = execute_experts(hidden_states, expert_indices, expert_weights)
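The `execute_experts` call above is not part of the SDK, so here is one way such a helper might look. This is a minimal sketch under stated assumptions: the README does not document the shape of `expert_indices` and `expert_weights`, so the sketch assumes both are padded to shape (batch_size, max_k), with unused slots marked by index -1 and weight 0. The extra `experts` argument (a list of callables) is also an assumption.

```python
import numpy as np


def execute_experts(hidden_states, expert_indices, expert_weights, experts):
    """Weighted sum of the selected experts' outputs for each token.

    Assumes unused slots in expert_indices are padded with -1.
    """
    output = np.zeros_like(hidden_states, dtype=float)
    for expert_id, expert in enumerate(experts):
        # (token, slot) positions that were routed to this expert.
        token_ids, slot_ids = np.nonzero(expert_indices == expert_id)
        if token_ids.size == 0:
            continue  # expert not selected by any token: no compute spent
        expert_out = expert(hidden_states[token_ids])
        weights = expert_weights[token_ids, slot_ids][:, None]
        output[token_ids] += weights * expert_out
    return output


# Toy experts that scale their input by 1, 2, 3, and 4.
experts = [lambda x, s=s: x * s for s in (1.0, 2.0, 3.0, 4.0)]
hidden_states = np.ones((2, 3))
expert_indices = np.array([[0, -1], [1, 3]])          # token 0 uses K=1, token 1 uses K=2
expert_weights = np.array([[1.0, 0.0], [0.25, 0.75]])

output = execute_experts(hidden_states, expert_indices, expert_weights, experts)
print(output)
```

Grouping tokens by expert, rather than looping over tokens, keeps each expert call batched, and experts that no token selected are skipped entirely.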

Custom Calibration

from adaptive_k import Calibrator

calibrator = Calibrator(
    target_savings=0.40,      # 40% target savings
    quality_threshold=0.99    # Max 1% quality loss
)

result = calibrator.calibrate(
    model=your_model,
    dataset=calibration_data
)

print(f"Optimal thresholds: {result.optimal_thresholds}")
print(f"Expected savings: {result.expected_savings:.1%}")

Check Statistics

# After processing many tokens
print(router.stats)
# {
#   'tokens_processed': 1_234_567,
#   'average_savings': 0.472,
#   'estimated_cost_reduction': '47.2%'
# }

🔧 Configuration

from adaptive_k import AdaptiveKRouter, RoutingConfig

config = RoutingConfig(
    k_values=[1, 2, 4],           # Available K values
    entropy_thresholds=[0.6, 1.2], # H < 0.6 → K=1, H < 1.2 → K=2, else K=4
    num_experts=8
)

router = AdaptiveKRouter(config=config)
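When choosing thresholds by hand, it helps to check them against the largest entropy a router can produce, which is ln(num_experts) for a uniform distribution (about 2.079 nats for 8 experts). The sketch below shows such a sanity check. `SketchConfig` is a hypothetical stand-in and not the SDK's `RoutingConfig`; whether the SDK measures entropy in nats or performs similar validation is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SketchConfig:
    """Hypothetical stand-in for illustration; not the SDK's RoutingConfig."""
    k_values: List[int]
    entropy_thresholds: List[float]
    num_experts: int

    @property
    def max_entropy(self) -> float:
        # Entropy of a uniform distribution over all experts, in nats.
        return math.log(self.num_experts)

    def validate(self) -> None:
        if len(self.entropy_thresholds) != len(self.k_values) - 1:
            raise ValueError("expected one threshold between each pair of K values")
        if self.k_values != sorted(set(self.k_values)):
            raise ValueError("k_values must be strictly increasing")
        if self.entropy_thresholds != sorted(set(self.entropy_thresholds)):
            raise ValueError("entropy_thresholds must be strictly increasing")
        if self.k_values[0] < 1 or self.k_values[-1] > self.num_experts:
            raise ValueError("each K must lie in [1, num_experts]")
        if not all(0.0 < t < self.max_entropy for t in self.entropy_thresholds):
            raise ValueError("thresholds must lie strictly between 0 and ln(num_experts)")


sketch = SketchConfig(k_values=[1, 2, 4], entropy_thresholds=[0.6, 1.2], num_experts=8)
sketch.validate()  # passes silently for the values used above
print(f"Maximum possible entropy: {sketch.max_entropy:.3f} nats")
```

A threshold at or above ln(num_experts) could never be exceeded, so the largest K would never be selected.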

🔌 Integrations

HuggingFace Transformers

# Planned for v0.2.0 (not yet available in 0.1.5)
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")
model = router.patch(model)  # Automatic integration

vLLM

# Planned for v0.3.0 (not yet available in 0.1.5)
from adaptive_k.integrations import vllm_patch
model = vllm_patch(model, router)

TensorRT-LLM

See our TensorRT-LLM PR #10672 for native integration.

📈 Benchmarking

# CLI benchmark
adaptive-k benchmark --model mixtral-8x7b --dataset wikitext-2

# Output:
# Model: mixtral-8x7b
# Dataset: wikitext-2
# Baseline perplexity: 5.42
# Adaptive-K perplexity: 5.44 (+0.4%)
# Compute savings: 47.2%
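The "+0.4%" in the sample output is the relative increase in perplexity over the baseline. A quick check of that arithmetic, using the two perplexity values from the sample output (the helper name `relative_change` is illustrative only):

```python
def relative_change(baseline: float, candidate: float) -> float:
    # Positive values mean the candidate's perplexity is higher (worse).
    return (candidate - baseline) / baseline


delta = relative_change(baseline=5.42, candidate=5.44)
print(f"{delta:+.1%}")  # +0.4%
```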

📄 License

Apache 2.0 - Free for commercial use.

🔗 Links

Website: https://adaptive-k.vertexdata.it
Paper: Entropy-Guided Dynamic Expert Selection
GitHub: https://github.com/Gabrobals/sbm-efficient

📞 Support

Email: amministrazione@vertexdata.it
Issues: GitHub Issues

Made with ❤️ by Vertex Data

Release history

0.1.5 (this version): Jan 17, 2026
0.1.4: Jan 16, 2026
0.1.3: Jan 15, 2026
