Adaptive-K SDK
Entropy-guided dynamic expert selection for Mixture-of-Experts models
Reduce MoE inference compute by roughly 25-50%, depending on the model, while retaining over 99% of baseline quality (see Proven Results).
🚀 Quick Start
pip install adaptive-k
from adaptive_k import AdaptiveKRouter
# Load pre-calibrated router
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")
# Route tokens. router_logits is the (batch_size, num_experts) output of your model's MoE router.
indices, weights, metrics = router.route(router_logits, return_metrics=True)
print(f"Compute savings: {metrics.compute_savings:.1%}")
# Example output: Compute savings: 47.2%
📊 Proven Results
| Model | Compute Savings | Quality Retained |
|---|---|---|
| Mixtral 8x7B | 52.5% | 99.8% |
| Qwen-MoE | 32.4% | 99.9% |
| OLMoE-1B-7B | 24.7% | 99.7% |
💡 How It Works
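Adaptive-K dynamically selects the number of experts (K) for each token based on the entropy of the router's distribution over experts. With 8 experts (as in the configuration example), and savings measured against activating all 8 experts:
Low entropy (confident) → K=1 → 87.5% compute saved
Medium entropy → K=2 → 75% compute saved
High entropy (uncertain) → K=4 → 50% compute saved (the widest routing in this configuration)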
The key insight: when the router is confident, fewer experts are needed.
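The selection rule above can be sketched in a few lines of plain Python. This is an illustration only, not the SDK's implementation: it assumes entropy is measured in nats, reuses the thresholds (0.6, 1.2) and K values (1, 2, 4) from the configuration example, and assumes the selected experts' weights are renormalized to sum to 1. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_entropy(probs):
    """Shannon entropy in nats (assumed unit); 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_k(logits, k_values=(1, 2, 4), thresholds=(0.6, 1.2)):
    """Pick K for one token: H < 0.6 -> 1, H < 1.2 -> 2, otherwise 4."""
    h = routing_entropy(softmax(logits))
    return k_values[bisect_right(thresholds, h)], h


def route_token(logits, k_values=(1, 2, 4), thresholds=(0.6, 1.2)):
    """Return (expert indices, renormalized weights) for the top-K experts."""
    k, _ = select_k(logits, k_values, thresholds)
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]
```

A token whose logits strongly favor one expert gets K=1, while a token with uniform logits over 8 experts has entropy ln 8 ≈ 2.08 nats and gets K=4.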
📖 Usage
Basic Routing
from adaptive_k import AdaptiveKRouter
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")
# Your MoE router logits (batch_size, num_experts)
router_logits = model.router(hidden_states)
# Adaptive-K routing
expert_indices, expert_weights, _ = router.route(router_logits)
# Use selected experts
output = execute_experts(hidden_states, expert_indices, expert_weights)
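`execute_experts` in the snippet above is a placeholder for your model's own expert dispatch. As a rough sketch of what it has to do when K varies per token, the version below combines the selected experts' outputs as a weighted sum. It assumes ragged per-token Python lists rather than whatever tensor layout `router.route` actually returns, and it takes an extra `experts` argument (a list of callables) that the original call does not have.

```python
def execute_experts(hidden_states, expert_indices, expert_weights, experts):
    """Weighted sum of the selected experts' outputs, one token at a time.

    hidden_states:  list of per-token vectors (lists of floats)
    expert_indices: per-token lists of expert ids; lengths may differ per token
    expert_weights: per-token lists of weights, aligned with expert_indices
    experts:        list of callables mapping a vector to a vector (hypothetical)
    """
    outputs = []
    for x, idxs, ws in zip(hidden_states, expert_indices, expert_weights):
        out = [0.0] * len(x)
        for i, w in zip(idxs, ws):
            y = experts[i](x)  # only the selected experts are evaluated
            out = [o + w * v for o, v in zip(out, y)]
        outputs.append(out)
    return outputs
```

A production implementation would batch tokens by expert instead of looping per token; the loop here only shows that a token routed with K=1 triggers a single expert call.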
Custom Calibration
from adaptive_k import Calibrator
calibrator = Calibrator(
    target_savings=0.40,     # 40% target savings
    quality_threshold=0.99   # Max 1% quality loss
)
result = calibrator.calibrate(
    model=your_model,
    dataset=calibration_data
)
print(f"Optimal thresholds: {result.optimal_thresholds}")
print(f"Expected savings: {result.expected_savings:.1%}")
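How the `Calibrator` searches for thresholds is not documented here. One plausible approach, shown purely as a sketch, is a grid search that keeps the threshold pair with the highest estimated savings among those that still meet the quality threshold. Everything below is hypothetical: `quality_fn` stands in for an evaluation of the model on the calibration data, and savings are measured against always running `baseline_k` experts.

```python
from bisect import bisect_right
from itertools import combinations


def estimated_savings(entropies, thresholds, k_values, baseline_k):
    """Fraction of expert compute saved versus always running baseline_k experts."""
    ks = [k_values[bisect_right(thresholds, h)] for h in entropies]
    return 1.0 - (sum(ks) / len(ks)) / baseline_k


def search_thresholds(entropies, candidates, quality_fn,
                      k_values=(1, 2, 4), baseline_k=8, quality_threshold=0.99):
    """Return (thresholds, savings) with the highest savings that keeps quality,
    or None if no candidate combination meets quality_threshold."""
    best = None
    # combinations() over sorted candidates yields increasing threshold tuples.
    for thresholds in combinations(sorted(candidates), len(k_values) - 1):
        if quality_fn(thresholds) < quality_threshold:
            continue
        savings = estimated_savings(entropies, thresholds, k_values, baseline_k)
        if best is None or savings > best[1]:
            best = (thresholds, savings)
    return best
```

The returned savings can then be compared with a target such as `target_savings=0.40` to decide whether the calibration goal was reached.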
Check Statistics
# After processing many tokens
print(router.stats)
# Example output:
# {
#     'tokens_processed': 1_234_567,
#     'average_savings': 0.472,
#     'estimated_cost_reduction': '47.2%'
# }
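For intuition about where a number like `average_savings` could come from, the sketch below computes savings from the K chosen for each token. It assumes savings are defined as one minus the ratio of average K to a baseline K; with a baseline of all 8 experts this reproduces the 87.5% (K=1) and 75% (K=2) figures from How It Works. Measured against a model's native top-k routing, the same K values would yield smaller savings. The function names and the exact definition are assumptions, not the SDK's.

```python
def compute_savings(k, baseline_k):
    """Fraction of expert compute saved by running k experts instead of baseline_k."""
    return 1.0 - k / baseline_k


def average_savings(k_per_token, baseline_k):
    """Savings averaged over tokens, given the K selected for each one."""
    mean_k = sum(k_per_token) / len(k_per_token)
    return 1.0 - mean_k / baseline_k


def summarize(k_per_token, baseline_k):
    """Build a stats dictionary shaped like the example output above."""
    avg = average_savings(k_per_token, baseline_k)
    return {
        "tokens_processed": len(k_per_token),
        "average_savings": avg,
        "estimated_cost_reduction": f"{avg:.1%}",
    }
```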
🔧 Configuration
from adaptive_k import AdaptiveKRouter, RoutingConfig
config = RoutingConfig(
    k_values=[1, 2, 4],             # Available K values
    entropy_thresholds=[0.6, 1.2],  # H < 0.6 → K=1, H < 1.2 → K=2, else K=4
    num_experts=8
)
router = AdaptiveKRouter(config=config)
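When choosing custom values, a few consistency checks are worth running before building a router. The sketch below uses a hypothetical stand-in class, not the SDK's `RoutingConfig`, and assumes entropy is measured in nats, so the maximum possible entropy over 8 experts is ln 8 ≈ 2.08 and any threshold at or above that could never be crossed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingConfigSketch:
    """Hypothetical stand-in for RoutingConfig, used only to show validation."""
    k_values: List[int]
    entropy_thresholds: List[float]
    num_experts: int

    def __post_init__(self) -> None:
        ks, ts = self.k_values, self.entropy_thresholds
        # N values of K are separated by N - 1 thresholds.
        if len(ts) != len(ks) - 1:
            raise ValueError("need exactly len(k_values) - 1 thresholds")
        if any(a >= b for a, b in zip(ts, ts[1:])):
            raise ValueError("entropy_thresholds must be strictly increasing")
        if any(a >= b for a, b in zip(ks, ks[1:])):
            raise ValueError("k_values must be strictly increasing")
        if ks[0] < 1 or ks[-1] > self.num_experts:
            raise ValueError("each K must lie in [1, num_experts]")
        # Entropy over num_experts outcomes cannot exceed ln(num_experts) nats.
        if ts and ts[-1] >= math.log(self.num_experts):
            raise ValueError("largest threshold is not below the maximum entropy")
```

The values from the example above pass these checks: two increasing thresholds for three K values, all below ln 8.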
🔌 Integrations
HuggingFace Transformers
# Coming in v0.2.0
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")
model = router.patch(model) # Automatic integration
vLLM
# Coming in v0.3.0
from adaptive_k.integrations import vllm_patch
model = vllm_patch(model, router)
TensorRT-LLM
See our TensorRT-LLM PR #10672 for native integration.
📈 Benchmarking
# CLI benchmark
adaptive-k benchmark --model mixtral-8x7b --dataset wikitext-2
# Output:
# Model: mixtral-8x7b
# Dataset: wikitext-2
# Baseline perplexity: 5.42
# Adaptive-K perplexity: 5.44 (+0.4%)
# Compute savings: 47.2%
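The relative change shown next to the Adaptive-K perplexity can be checked by hand. The sketch below assumes perplexity is the exponential of the mean per-token negative log-likelihood (the standard definition) and that the percentage is the relative increase over the baseline; the function names are illustrative and not part of the CLI.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_change(baseline, candidate):
    """Relative change of candidate with respect to baseline."""
    return (candidate - baseline) / baseline


# The figures from the example benchmark output above:
print(f"{relative_change(5.42, 5.44):+.1%}")  # prints +0.4%
```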
📄 License
Apache 2.0 - Free for commercial use.
🔗 Links
- Website: https://adaptive-k.vertexdata.it
- Paper: Entropy-Guided Dynamic Expert Selection
- GitHub: https://github.com/Gabrobals/sbm-efficient
📞 Support
- Email: amministrazione@vertexdata.it
- Issues: GitHub Issues
Made with ❤️ by Vertex Data