Entropy-guided dynamic expert selection for MoE models. Reduces expert compute by roughly 25-50% on the benchmarked models.

Adaptive-K SDK

Entropy-guided dynamic expert selection for Mixture-of-Experts models.
Reduces expert compute by roughly 25-50% on the models benchmarked under Proven Results.


🚀 Quick Start

pip install adaptive-k

from adaptive_k import AdaptiveKRouter

# Load pre-calibrated router
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")

# Route tokens (router_logits is your MoE router output, shape (batch_size, num_experts))
indices, weights, metrics = router.route(router_logits, return_metrics=True)

print(f"Compute savings: {metrics.compute_savings:.1%}")
# Example output: Compute savings: 47.2%

📊 Proven Results

Model          Savings   Quality Retained
Mixtral 8x7B   52.5%     99.8%
Qwen-MoE       32.4%     99.9%
OLMoE-1B-7B    24.7%     99.7%

💡 How It Works

Adaptive-K dynamically selects the number of experts (K) based on routing entropy:

Low entropy (confident)  → K=1 → 87.5% compute saved
Medium entropy           → K=2 → 75% compute saved
High entropy (uncertain) → K=4 → Full routing

The K=1 and K=2 figures are relative to activating all 8 experts (1 - K/8).

The key insight: when the router is confident, fewer experts are needed.
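
To make the entropy signal concrete, here is a minimal pure-Python sketch. It is not the SDK's implementation: the function names are hypothetical, and using the natural logarithm (nats) is an assumption, since the README does not state the unit of the entropy thresholds.

```python
import math

def softmax(logits):
    """Convert raw router logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def routing_entropy(logits):
    """Shannon entropy (in nats, an assumption) of the router distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def compute_saved(k, num_experts=8):
    """Fraction of expert compute skipped relative to running all experts."""
    return 1.0 - k / num_experts

confident = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one expert dominates
uncertain = [0.0] * 8                                  # all experts equally likely

print(f"confident token: H = {routing_entropy(confident):.3f} nats")
print(f"uncertain token: H = {routing_entropy(uncertain):.3f} nats")
for k in (1, 2, 4):
    print(f"K={k}: {compute_saved(k):.1%} saved vs. all 8 experts")
```

A uniform distribution over 8 experts reaches the maximum entropy of ln 8 (about 2.08 nats), while a sharply peaked distribution stays close to zero.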


📖 Usage

Basic Routing

from adaptive_k import AdaptiveKRouter

router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")

# Your MoE router logits (batch_size, num_experts)
router_logits = model.router(hidden_states)

# Adaptive-K routing
expert_indices, expert_weights, _ = router.route(router_logits)

# Use selected experts (execute_experts stands in for your model's own expert dispatch)
output = execute_experts(hidden_states, expert_indices, expert_weights)
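
The `execute_experts` call above is not imported from `adaptive_k`, so it presumably refers to code in your own model. As a rough illustration of what such a dispatch step might do, the sketch below mixes the selected experts' outputs per token. Everything here is an assumption: the extra `experts` argument, the ragged-list data layout, and the toy experts are hypothetical, and the SDK's actual return types are not documented in this README.

```python
def execute_experts(hidden_states, expert_indices, expert_weights, experts):
    """Weighted sum of the selected experts' outputs, one token at a time.

    hidden_states:  list of token vectors (lists of floats)
    expert_indices: per-token list of selected expert ids (K may vary per token)
    expert_weights: per-token list of mixing weights, aligned with the ids
    experts:        list of callables, each mapping a vector to a vector
    """
    outputs = []
    for token, ids, weights in zip(hidden_states, expert_indices, expert_weights):
        mixed = [0.0] * len(token)
        for expert_id, weight in zip(ids, weights):
            expert_out = experts[expert_id](token)
            mixed = [acc + weight * value for acc, value in zip(mixed, expert_out)]
        outputs.append(mixed)
    return outputs

# Toy experts for illustration only: expert i scales its input by (i + 1).
toy_experts = [lambda vec, s=i + 1: [s * v for v in vec] for i in range(8)]

tokens = [[1.0, 2.0], [3.0, 4.0]]
indices = [[0], [1, 2]]        # first token uses K=1, second uses K=2
weights = [[1.0], [0.5, 0.5]]

result = execute_experts(tokens, indices, weights, toy_experts)
print(result)  # [[1.0, 2.0], [7.5, 10.0]]
```

Because K varies per token, only the selected experts run for each token, which is where the compute savings would come from.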

Custom Calibration

from adaptive_k import Calibrator

calibrator = Calibrator(
    target_savings=0.40,      # 40% target savings
    quality_threshold=0.99    # Max 1% quality loss
)

result = calibrator.calibrate(
    model=your_model,
    dataset=calibration_data
)

print(f"Optimal thresholds: {result.optimal_thresholds}")
print(f"Expected savings: {result.expected_savings:.1%}")

Check Statistics

# After processing many tokens
print(router.stats)
# {
#   'tokens_processed': 1_234_567,
#   'average_savings': 0.472,
#   'estimated_cost_reduction': '47.2%'
# }
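
For intuition about how figures like these could be accumulated, here is a small sketch of a running counter. It is not the SDK's `router.stats` implementation: the `RoutingStats` class is hypothetical, and measuring savings against running all experts (1 - mean K / num_experts) is an assumption.

```python
class RoutingStats:
    """Running totals for the number of experts used per token."""

    def __init__(self, num_experts=8):
        self.num_experts = num_experts
        self.tokens_processed = 0
        self.experts_used = 0

    def update(self, k_per_token):
        """Record the K chosen for each token in a batch."""
        self.tokens_processed += len(k_per_token)
        self.experts_used += sum(k_per_token)

    @property
    def average_savings(self):
        """Mean fraction of expert compute skipped vs. running all experts."""
        if self.tokens_processed == 0:
            return 0.0
        mean_k = self.experts_used / self.tokens_processed
        return 1.0 - mean_k / self.num_experts

    def as_dict(self):
        return {
            "tokens_processed": self.tokens_processed,
            "average_savings": self.average_savings,
            "estimated_cost_reduction": f"{self.average_savings:.1%}",
        }

stats = RoutingStats(num_experts=8)
stats.update([1, 1, 2, 4])  # K chosen for four tokens
stats.update([2, 2])        # K chosen for two more tokens
print(stats.as_dict())
```

Here six tokens use 12 expert evaluations in total, so the mean K is 2 and the reported savings are 75% under the stated assumption.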

🔧 Configuration

from adaptive_k import AdaptiveKRouter, RoutingConfig

config = RoutingConfig(
    k_values=[1, 2, 4],             # Available K values
    entropy_thresholds=[0.6, 1.2],  # H < 0.6 → K=1, H < 1.2 → K=2, else K=4
    num_experts=8,
)

router = AdaptiveKRouter(config=config)
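
The threshold rule in the config comment (H < 0.6 → K=1, H < 1.2 → K=2, else K=4) can be sketched in plain Python as follows. This is an illustration under assumptions, not the SDK's code: `select_k` and `route_token` are hypothetical names, entropy is computed in nats, and renormalizing the kept probabilities is one common choice rather than a documented behavior.

```python
import math
from bisect import bisect_right

def select_k(entropy, k_values=(1, 2, 4), entropy_thresholds=(0.6, 1.2)):
    """Pick k_values[i] for the first threshold the entropy is strictly below."""
    if len(entropy_thresholds) != len(k_values) - 1:
        raise ValueError("need exactly one fewer threshold than K values")
    return k_values[bisect_right(entropy_thresholds, entropy)]

def route_token(logits, k_values=(1, 2, 4), entropy_thresholds=(0.6, 1.2)):
    """Return (expert_indices, expert_weights) for one token's router logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    k = select_k(entropy, k_values, entropy_thresholds)
    # Keep the k most probable experts and renormalize their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return top, [probs[i] / kept for i in top]

print(route_token([8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))  # peaked: K=1
print(route_token([4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))  # two-way tie: K=2
print(route_token([0.0] * 8))                                  # uniform: K=4
```

With `bisect_right`, an entropy exactly equal to a threshold falls into the higher-K bucket, which matches the strict `<` in the config comment.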

🔌 Integrations

HuggingFace Transformers

# Coming in v0.2.0
router = AdaptiveKRouter.from_pretrained("mixtral-8x7b")
model = router.patch(model)  # Automatic integration

vLLM

# Coming in v0.3.0
from adaptive_k.integrations import vllm_patch
model = vllm_patch(model, router)

TensorRT-LLM

See our TensorRT-LLM PR #10672 for native integration.


📈 Benchmarking

# CLI benchmark
adaptive-k benchmark --model mixtral-8x7b --dataset wikitext-2

# Output:
# Model: mixtral-8x7b
# Dataset: wikitext-2
# Baseline perplexity: 5.42
# Adaptive-K perplexity: 5.44 (+0.4%)
# Compute savings: 47.2%
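
The percentage next to the Adaptive-K perplexity is the relative change from the baseline. A quick check of that arithmetic, using only the two perplexity values shown above (the helper name is hypothetical):

```python
def relative_change(baseline, candidate):
    """Relative change of candidate with respect to baseline."""
    return (candidate - baseline) / baseline

baseline_ppl = 5.42
adaptive_ppl = 5.44
delta = relative_change(baseline_ppl, adaptive_ppl)
print(f"Perplexity change: {delta:+.1%}")  # Perplexity change: +0.4%
```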

📄 License

Apache 2.0 - Free for commercial use.


Made with ❤️ by Vertex Data

Download files

Source Distribution

adaptive_k-0.1.1.tar.gz (14.6 kB)

Uploaded Source

Built Distribution

adaptive_k-0.1.1-py3-none-any.whl (13.1 kB)

Uploaded Python 3

File details

Details for the file adaptive_k-0.1.1.tar.gz.

File metadata

  • Download URL: adaptive_k-0.1.1.tar.gz
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for adaptive_k-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        176c5d51106a8ede1a9ef2a14901b81134e829ffe0029957bc99f0018a491551
MD5           0072c0e7365cee999445dea719d90102
BLAKE2b-256   b6a3d69073c4b0eaceb29def1c72d9507cfbe26ef4476673b657afa896e508e2

File details

Details for the file adaptive_k-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: adaptive_k-0.1.1-py3-none-any.whl
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for adaptive_k-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        7f2dff72e5ef1a78d5a4f735bca3f5741dce6ce57fd9b71f31603d6526b9046f
MD5           fd1081f331e6564ca30172c14de05d06
BLAKE2b-256   d6ecaffc43ec5daa3b2ed7e38fe167425176b981e6e98f4b1a0374f8562aa5ee
