Skip to main content

A domain-specific language for Mixture-of-Experts scheduling policies

Project description

MoE-PolicyLang

A scheduling language for Mixture-of-Experts models.

Author: Jesse Pokora · License: MIT


TL;DR

MoE models pack a lot of weights into "experts" but only fire a few per token. The rest can live in CPU RAM and get pulled to the GPU on demand. Which experts to keep, when to prefetch them, and what to do on a miss is a policy question, and every existing MoE serving system hardcodes its answer inside the runtime.

MoE-PolicyLang is a small declarative language for that policy. Swapping LRU for LFU is a one-word change; nothing else has to move.

On an RTX 5080 (16 GB), it runs Qwen1.5-MoE (28.6 GB fp16) at 4.61 tok/s — 8.1x faster than HuggingFace's device_map="auto" — with bit-identical output.


The problem

A Mixture-of-Experts layer replaces a single feed-forward block with N "experts" plus a small router. For each token the router picks the top-k experts (typically k=2 to k=8) and only those fire. Mixtral uses 8 experts per layer (top-2), Qwen1.5-MoE uses 60 (top-4), DeepSeek-V2-Lite uses 64 (top-6). Most experts sit idle on any given token, but the weights still have to be reachable in case the router picks them.

For a 28 GB MoE model on a 16 GB GPU, that means offloading: keep some expert weights on GPU, the rest in CPU RAM, and move weights across the PCIe bus when the router asks for one that isn't resident. PCIe is much slower than reading from GPU memory, so if every router pick is a miss you're stuck — HuggingFace's device_map="auto" runs Qwen1.5-MoE on a 5080 at 0.57 tok/s.

There are four interlocking decisions to make:

Decision Question Example strategies
Cache Which experts stay on GPU? LRU (drop least-recently-used), LFU (drop least-frequently-used), score-based
Prefetch Which to load before they're requested? Affinity (layer L → L+1 patterns), history, lookahead
Schedule What to do on a cache miss? Wait for the GPU transfer, run on CPU, decide per-call
Adapt When to change strategy? Conditional rules on runtime metrics

Every existing MoE serving system (ExpertFlow, Fiddler, MoE-Infinity, HybriMoE, ProMoE, FineMoE) hardcodes these four decisions inside its runtime. Changing any one strategy means reading and modifying the system's expert-management module: roughly 200 to 2,000 LOC depending on the system.

MoE-PolicyLang lets you write the policy as a short .moe file and attach it to a HuggingFace or vLLM model. The runtime hooks that consume the policy stay the same; only the policy text changes.

Throughput and hit rate comparison across policies on consumer GPU


The Language

A MoE-PolicyLang policy is a .moe file with four composable blocks:

policy balanced {
    cache {
        capacity = 16
        eviction = lfu
        frequency_decay = 0.9
    }
    prefetch {
        strategy = history
        budget = 4
    }
    schedule { mode = hybrid }
    adapt {
        when hit_rate < 0.4 for 100 accesses
            { eviction = lru }
    }
}
Block Controls Strategies
cache Which experts stay on GPU LRU drops the least-recently-used; LFU drops the least-frequently-used with decay; score ranks by router gate value; freq-threshold keeps anything above a frequency cutoff
prefetch Experts loaded before they're requested History uses a running co-occurrence matrix; affinity uses layer L → L+1 patterns; lookahead peeks ahead in the router output
schedule What happens on a cache miss gpu-only waits for the transfer; cpu-fallback runs the missed expert on CPU; hybrid decides per-call based on estimated latency
adapt Self-tuning at runtime Conditional rules of the form when <metric> <op> <value> for <window> { <override> }

Switching from LRU to LFU is a one-word change. Adding prefetching is two lines.


Attaching to a model

import moe_policylang
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924")

# Auto-generate a tuned policy from the model + GPU, attach it
mgr = moe_policylang.auto_attach(model)
output = model.generate(...)
print(mgr.get_stats())  # hit rate, transfers, evictions

Or write a policy explicitly:

mgr = moe_policylang.attach(model, """
    policy aggressive {
        cache { capacity = 8  eviction = lru }
    }
""")

Or load a .moe file:

mgr = moe_policylang.attach(model, open("my_policy.moe").read())

attach() parses the policy, runs the validator, compiles it to a PolicyHook, and registers forward hooks on every MoE layer of the model. From that point on, normal model.generate() calls trigger the hooks.


How a policy runs at runtime

On each MoE layer, after the router picks its top-k experts:

  1. Look up each picked expert in the GPU cache (hit or miss).
  2. For misses, decide whether to wait for a CPU→GPU transfer or run that expert on CPU. The schedule block picks the policy.
  3. Insert newly loaded experts into the cache. If full, the eviction rule drops something.
  4. Prefetch experts the next few layers are likely to want.
  5. Run memory-pressure and TTL eviction triggers.

The hook is plain Python. It adds 6 µs/layer (simple LRU) to 47 µs/layer (composed policy with triggers), against an MoE forward-pass baseline of about 1,500 µs on A100 — under 3.2% of layer time.

The five steps above are the always-on layer of dynamism: cache contents change on every dispatch. PLCB adds a second, opt-in layer — set rebalance_interval > 0 in a per_layer block and the per-layer budget itself drifts at runtime based on observed routing entropy. adapt blocks add a third, opt-in layer — a threshold like when hit_rate < 0.4 rewrites the policy in place (deep-copy IR, validate, recompile, hot-swap the hook) without pausing inference. You pick how much dynamism you want; uniform allocation with no adapt rules is the static default.


Why a Language, Not YAML?

The cache, prefetch, and schedule blocks are key-value config and you could handle them with a JSON schema. The reason the grammar pays off is PLCB and the adapt block.

PLCB describes a per-layer cache layout, not a single cache:

per_layer {
    allocation         = uniform
    total_budget       = 864
    rebalance_interval = 500
    min_capacity       = 4
    max_capacity       = 48
}

"27 separate caches at total budget 864 with optional entropy allocation" is awkward to write as flat key-value pairs. The DSL gives it a block.

adapt is the other one. Hot-swap rules monitor metrics and rewrite the policy at runtime:

adapt {
    when hit_rate < 0.4 for 100 accesses { eviction = lru }
}

That's a conditional, not a config value. The grammar constrains what you can write (no arbitrary code in a scheduling policy), and 20 semantic rules catch bad policies at parse time instead of mid-inference.

A Python eDSL (@sched.policy decorator) and an auto_attach API are also available for programmatic construction and zero-config deployment.


Results

Live inference on consumer GPU

When the model doesn't fit: Qwen1.5-MoE-A2.7B (~28.6 GB fp16) on RTX 5080 (16 GB VRAM). Without MoE-PolicyLang, the only option is device_map="auto" at 0.57 tok/s. With a 4-line DSL policy:

Config Strategy Cap VRAM tok/s 95% CI
Baseline (auto) 12.0 GB 0.57±0.00
Skeleton LRU (cap=1) 1 4.7 GB 4.23±0.22 [4.04, 4.42]
Aggressive LRU 2 5.2 GB 4.17±0.03 [4.14, 4.20]
Balanced LFU+hist. 4 7.3 GB 4.35±0.06 [4.30, 4.40]
Generous LFU+hist. 8 10.1 GB 4.61±0.08 [4.54, 4.68]

Cost-performance: 4.61 tok/s on a $1k consumer GPU is comparable in absolute throughput to Fiddler's 4.17 tok/s on a $15k A100, though the models are different and the numbers are not directly comparable.

Decomposition: roughly 92% of the 8.1x speedup comes from expert-aware loading (skeleton on GPU, experts on CPU). Even a capacity-1 "every dispatch is a miss" config reaches 4.23 tok/s (7.4x). Caching adds the remaining +0.38 tok/s. The DSL's contribution is not the loading mechanism (any system could implement that) but the remaining 8%: the policy layer that decides what to cache, evict, and prefetch, accessible without runtime modification and adaptable at runtime via adapt rules that no static config expresses. On models with more experts, the policy layer's share grows. On DeepSeek (A100), matched-budget per-layer allocation gains +14.7pp hit rate over flat caching at the same total slot count, a pure policy-structure effect with no capacity confound.

n=5, bootstrap 95% CIs. For output correctness: greedy decoding (do_sample=False) produces bit-identical token sequences across all policy configs vs device_map="auto" baseline (4 prompts x 3 policies = 12 comparisons); perplexity on wikitext-2 matches within 0.024%.

When the model fits (overhead measurement): OLMoE-1B-7B (~14 GB) fits entirely on 16 GB VRAM. There vanilla (no hooks) is fastest at 39.2 tok/s, with the policy hooks adding 12-14% overhead. This is not the target scenario; it measures overhead when there is nothing to offload. MoE-PolicyLang is for models that don't fit.

Hit rate with bootstrap confidence intervals

PLCB: Per-Layer Cache Budgeting (with a negative result)

A flat cache holds, say, 32 expert weights total, shared across all 27 MoE layers of DeepSeek-V2-Lite. If layer 0 is hot, LFU keeps its experts, and layers that haven't appeared recently get evicted. In steady state, a flat cap=32 cache on DeepSeek covers only ~11 of the 27 layers — the other 16 have zero experts cached and miss every dispatch.

A per-layer cache splits the same total budget (864 = 27×32 slots) into one cache per layer. Each layer keeps its own hot experts.

There's a positive result and a caveat.

The caveat first: per-layer caching only helps when each layer's budget covers that layer's working set. On 16 GB consumer hardware the per-layer budgets are too small for that, and the aggregated cache pushes the CUDA allocator to the VRAM ceiling — throughput drops by about 16%. Flat shared caching is the default for memory-constrained deployments. Per-layer wins with VRAM headroom and high expert counts (DeepSeek-V2-Lite on A100, below).

When per-layer caching wins vs hurts: DeepSeek/A100 lies in the wins region; Qwen/RTX 5080 in the hurts region

  1. When the regime permits, per-layer cache structure is what matters. At matched total budget on DeepSeek-V2-Lite (A100-80GB), replacing a shared cache with per-layer caches gives +14.7pp hit rate in offline trace replay and eliminates all CPU/GPU transfers in steady state. Output is bit-identical to the fully-resident baseline.

The headline throughput gain (1.60 to 10.22 tok/s, +540%) compares shared-32 to per-layer-864, which is 27x more total slots. The matched-budget +14.7pp hit rate and transfer elimination are the core findings; the 540% wall-clock number folds in the capacity expansion.

Flat shared cache leaves layers uncovered; per-layer caches at matched total budget cover every layer

This structural difference maps directly onto MoE-aware baselines. Fiddler's expert placement is a hardcoded global popularity ranking, which is structurally equivalent to the flat cache on the left. On an A100-80GB where 85% of Mixtral's experts fit on-device (217/256), the ranking barely matters because almost everything is resident. On a constrained GPU where only a fraction of experts fit, a global ranking starves cold layers (left heatmap), while a per-layer policy maintains coverage at every layer (right heatmap). Per-layer caching enables topologies that a flat global ranking cannot express. The mechanical payoff is PCIe stall elimination: expert offloading is memory-bandwidth-bound, so every cache miss costs a CPU-to-GPU transfer. When per-layer caches cover each layer's working set, steady-state misses drop to zero, which is why a hit-rate improvement turns into a 6.4x wall-clock gain (10.22 vs 1.60 tok/s on DeepSeek-V2-Lite at matched total budget).

  1. The allocation signal does not matter. We tested six signals (Shannon entropy, inverse top-k mass, inverse variance, inverse KL, inverse Gini, uniform). None differentiates from uniform by more than 2.5pp in hit rate, and all six collapse to within noise of uniform in wall-clock on two models. Uniform is the default. Shannon entropy is opt-in for models with high inter-layer entropy spread (ΔH around 1 nat or more), but it was within noise of uniform on every model tested end-to-end.

Per-layer entropy and capacity allocation

Strategy Total slots Hit Rate Δ vs shared Wall-clock (A100)
Shared cache 32 48.6% baseline 1.60 tok/s
Per-layer uniform 864 (27x) 63.3% +14.7pp 10.22 tok/s
Per-layer entropy 864 (27x) 65.5% +16.9pp 10.17 tok/s (~ uniform)

Mechanism vs policy: Fiddler head-to-head (A100-80GB, Mixtral-8x7B)

Fiddler and MoE-PolicyLang on the same hardware, model, prompt, and methodology (n=5, greedy decoding, 64 tokens):

Config tok/s (±σ) 95% CI GPU Peak Hit Rate Transfers
Fiddler 4.17 ± 0.02 [4.16, 4.18] 80.6 GB 88.3%
MPL fiddler_equiv (cap=2) 0.18 ± 0.00 [0.18, 0.18] 6.4 GB 19.5% 4,283
MPL balanced (cap=4) 0.29 ± 0.00 [0.29, 0.29] 39.5 GB 46.4% 2,665
MPL generous (cap=6) 0.45 ± 0.00 [0.45, 0.45] 61.7 GB 71.0% 1,726

All MPL configs produce bit-identical output. Fiddler is 9-23x faster.

The gap is mechanism, not policy. Fiddler uses an optimized C++/CUDA transfer pipeline with pre-allocated GPU memory slots and direct DMA. MoE-PolicyLang dispatches through Python-level Tensor.to() calls in the HuggingFace forward pass. At Fiddler's 85% GPU residency (217/256 experts on-device), placement strategy isn't doing the work; the model mostly fits.

MoE-PolicyLang is a policy specification layer, not a serving system. It specifies which experts to cache, evict, and prefetch, but does not implement the physical mechanism that moves expert tensors between devices.

vLLM backend: same policy, different mechanism

A second backend integrates with vLLM for production-grade quantized MoE inference. VLLMPolicyRunner instruments vLLM's router layers to capture expert routing decisions and feeds them through the same DSL, compiler, and hooks that the HuggingFace backend uses.

from moe_policylang.integrations.vllm_backend import VLLMPolicyRunner

runner = VLLMPolicyRunner(
    model="Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
    policy_dsl='''
        policy demo {
            cache { capacity = 8  eviction = lru }
            prefetch { strategy = lookahead  lookahead = 1 }
            schedule { mode = gpu_only }
        }
    ''',
    quantization="gptq",
)

results = runner.generate(["What is expert routing?"], max_tokens=30)
print(results["text"])          # generated text
print(results["policy_stats"])  # cache hits, prefetch accuracy, etc.

Verified on RTX 5080 (16 GB), vLLM 0.21, GPTQ-Int4 quantization. Captures 744 routing events across 24 layers × 60 experts, with a 14.7% cache hit rate and 72% prefetch accuracy from a minimal 8-slot LRU policy.

The same .moe policy text runs on both backends; only the mechanism layer (HF eager Python vs vLLM optimized kernels) differs. Combined with the Fiddler head-to-head above, this is the empirical case for the policy/mechanism separation — the abstraction holds across two production mechanism layers, not just structurally.

Cache hit rates: capacity sweeps

Cache hit rate vs capacity for Mixtral and DeepSeek

Capacity sweeps on offline traces:

  • Mixtral-8x7B (8 experts, top-2) saturates at cap=8 with around 100% hit rate, since all experts fit. Policy choice barely matters here.
  • DeepSeek-V2-Lite (64 experts, top-6) reaches only 51% hit rate at cap=32 (half the experts). LFU consistently beats LRU across budgets because DeepSeek has significant frequency skew (some experts activated 3-5x more often). This is the regime where policy selection and per-layer budgeting make a measurable difference.

Policy authoring effort

To add a new policy variant to one of these systems, a developer needs to read and modify the system's expert-management module. MoE-PolicyLang replaces that with a short .moe file. The 14–40x numbers below count lines a user writes to express a policy; they do not include MoE-PolicyLang's own runtime, which is around 4,300 LOC.

System Expert-mgmt module DSL equivalent Authoring reduction
Fiddler 280 LOC 7 lines 40x
HybriMoE ~500 LOC 14 lines 36x
MoE-Infinity 520 LOC 16 lines 33x
vLLM 300 LOC 12 lines 25x
ExpertFlow ~400 LOC 16 lines 25x
FineMoE ~350 LOC 25 lines 14x

Methodology: non-blank, non-comment lines in the primary expert-management module. Measured sources: Fiddler from set_expert_loc() + execute_fiddler() in src/fiddler/mixtral.py (280 LOC); MoE-Infinity from expert_prefetcher.py + expert_cache.py (520 LOC); vLLM from MixtralMoE expert dispatch in vllm/model_executor/ (300 LOC). Counts marked ~ are estimated from paper descriptions of closed-source systems. Switching between strategies (LRU to LFU) is a one-word change in the DSL versus rewriting cache data structures in the hand-coded version.

Dispatch overhead

Dispatch overhead with 95% confidence intervals

Per-layer dispatch (the Python hook that decides cache/evict/prefetch) adds under 3.2% of MoE forward-pass time on A100: 6–47 µs/layer against a 1,459 µs baseline. This is the policy decision overhead; cache misses and weight transfers are accounted for separately and depend on the policy and workload.

Hit rate with bootstrap confidence intervals


Installation

From PyPI:

pip install moe-policylang           # DSL only (no GPU deps)
pip install moe-policylang[gpu]      # + torch, transformers, accelerate
pip install moe-policylang[vllm]     # + vLLM (GPTQ/AWQ quantized inference)
pip install moe-policylang[all]      # everything

For quantized models, use the [vllm] extra. vLLM handles GPTQ and AWQ quantization with optimized kernels; MoE-PolicyLang observes routing decisions and applies policy logic without managing the tensors directly.

For Blackwell GPUs (RTX 5080/5090), set these env vars before running vLLM:

export VLLM_USE_FLASHINFER_SAMPLER=0
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
export VLLM_FLASH_ATTN_VERSION=2

From source (development):

git clone https://github.com/jesse-pokora/MoE-PolicyLang.git
cd MoE-PolicyLang
pip install -e ".[dev,gpu]"

Cython fast path (for complex policies):

pip install moe-policylang[cython]
python setup_cython.py build_ext --inplace

Python dispatch ranges from 6 µs/layer (simple LRU) to 47 µs/layer (composed policies with triggers). The Cython path targets the high end: freq_threshold and composed_full drop from 28-47 µs to under 10 µs/layer. Simple policies like lru_basic (6 µs) see no benefit.


Tested Models

MoE-PolicyLang auto-detects MoE structure from any HuggingFace model with no model-specific code required. Evaluated on:

Model Experts x Layers Routing Hardware Backend
Mixtral-8x7B-Instruct 8 x 32 top-2 A100-80 GB HF Transformers
DeepSeek-V2-Lite 64 x 27 top-6 A100-80 GB HF Transformers
Qwen1.5-MoE-A2.7B 60 x 24 top-4 RTX 5080 (16 GB) HF Transformers
Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 60 x 24 top-4 RTX 5080 (16 GB) vLLM
OLMoE-1B-7B 64 x 16 top-8 RTX 5080 (16 GB) HF Transformers

Project Structure

moe_policylang/
├── grammar.lark           # Lark LALR grammar (62 productions)
├── parser.py              # Grammar → PolicyIR
├── ir.py                  # Intermediate representation
├── validator.py           # 20 semantic validation rules
├── compiler.py            # IR → CompiledPolicy
├── auto.py                # Auto-generate policies from model + GPU
├── dsl.py                 # Python eDSL (@sched.policy decorator)
├── adaptive.py            # Adaptive policies (adapt blocks)
├── autotuner.py           # Grid-search policy optimizer
├── cli.py                 # CLI: validate, compile, run
├── runtime/
│   ├── hooks.py           # 5-step per-layer dispatch protocol
│   ├── cache.py           # LRU / LFU / Score / FreqThreshold
│   ├── prefetch.py        # Affinity / History / Lookahead
│   ├── scheduler.py       # GPU-only / CPU-fallback / Hybrid
│   ├── per_layer.py       # PLCB — per-layer cache budgeting
│   ├── triggers.py        # Memory-pressure & TTL eviction
│   └── _fast/             # Cython-accelerated paths
└── integrations/
    ├── __init__.py         # attach() — main user API
    ├── huggingface.py      # HuggingFace Transformers hooks
    ├── vllm_backend.py     # vLLM integration (routing trace + policy replay)
    ├── weight_placement.py # Expert offloading manager
    └── async_transfer.py   # CUDA stream async transfers

Running Experiments

# Offline trace replay (no GPU needed)
python scripts/run_eval.py
python scripts/run_sweep.py

# Live inference on consumer GPU
python scripts/run_dsl_demo.py
python scripts/run_constrained_e2e.py

# Generate all paper figures
python scripts/generate_figures.py

# Benchmarks & evaluations (requires CUDA GPU + model weights)
python scripts/bench_qwen_multirun.py   # Qwen throughput (Table 4)
python scripts/bench_coldstart.py       # Cold-start throughput analysis
python scripts/bench_power.py           # Power/energy measurement
python scripts/eval_quality.py          # Perplexity evaluation (wikitext-2)
python scripts/ablation_plcb_sensitivity.py  # PLCB hyperparameter sweep
python scripts/plot_coldstart.py        # Generate cold-start figure

Tests

python -m pytest tests/ -q

453+ tests covering parsing, validation, compilation, runtime dispatch, adaptive policies, per-layer PLCB, and integration hooks.


Documentation

See docs/MANUAL.md for the full language reference, runtime API, and policy authoring guide.


Glossary

  • MoE (Mixture-of-Experts). A Transformer layer that replaces a single feed-forward block with N expert networks plus a small router that picks the top-k experts per token.
  • Expert. One feed-forward sub-network inside an MoE layer. Mixtral has 8 per layer, Qwen1.5-MoE has 60, DeepSeek-V2-Lite has 64.
  • Router / top-k routing. The small classifier inside each MoE layer that scores experts and picks the k highest per token.
  • Offloading. Keeping some weights in CPU RAM and moving them to GPU on demand. The cost is the PCIe transfer.
  • PCIe. The bus between CPU memory and the GPU. Roughly two orders of magnitude slower than reading from GPU HBM/GDDR, so cache misses are expensive.
  • Skeleton. Everything in the model that isn't an expert: embeddings, attention, layer norms, LM head. MoE-PolicyLang pins the skeleton on GPU (≈3.7 GB for Qwen1.5-MoE) and only the expert weights move.
  • Cache hit / miss. A hit means the expert the router picked is already on GPU. A miss means we have to fetch it (or run it on CPU, if schedule = cpu_fallback).
  • LRU / LFU. Least-Recently-Used and Least-Frequently-Used cache eviction. LRU drops whatever hasn't been touched lately; LFU drops whatever has the lowest activation count (with a decay factor so old hot experts age out).
  • fp16 / GPTQ / AWQ. Weight precisions. fp16 is the standard half-precision format used in this paper's experiments. GPTQ and AWQ are 4-bit quantization formats that vLLM consumes; they trade a small amount of perplexity for a large VRAM reduction.
  • KV-cache. The cache of attention keys/values from previous tokens during generation. It grows with sequence length and competes with expert weights for VRAM.
  • pp (percentage points). Used for hit-rate deltas: +14.7 pp means 48.6% → 63.3%, not a 14.7% relative change.
  • PLCB. Per-Layer Cache Budgeting. See the section above; the load-bearing part is the per-layer cache structure, not the entropy allocator that gave the technique its name.

Citation

@misc{pokora2026moepolicylang,
  title={MoE-PolicyLang: A Domain-Specific Language for Mixture-of-Experts Scheduling Policies},
  author={Pokora, Jesse},
  year={2026},
  url={https://github.com/jesse-pokora/MoE-PolicyLang}
}

License

MIT License — Copyright (c) 2026 Jesse Pokora

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moe_policylang-1.4.2.tar.gz (99.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

moe_policylang-1.4.2-py3-none-any.whl (107.4 kB view details)

Uploaded Python 3

File details

Details for the file moe_policylang-1.4.2.tar.gz.

File metadata

  • Download URL: moe_policylang-1.4.2.tar.gz
  • Upload date:
  • Size: 99.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for moe_policylang-1.4.2.tar.gz
Algorithm Hash digest
SHA256 29aa69b13d0de2af16cc27b28cf5148249d124f0c5a6ccfa690da18bd9cc9c22
MD5 825b1a815d37216ebe54a09da4b14bfb
BLAKE2b-256 77be1ad6f2d39f8da06435f5f6d27e0addbb49c5cd343ea2c6fefa43a46e17b6

See more details on using hashes here.

File details

Details for the file moe_policylang-1.4.2-py3-none-any.whl.

File metadata

  • Download URL: moe_policylang-1.4.2-py3-none-any.whl
  • Upload date:
  • Size: 107.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for moe_policylang-1.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2df1fdef355f67ca6dea8422d175f58641bc7ffef090008e80f57cbd3e8f52c9
MD5 7fc2b4604d22df4a59cac0b89b11cc0d
BLAKE2b-256 0e3a63da68eaab23a495f27abb2ce0b1e5a0fe5fb45ae49ca9c5c5e2579ff798

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page