Skip to main content

A domain-specific language for Mixture-of-Experts scheduling policies

Project description

MoE-PolicyLang

A scheduling language for Mixture-of-Experts models.

Author: Jesse Pokora · License: MIT


TL;DR

  • Mixture-of-Experts (MoE) models pack a lot of weights into "experts", but only a few experts fire per token. The rest can sit in CPU RAM and be pulled across PCIe to the GPU on demand.
  • Which experts to keep on GPU, when to prefetch the next ones, and what to do on a miss is a policy question. Every existing MoE serving system hardcodes its policy inside the runtime.
  • MoE-PolicyLang is a small declarative language for that policy. Swapping LRU for LFU is a one-word change; the cache, prefetch, scheduler, and eviction-trigger components stay the same.
  • Headline number: Qwen1.5-MoE (28.6 GB fp16) runs on a 16 GB RTX 5080 at 4.61 tok/s, 8.1x faster than HuggingFace's default device_map="auto", with bit-identical output.

The problem

A Mixture-of-Experts layer replaces a single feed-forward block with N "experts" plus a small router. For each token the router picks the top-k experts (typically k=2 to k=8) and only those fire. Mixtral uses 8 experts per layer (top-2), Qwen1.5-MoE uses 60 (top-4), DeepSeek-V2-Lite uses 64 (top-6). Most experts sit idle on any given token, but the weights still have to be reachable in case the router picks them.

For a 28 GB MoE model on a 16 GB GPU, that means offloading: keep some expert weights on GPU, the rest in CPU RAM, and move weights across the PCIe bus when the router asks for one that isn't resident. PCIe transfer is roughly two orders of magnitude slower than reading from GPU memory, so the worst case (every expert a miss) is painful: HuggingFace's device_map="auto" runs Qwen1.5-MoE on a 5080 at 0.57 tok/s.

Doing better requires four coordinated decisions:

Decision Question Example strategies
Cache Which experts stay on GPU? LRU (drop least-recently-used), LFU (drop least-frequently-used), score-based
Prefetch Which to load before they're requested? Affinity (layer L → L+1 patterns), history, lookahead
Schedule What to do on a cache miss? Wait for the GPU transfer, run on CPU, decide per-call
Adapt When to change strategy? Conditional rules on runtime metrics

Every existing MoE serving system (ExpertFlow, Fiddler, MoE-Infinity, HybriMoE, ProMoE, FineMoE) hardcodes these four decisions inside its runtime. Changing any one strategy means reading and modifying the system's expert-management module: roughly 200 to 2,000 LOC depending on the system.

MoE-PolicyLang lets you write the policy as a short .moe file and attach it to a HuggingFace or vLLM model. The runtime hooks that consume the policy stay the same; only the policy text changes.

Throughput and hit rate comparison across policies on consumer GPU


The Language

A MoE-PolicyLang policy is a .moe file with four composable blocks:

policy balanced {
    cache {
        capacity = 16
        eviction = lfu
        frequency_decay = 0.9
    }
    prefetch {
        strategy = history
        budget = 4
    }
    schedule { mode = hybrid }
    adapt {
        when hit_rate < 0.4 for 100 accesses
            { eviction = lru }
    }
}
Block Controls Strategies
cache Which experts stay on GPU LRU drops the least-recently-used; LFU drops the least-frequently-used with decay; score ranks by router gate value; freq-threshold keeps anything above a frequency cutoff
prefetch Experts loaded before they're requested History uses a running co-occurrence matrix; affinity uses layer L → L+1 patterns; lookahead peeks ahead in the router output
schedule What happens on a cache miss gpu-only waits for the transfer; cpu-fallback runs the missed expert on CPU; hybrid decides per-call based on estimated latency
adapt Self-tuning at runtime Conditional rules of the form when <metric> <op> <value> for <window> { <override> }

Switching from LRU to LFU is a one-word change. Adding prefetching is two lines.


Attaching to a model

import moe_policylang
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924")

# Auto-generate a tuned policy from the model + GPU, attach it
mgr = moe_policylang.auto_attach(model)
output = model.generate(...)
print(mgr.get_stats())  # hit rate, transfers, evictions

Or write a policy explicitly:

mgr = moe_policylang.attach(model, """
    policy aggressive {
        cache { capacity = 8  eviction = lru }
    }
""")

Or load a .moe file:

mgr = moe_policylang.attach(model, open("my_policy.moe").read())

attach() parses the policy, runs the validator, compiles it to a PolicyHook, and registers forward hooks on every MoE layer of the model. From that point on, normal model.generate() calls trigger the hooks.


How a policy runs at runtime

On each MoE layer, after the router picks its top-k experts, the hook runs five steps:

  1. Cache lookup. For each selected expert, is its weight matrix already on GPU? Each lookup records a hit or a miss.
  2. Schedule. For each miss, the scheduler decides whether to wait for a CPU→GPU transfer or to run that expert's compute on CPU (the schedule block picks the policy).
  3. Cache update. Newly loaded experts go into the cache. If the cache is at capacity, the eviction rule (LRU, LFU, score, …) picks something to drop.
  4. Prefetch. The prefetcher predicts which experts the next few layers will want and starts moving them across PCIe.
  5. Trigger check. Memory-pressure and TTL eviction triggers run, evicting beyond the cache replacement rule if needed.

The hook is plain Python and adds 6 µs/layer (simple LRU) to 47 µs/layer (composed policy with triggers), against an MoE forward-pass baseline of ~1,500 µs on A100 — under 3.2% of layer time.


Why a Language, Not YAML?

The cache, prefetch, and schedule blocks are key-value config and could be handled by a JSON schema with Pydantic. Two things push it past that:

1. PLCB expresses a cache topology, not a parameter set. A flat schema like Cache(eviction="lfu", capacity=32) picks algorithms and scalars. PLCB picks a structure: N independent caches per layer with their own allocator, total budget, rebalance interval, and min/max bounds.

per_layer {
    allocation         = uniform
    total_budget       = 864
    rebalance_interval = 500
    min_capacity       = 4
    max_capacity       = 48
}

You can't reasonably encode "maintain 27 independent caches at total budget 864 with optional entropy-proportional allocation" in a flat schema — the specification is structural rather than scalar. PLCB is the existence proof that the grammar earns its keep.

2. adapt blocks express conditional runtime behavior. Hot-swap rules that monitor metrics and rewrite the policy aren't config — they're a small embedded rule language with metric, threshold, window, and rewrite target:

adapt {
    when hit_rate < 0.4 for 100 accesses { eviction = lru }
}

The grammar restricts what you can write (no arbitrary code in a scheduling policy), and 20 semantic rules catch bad policies at parse time rather than mid-inference.

There are also two other surfaces: a Python eDSL (@sched.policy decorator) for programmatic construction, and auto_attach for zero-config deployment. The .moe grammar is what makes the PLCB and adapt semantics work; the others are convenience wrappers.


Results

Dispatch overhead

Dispatch overhead with 95% confidence intervals

Per-layer dispatch (the Python hook that decides cache/evict/prefetch) adds under 3.2% of MoE forward-pass time on A100: 6–47 µs/layer against a 1,459 µs baseline. This is the policy decision overhead; cache misses and weight transfers are accounted for separately and depend on the policy and workload.

Policy authoring effort

To add a new policy variant to one of these systems, a developer needs to read and modify the system's expert-management module. MoE-PolicyLang replaces that with a short .moe file. The 14–40x numbers below count lines a user writes to express a policy; they do not include MoE-PolicyLang's own runtime, which is around 4,300 LOC.

System Expert-mgmt module DSL equivalent Authoring reduction
Fiddler 280 LOC 7 lines 40x
HybriMoE ~500 LOC 14 lines 36x
MoE-Infinity 520 LOC 16 lines 33x
vLLM 300 LOC 12 lines 25x
ExpertFlow ~400 LOC 16 lines 25x
FineMoE ~350 LOC 25 lines 14x

Methodology: non-blank, non-comment lines in the primary expert-management module. Measured sources: Fiddler from set_expert_loc() + execute_fiddler() in src/fiddler/mixtral.py (280 LOC); MoE-Infinity from expert_prefetcher.py + expert_cache.py (520 LOC); vLLM from MixtralMoE expert dispatch in vllm/model_executor/ (300 LOC). Counts marked ~ are estimated from paper descriptions of closed-source systems. Switching between strategies (LRU to LFU) is a one-word change in the DSL versus rewriting cache data structures in the hand-coded version.

Policy selection matters when the cache can't hold the working set

Cache hit rate vs capacity for Mixtral and DeepSeek

Capacity sweeps on offline traces:

  • Mixtral-8x7B (8 experts, top-2) saturates at cap=8 with around 100% hit rate, since all experts fit. Policy choice barely matters here.
  • DeepSeek-V2-Lite (64 experts, top-6) reaches only 51% hit rate at cap=32 (half the experts). LFU consistently beats LRU across budgets because DeepSeek has significant frequency skew (some experts activated 3-5x more often). This is the regime where policy selection and per-layer budgeting make a measurable difference.

EPCB: Per-layer cache budgeting (with a negative result)

A flat cache holds, say, 32 expert weights total, shared across all 27 MoE layers of DeepSeek-V2-Lite. If layer 0 is hot, LFU keeps its experts. Layers that came in early and haven't appeared recently get evicted. In steady state, a flat cap=32 cache on DeepSeek covers only ~11 of the 27 layers — the other 16 have zero experts cached and miss every dispatch.

A per-layer cache splits the same total budget (864 = 27×32 slots) into one cache per layer. Each layer keeps its own hot experts.

Empirical Per-layer Cache Budgeting (EPCB) has two findings, one positive and one negative.

  1. The regime caveat (read this first). Per-layer caching only helps when the per-layer budget covers each layer's active working set. On 16 GB consumer hardware, which is the case most readers will hit, per-layer caching hurts throughput by 16%: the per-layer budgets are too small to cover each layer's working set, and the aggregated cache pushes the CUDA allocator to the VRAM ceiling. Flat shared caching is the default recommendation for memory-constrained deployments. Per-layer wins when there is VRAM headroom and high expert counts (DeepSeek-V2-Lite on A100, below).

When per-layer caching wins vs hurts: DeepSeek/A100 lies in the wins region; Qwen/RTX 5080 in the hurts region

  1. When the regime permits, per-layer cache structure is what matters. At matched total budget on DeepSeek-V2-Lite (A100-80GB), replacing a shared cache with per-layer caches gives +14.7pp hit rate in offline trace replay and eliminates all CPU/GPU transfers in steady state. Output is bit-identical to the fully-resident baseline.

The headline throughput gain (1.60 to 10.22 tok/s, +540%) compares shared-32 to per-layer-864, which is 27x more total slots. The matched-budget +14.7pp hit rate and transfer elimination are the core findings; the 540% wall-clock number folds in the capacity expansion.

Flat shared cache leaves layers uncovered; per-layer caches at matched total budget cover every layer

This structural difference maps directly onto MoE-aware baselines. Fiddler's expert placement is a hardcoded global popularity ranking, which is structurally equivalent to the flat cache on the left. On an A100-80GB where 85% of Mixtral's experts fit on-device (217/256), the ranking barely matters because almost everything is resident. On a constrained GPU where only a fraction of experts fit, a global ranking starves cold layers (left heatmap), while a per-layer policy maintains coverage at every layer (right heatmap). Per-layer caching enables topologies that a flat global ranking cannot express. The mechanical payoff is PCIe stall elimination: expert offloading is memory-bandwidth-bound, so every cache miss costs a CPU-to-GPU transfer. When per-layer caches cover each layer's working set, steady-state misses drop to zero, which is why a hit-rate improvement turns into a 6.4x wall-clock gain (10.22 vs 1.60 tok/s on DeepSeek-V2-Lite at matched total budget).

Fiddler head-to-head (A100-80GB, Mixtral-8x7B)

Fiddler and MoE-PolicyLang on the same hardware, model, prompt, and methodology (n=5, greedy decoding, 64 tokens):

Config tok/s (±σ) 95% CI GPU Peak Hit Rate Transfers
Fiddler 4.17 ± 0.02 [4.16, 4.18] 80.6 GB 88.3%
MPL fiddler_equiv (cap=2) 0.18 ± 0.00 [0.18, 0.18] 6.4 GB 19.5% 4,283
MPL balanced (cap=4) 0.29 ± 0.00 [0.29, 0.29] 39.5 GB 46.4% 2,665
MPL generous (cap=6) 0.45 ± 0.00 [0.45, 0.45] 61.7 GB 71.0% 1,726

All MPL configs produce bit-identical output. Fiddler is 9-23x faster.

The gap is mechanism, not policy. Fiddler uses an optimized C++/CUDA transfer pipeline with pre-allocated GPU memory slots and direct DMA. MoE-PolicyLang dispatches through Python-level Tensor.to() calls in the HuggingFace forward pass. At Fiddler's 85% GPU residency (217/256 experts on-device), placement strategy isn't doing the work; the model mostly fits.

MoE-PolicyLang is a policy specification layer, not a serving system. It specifies which experts to cache, evict, and prefetch, but does not implement the physical mechanism that moves expert tensors between devices. The vLLM integration below shows the policy layer composing cleanly with a production inference engine: the same DSL captures routing decisions from vLLM's optimized MoE kernel path without modification.

  1. The allocation signal does not matter. We tested six signals (Shannon entropy, inverse top-k mass, inverse variance, inverse KL, inverse Gini, uniform). None differentiates from uniform by more than 2.5pp in hit rate, and all six collapse to within noise of uniform in wall-clock on two models. Uniform is the default. Shannon entropy is opt-in for models with high inter-layer entropy spread (ΔH around 1 nat or more), but it was within noise of uniform on every model tested end-to-end.

Per-layer entropy and capacity allocation

Strategy Total slots Hit Rate Δ vs shared Wall-clock (A100)
Shared cache 32 48.6% baseline 1.60 tok/s
Per-layer uniform 864 (27x) 63.3% +14.7pp 10.22 tok/s
Per-layer entropy 864 (27x) 65.5% +16.9pp 10.17 tok/s (~ uniform)

Live inference on consumer GPU

When the model doesn't fit: Qwen1.5-MoE-A2.7B (~28.6 GB fp16) on RTX 5080 (16 GB VRAM). Without MoE-PolicyLang, the only option is device_map="auto" at 0.57 tok/s. With a 4-line DSL policy:

Config Strategy Cap VRAM tok/s 95% CI
Baseline (auto) 12.0 GB 0.57±0.00
Skeleton LRU (cap=1) 1 4.7 GB 4.23±0.22 [4.04, 4.42]
Aggressive LRU 2 5.2 GB 4.17±0.03 [4.14, 4.20]
Balanced LFU+hist. 4 7.3 GB 4.35±0.06 [4.30, 4.40]
Generous LFU+hist. 8 10.1 GB 4.61±0.08 [4.54, 4.68]

Cost-performance: 4.61 tok/s on a $1k consumer GPU is comparable in absolute throughput to Fiddler's 4.17 tok/s on a $15k A100, though the models are different and the numbers are not directly comparable.

Decomposition: roughly 92% of the 8.1x speedup comes from expert-aware loading (skeleton on GPU, experts on CPU). Even a capacity-1 "every dispatch is a miss" config reaches 4.23 tok/s (7.4x). Caching adds the remaining +0.38 tok/s. The DSL's contribution is not the loading mechanism (any system could implement that) but the remaining 8%: the policy layer that decides what to cache, evict, and prefetch, accessible without runtime modification and adaptable at runtime via adapt rules that no static config expresses. On models with more experts, the policy layer's share grows. On DeepSeek (A100), matched-budget per-layer allocation gains +14.7pp hit rate over flat caching at the same total slot count, a pure policy-structure effect with no capacity confound.

n=5, bootstrap 95% CIs. For output correctness: greedy decoding (do_sample=False) produces bit-identical token sequences across all policy configs vs device_map="auto" baseline (4 prompts x 3 policies = 12 comparisons); perplexity on wikitext-2 matches within 0.024%.

When the model fits (overhead measurement): OLMoE-1B-7B (~14 GB) fits entirely on 16 GB VRAM. There vanilla (no hooks) is fastest at 39.2 tok/s, with the policy hooks adding 12-14% overhead. This is not the target scenario; it measures overhead when there is nothing to offload. MoE-PolicyLang is for models that don't fit.

Hit rate with bootstrap confidence intervals


Installation

From PyPI:

pip install moe-policylang           # DSL only (no GPU deps)
pip install moe-policylang[gpu]      # + torch, transformers, accelerate
pip install moe-policylang[vllm]     # + vLLM (GPTQ/AWQ quantized inference)
pip install moe-policylang[all]      # everything

For quantized models, use the [vllm] extra. vLLM handles GPTQ and AWQ quantization with optimized kernels; MoE-PolicyLang observes routing decisions and applies policy logic without managing the tensors directly.

For Blackwell GPUs (RTX 5080/5090), set these env vars before running vLLM:

export VLLM_USE_FLASHINFER_SAMPLER=0
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
export VLLM_FLASH_ATTN_VERSION=2

From source (development):

git clone https://github.com/jesse-pokora/MoE-PolicyLang.git
cd MoE-PolicyLang
pip install -e ".[dev,gpu]"

Cython fast path (for complex policies):

pip install moe-policylang[cython]
python setup_cython.py build_ext --inplace

Python dispatch ranges from 6 µs/layer (simple LRU) to 47 µs/layer (composed policies with triggers). The Cython path targets the high end: freq_threshold and composed_full drop from 28-47 µs to under 10 µs/layer. Simple policies like lru_basic (6 µs) see no benefit.


Tested Models

MoE-PolicyLang auto-detects MoE structure from any HuggingFace model with no model-specific code required. Evaluated on:

Model Experts x Layers Routing Hardware Backend
Mixtral-8x7B-Instruct 8 x 32 top-2 A100-80 GB HF Transformers
DeepSeek-V2-Lite 64 x 27 top-6 A100-80 GB HF Transformers
Qwen1.5-MoE-A2.7B 60 x 24 top-4 RTX 5080 (16 GB) HF Transformers
Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 60 x 24 top-4 RTX 5080 (16 GB) vLLM
OLMoE-1B-7B 64 x 16 top-8 RTX 5080 (16 GB) HF Transformers

vLLM Integration

MoE-PolicyLang integrates with vLLM for production-grade quantized MoE inference. The VLLMPolicyRunner instruments vLLM's router layers to capture expert routing decisions and feeds them through the policy system. The same DSL, compiler, and hooks work whether the mechanism layer is HuggingFace's eager execution or vLLM's optimized kernels.

from moe_policylang.integrations.vllm_backend import VLLMPolicyRunner

runner = VLLMPolicyRunner(
    model="Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
    policy_dsl='''
        policy demo {
            cache { capacity = 8  eviction = lru }
            prefetch { strategy = lookahead  lookahead = 1 }
            schedule { mode = gpu_only }
        }
    ''',
    quantization="gptq",
)

results = runner.generate(["What is expert routing?"], max_tokens=30)
print(results["text"])          # generated text
print(results["policy_stats"])  # cache hits, prefetch accuracy, etc.

Verified on RTX 5080 (16 GB), vLLM 0.21, GPTQ-Int4 quantization. Captures 744 routing events across 24 layers x 60 experts, with a 14.7% cache hit rate and 72% prefetch accuracy from a minimal 8-slot LRU policy.

This shows the DSL is backend-agnostic: the policy specification layer is independent of the inference engine.


Project Structure

moe_policylang/
├── grammar.lark           # Lark LALR grammar (62 productions)
├── parser.py              # Grammar → PolicyIR
├── ir.py                  # Intermediate representation
├── validator.py           # 20 semantic validation rules
├── compiler.py            # IR → CompiledPolicy
├── auto.py                # Auto-generate policies from model + GPU
├── dsl.py                 # Python eDSL (@sched.policy decorator)
├── adaptive.py            # Adaptive policies (adapt blocks)
├── autotuner.py           # Grid-search policy optimizer
├── cli.py                 # CLI: validate, compile, run
├── runtime/
│   ├── hooks.py           # 5-step per-layer dispatch protocol
│   ├── cache.py           # LRU / LFU / Score / FreqThreshold
│   ├── prefetch.py        # Affinity / History / Lookahead
│   ├── scheduler.py       # GPU-only / CPU-fallback / Hybrid
│   ├── per_layer.py       # EPCB — entropy-proportional caching
│   ├── triggers.py        # Memory-pressure & TTL eviction
│   └── _fast/             # Cython-accelerated paths
└── integrations/
    ├── __init__.py         # attach() — main user API
    ├── huggingface.py      # HuggingFace Transformers hooks
    ├── vllm_backend.py     # vLLM integration (routing trace + policy replay)
    ├── weight_placement.py # Expert offloading manager
    └── async_transfer.py   # CUDA stream async transfers

Running Experiments

# Offline trace replay (no GPU needed)
python scripts/run_eval.py
python scripts/run_sweep.py

# Live inference on consumer GPU
python scripts/run_dsl_demo.py
python scripts/run_constrained_e2e.py

# Generate all paper figures
python scripts/generate_figures.py

# Benchmarks & evaluations (requires CUDA GPU + model weights)
python scripts/bench_qwen_multirun.py   # Qwen throughput (Table 4)
python scripts/bench_coldstart.py       # Cold-start throughput analysis
python scripts/bench_power.py           # Power/energy measurement
python scripts/eval_quality.py          # Perplexity evaluation (wikitext-2)
python scripts/ablation_epcb_sensitivity.py  # EPCB hyperparameter sweep
python scripts/plot_coldstart.py        # Generate cold-start figure

Tests

python -m pytest tests/ -q

453+ tests covering parsing, validation, compilation, runtime dispatch, adaptive policies, per-layer EPCB, and integration hooks.


Documentation

See docs/MANUAL.md for the full language reference, runtime API, and policy authoring guide.


Glossary

  • MoE (Mixture-of-Experts). A Transformer layer that replaces a single feed-forward block with N expert networks plus a small router that picks the top-k experts per token.
  • Expert. One feed-forward sub-network inside an MoE layer. Mixtral has 8 per layer, Qwen1.5-MoE has 60, DeepSeek-V2-Lite has 64.
  • Router / top-k routing. The small classifier inside each MoE layer that scores experts and picks the k highest per token.
  • Offloading. Keeping some weights in CPU RAM and moving them to GPU on demand. The cost is the PCIe transfer.
  • PCIe. The bus between CPU memory and the GPU. Roughly two orders of magnitude slower than reading from GPU HBM/GDDR, so cache misses are expensive.
  • Skeleton. Everything in the model that isn't an expert: embeddings, attention, layer norms, LM head. MoE-PolicyLang pins the skeleton on GPU (≈3.7 GB for Qwen1.5-MoE) and only the expert weights move.
  • Cache hit / miss. A hit means the expert the router picked is already on GPU. A miss means we have to fetch it (or run it on CPU, if schedule = cpu_fallback).
  • LRU / LFU. Least-Recently-Used and Least-Frequently-Used cache eviction. LRU drops whatever hasn't been touched lately; LFU drops whatever has the lowest activation count (with a decay factor so old hot experts age out).
  • fp16 / GPTQ / AWQ. Weight precisions. fp16 is the standard half-precision format used in this paper's experiments. GPTQ and AWQ are 4-bit quantization formats that vLLM consumes; they trade a small amount of perplexity for a large VRAM reduction.
  • KV-cache. The cache of attention keys/values from previous tokens during generation. It grows with sequence length and competes with expert weights for VRAM.
  • pp (percentage points). Used for hit-rate deltas: +14.7 pp means 48.6% → 63.3%, not a 14.7% relative change.
  • EPCB. Empirical Per-layer Cache Budgeting. See the section above; the load-bearing part is the per-layer cache structure, not the entropy allocator that gave the technique its name.

Citation

@misc{pokora2026moepolicylang,
  title={MoE-PolicyLang: A Domain-Specific Language for Mixture-of-Experts Scheduling Policies},
  author={Pokora, Jesse},
  year={2026},
  url={https://github.com/jesse-pokora/MoE-PolicyLang}
}

License

MIT License — Copyright (c) 2026 Jesse Pokora

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moe_policylang-1.3.2.tar.gz (100.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

moe_policylang-1.3.2-py3-none-any.whl (107.6 kB view details)

Uploaded Python 3

File details

Details for the file moe_policylang-1.3.2.tar.gz.

File metadata

  • Download URL: moe_policylang-1.3.2.tar.gz
  • Upload date:
  • Size: 100.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for moe_policylang-1.3.2.tar.gz
Algorithm Hash digest
SHA256 1fb7e0808b9085a685c70df344c3e5413d66955a31bd03b3b355e00703515994
MD5 5bacc6426e92109c4c2d3c6b96000d54
BLAKE2b-256 73949e38000ada3730803b962536ebdbac8e4934722613e38d6ec92b2b2842a1

See more details on using hashes here.

File details

Details for the file moe_policylang-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: moe_policylang-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 107.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for moe_policylang-1.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2a33f55023eaba7d07abcfda4ecaa1053f48062616d65e215780b16460dc6456
MD5 e7ff939613d58752d5edcbbd56b845a6
BLAKE2b-256 7b107b0155190dd8267b4afe78e6e609fdbb28fec6ba4a5aad91ba4fb9a5f04e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page