EulerStack

A YAML-driven modular LLM assembler with Hugging Face compatibility.

License: Apache 2.0 · Python 3.10+

🌐 Language: English · 한국어


EulerStack lets you describe a transformer-family architecture as a YAML spec, validate it against a strict schema, estimate parameters, and compile it to either a JSON runtime config or a standard Hugging Face model directory that you can immediately plug into transformers, PEFT, vLLM, or any downstream training framework.

It is an architecture assembly tool, not a training framework. EulerStack stops at a clean, randomly-initialised, structurally-valid model. From there you bring your own data and your favourite trainer.

Why EulerStack?

Want to try DeepSeek-V3's MLA attention on top of your Llama baseline?

  • Code path: fork modeling_llama.py, rewrite LlamaAttention, patch the KV-cache, re-map the state-dict. That is roughly 200–300 lines of diff. The intent, "try MLA", is one line buried in hundreds.
  • EulerStack path:
    -      attention: { qkv_bias: false }
    +      attention: { qkv_bias: false, latent_dim: 384 }
    
    One line.

That ratio of idea to mechanical plumbing is the whole pitch:

  • Changes are tiny. Swap attention for Mamba, add MoE to every 4th layer, enable 2-phase reasoning: each is 1–5 YAML lines, not a refactor.
  • The diff is the design decision. Two months later, you still know what you changed and why. modeling_custom.py diffs lose that intent inside the plumbing.
  • You can discuss it like a blueprint. Reviewers read the spec instead of spelunking through PyTorch. Architecture debates happen on a document, not in code comments.
  • Lintable before any GPU. Parameter counts, head-dim sanity, and KV-cache budgets are all checked before training starts.
  • Output is vanilla Hugging Face. Plugs into transformers, PEFT, vLLM, etc. No lock-in, no custom runtime.
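To make the "lintable before any GPU" point concrete, here is a minimal sketch of the kind of arithmetic such a check involves. It is not EulerStack's own linter: the function name lint_attention and its return keys are hypothetical, and it assumes a plain attention stack whose K and V tensors are cached in a 2-byte dtype such as bf16.

```python
from typing import Optional


def lint_attention(d_model: int, n_heads: int, n_layers: int,
                   max_seq_len: int, n_kv_heads: Optional[int] = None,
                   bytes_per_elem: int = 2) -> dict:
    """Pre-training sanity numbers for a plain attention stack (illustrative)."""
    if d_model % n_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by n_heads={n_heads}")
    n_kv_heads = n_kv_heads or n_heads  # no GQA unless stated
    head_dim = d_model // n_heads
    # K and V each store n_kv_heads * head_dim values per token per layer.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return {
        "head_dim": head_dim,
        "kv_bytes_per_token": kv_bytes_per_token,
        "kv_gib_at_max_seq": kv_bytes_per_token * max_seq_len / 2**30,
    }


# Hypothetical 24-layer model: d_model 2048, 16 heads, 4096-token context.
report = lint_attention(d_model=2048, n_heads=16, n_layers=24, max_seq_len=4096)
print(report)
```

For these numbers the head dimension is 128 and a single full-length sequence needs 0.75 GiB of KV-cache, all computed without allocating a tensor.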

What's new

Version | Headline | Detail
0.1.3 (2026-04-30) | DeepSeek-V4 schema-complete: 12 new yml fields covering CSA/HCA sparse attention, mHC manifold-constrained hyper-connections, V4 MoE (sqrt_softplus + shared experts + sequence balance + hash routing), MTP head, Muon hybrid optimizer, FP4 QAT. All forward paths wired at BF16 fallback runtime; FP4 indexer kernel reserved for v0.1.4 plugin. | release_notes/0.1.3.md
0.1.2 (2026-04-30) | 12 new yml fields, all fully wired at runtime (DeepSeek-V3 aux-loss-free MoE + Switch-T router-only LR + Gemma2 regularisers + dilated/NSA attention + layered-safe dtype) plus single-machine multi-GPU DDP pretraining via torchrun --nproc_per_node=N. | release_notes/0.1.2.md
0.1.1 (2026-04-25) | Infrastructure-only: invariant tests for examples↔tutorials and projects ko↔en mirror. No yml changes. | release_notes/0.1.1.md
0.1.0 (2026-04-13) | First public release: schema + IR + HF compile target + 53 presets + i18n CLI (ko/en/zh/ja/es). | release_notes/0.1.0.md

The compact change list lives in CHANGELOG.md; the release-notes pages above expand on the why and how to migrate.

Installation

Requires Python 3.10+.

From PyPI (recommended):

pip install eulerstack

From source (for development or the latest main):

git clone https://github.com/<your-org>/eulerstack.git
cd eulerstack
pip install -e .

Either way, the eulerstack CLI is installed on your PATH.

Core runtime dependencies: torch >= 2.1, < 2.10, transformers >= 4.40, pyyaml, click.

Note (PyTorch upper bound): torch 2.10.0+cu128 regressed bf16/fp16 cuBLAS GEMM on Blackwell GPUs (RTX 5090, sm_120). eulerstack pins torch < 2.10 as a defence; torch == 2.8.x is the verified-good release. The cap will be lifted once the upstream fix ships.
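If you want to check an existing environment against that range, note that version numbers must be compared numerically: as plain strings, "2.10" sorts before "2.8". The following is a minimal sketch, not part of EulerStack; the helper names parse_release and torch_in_supported_range are hypothetical, and local tags such as +cu128 are simply ignored.

```python
import re


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '2.10.0+cu128' -> (2, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def torch_in_supported_range(version: str) -> bool:
    """True when 2.1 <= version < 2.10, the range quoted in the note above."""
    release = parse_release(version)
    return (2, 1) <= release[:2] < (2, 10)


for candidate in ("2.0.1", "2.1.0", "2.8.0", "2.10.0+cu128"):
    print(candidate, torch_in_supported_range(candidate))
```

In a real environment you would pass torch.__version__ to torch_in_supported_range.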

Quickstart

The CLI speaks five languages (ko / en / zh / ja / es). The default is Korean; pass --lang en or set EULERSTACK_LANG=en for English.

# See the bundled presets
eulerstack --lang en presets list

# Validate a spec (schema check only)
eulerstack --lang en validate --preset configs/presets/llm_2b_simple.yml

# Validate with a full realism report (param estimates, sanity checks, warnings)
eulerstack --lang en validate --preset my_model.yml --report

# Explain what the spec describes in human-readable form
eulerstack --lang en explain --preset configs/presets/arch_beginner_llama.yml

# Compile to a runtime JSON config
eulerstack --lang en compile --preset my_model.yml --output compiled.json

# Compile to a Hugging Face model directory (config.json + model.safetensors)
eulerstack --lang en compile --preset my_model.yml --output-dir ./my_model_hf

The --output-dir form writes a directory that loads directly with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./my_model_hf")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # per the spec's tokenizer_contract

Weights are randomly initialised. Training is explicitly out of scope; see Where EulerStack Fits below.

What a Spec Looks Like

A minimal decoder-only model:

schema_version: 1

model:
  name: "my-llm"
  d_model: 2048
  vocab_size: 32000
  max_seq_len: 4096
  n_heads: 16
  mlp_ratio: 4
  dtype: bfloat16

tokenizer_contract:
  type: hf
  pretrained: gpt2

embedding:
  type: learned
  positional: rope
  rope_theta: 500000.0
  tie_word_embeddings: true

layer_templates:
  decoder:
    mixer:
      type: attention        # or: mamba, retnet, hyena, linear_attention, ...
      attention: {}
    ffn:
      type: gated_mlp        # or: moe, mlp
      activation: swiglu
    norm:
      type: rmsnorm
      position: pre

layer_schedule:
  - template: decoder
    repeat: 24

head:
  type: causal_lm
  tie_weights: true

Hybrid and MoE models are expressed the same way: you define multiple layer_templates and arrange them in layer_schedule. See the configs/presets/ directory for working examples, including attention-free models, MoE-every-N-layers, and mixed-mixer stacks.
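As an illustration of how a schedule of templates turns into a concrete layer list, here is a small sketch. It is not EulerStack's assembler: the function expand_schedule and the template names dense and moe are hypothetical, and it assumes each schedule entry carries only the template and repeat keys shown in the minimal spec.

```python
def expand_schedule(layer_schedule, layer_templates):
    """Flatten [{template, repeat}, ...] into one template name per layer."""
    layers = []
    for entry in layer_schedule:
        name = entry["template"]
        if name not in layer_templates:
            raise KeyError(f"schedule references unknown template {name!r}")
        repeat = entry.get("repeat", 1)
        if repeat < 1:
            raise ValueError(f"repeat must be >= 1, got {repeat}")
        layers.extend([name] * repeat)
    return layers


# Two hypothetical templates that differ only in their FFN type.
templates = {
    "dense": {"mixer": {"type": "attention"}, "ffn": {"type": "gated_mlp"}},
    "moe": {"mixer": {"type": "attention"}, "ffn": {"type": "moe"}},
}

# MoE on every 4th layer of a 24-layer stack: six groups of 3 dense + 1 MoE.
schedule = [
    {"template": "dense", "repeat": 3},
    {"template": "moe", "repeat": 1},
] * 6

layers = expand_schedule(schedule, templates)
print(len(layers), layers[:4])
```

The point of the sketch is that the architectural decision (which layers get MoE) lives entirely in the schedule data, not in model code.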

Architecture

EulerStack is a five-layer pipeline; each layer has one job.

  YAML spec (DSL)
       │
       ▼
  ┌──────────┐   validate: schema v1, cross-field checks, realism warnings
  │  Schema  │
  └──────────┘
       │
       ▼
  ┌──────────┐   normalize_to_ir: typed, canonical in-memory representation
  │    IR    │
  └──────────┘
       │
       ▼
  ┌──────────┐   compile_ir: materialise layer list, param count, runtime config
  │ Compiler │
  └──────────┘
       │
       ├──► JSON runtime config
       │
       └──► Hugging Face model directory (PreTrainedModel + safetensors)

A few details worth knowing:

  • Schema v1 is versioned and strict. Unknown keys are errors (with one exception: reserved prefixes experimental.* / future.* / vendor.*.* are accepted as warnings so plugins and in-progress research can coexist).
  • Mixer types are pluggable: attention (with GQA / sliding-window / RoPE / ALiBi), Mamba / Mamba2, RetNet, Hyena, linear attention, and more. Adding a new mixer means implementing one block class and registering it; no changes to the schema or compiler are needed.
  • FFN types include dense MLP, gated MLP (SwiGLU / GeGLU), and MoE (top-k routing, capacity factor, router z-loss).
  • Outputs are vanilla Hugging Face. There is no EulerStack runtime: the exported directory is indistinguishable from any AutoModelForCausalLM.from_pretrained() target, so all the standard ecosystem tooling (PEFT, LoRA, bitsandbytes, accelerate, DeepSpeed, vLLM, SGLang, llama.cpp converters where applicable) just works.
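The strict-keys rule can be pictured with a short sketch. This is not the EulerStack validator: check_keys and is_reserved are hypothetical names, and reading the reserved patterns as dotted key paths (with vendor.*.* needing a vendor name plus a key) is an assumption made for illustration.

```python
def is_reserved(path: str) -> bool:
    """True for experimental.*, future.* and vendor.<name>.<key> paths."""
    parts = path.split(".")
    if parts[0] in ("experimental", "future"):
        return len(parts) >= 2
    if parts[0] == "vendor":
        return len(parts) >= 3
    return False


def check_keys(found_paths, known_paths):
    """Split unknown keys into hard errors and reserved-prefix warnings."""
    errors, warnings = [], []
    for path in found_paths:
        if path in known_paths:
            continue
        (warnings if is_reserved(path) else errors).append(path)
    return errors, warnings


known = {"model.d_model", "model.n_heads"}
found = [
    "model.d_model",
    "model.d_modle",          # typo: unknown key, should be an error
    "experimental.new_gate",  # reserved prefix: warning only
    "vendor.acme.flag",       # reserved vendor namespace: warning only
    "vendor.acme",            # too short for vendor.*.*: error
]
errors, warnings = check_keys(found, known)
print("errors:", errors)
print("warnings:", warnings)
```

Treating typos as errors while letting reserved namespaces through as warnings is what allows a strict schema and in-progress extensions to coexist.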

Presets

configs/presets/ ships with 57 ready-to-compile specs, organised as a four-tier progression from industry-standard canon to v0.1.3's DeepSeek-V4 family.

Tier 1: Validated industrial

Production-grade baselines. Training recipes are well-studied; failure modes are known.

  • arch_beginner_gpt2, arch_beginner_llama: classic Transformer and Llama-2/3 style
  • arch_intermediate_mistral, arch_intermediate_gemma2, arch_intermediate_qwen_longctx: modern attention patterns
  • llm_0p1b_{simple,mistral}: Stage-1 / CPT warm-up (sovereign-foundation pilot)
  • llm_*_simple, llm_*_mistral across 0.8B / 2B / 4B / 16B

Tier 2: Recent / complex (hybrid, MoE, KV-compressed)

Modern research consensus running in production systems.

  • arch_advanced_{jamba, samba, retnet}: hybrid and attention-free lines
  • arch_advanced_mla: MLA (DeepSeek-V3 2024, runtime Core)
  • arch_advanced_mod: Mixture-of-Depths (Raposo ICML 2024, runtime Component)
  • arch_expert_* (9 presets, some speculative): MoE × mixer × depth explorations
  • arch_expert_*_mini (6 small-scale experts): ablation-ready for single-GPU
  • llm_*_jamba, llm_*_moe, llm_*_mla across 0.1B / 0.8B / 2B / 4B / 16B (MoE skipped at 0.1B)

Tier 3: v1 experimental (new advanced architecture features at arch-scale)

Three arch_expert_* presets (~1.2–1.4B) that each showcase one of the advanced architecture features. Schema-complete; runtime partial. The full spec round-trips via config.v1_extensions.

Preset | Feature | Research basis
arch_expert_reasoning_r1 | execution_modes + transition (2-phase think/answer) | DeepSeek-R1 (2025), Quiet-STaR (NeurIPS 2024)
arch_expert_titans_memory | template.memory (parametric + test-time update) | Titans (Google 2024-2025)
arch_expert_dual_stream | parallel: monoidal schedule (Mamba ∥ Attention) | Jamba × PaLM generalization

Tier 4: DeepSeek-V4 family (v0.1.3, BF16 fallback runtime)

Four arch_expert_dsv4_* / arch_expert_mhc_moe_* mini presets that exercise the v0.1.3 schema additions W-V4-1 through W-V4-6. Each forward path is wired at BF16. Two opt-in components ride on top:

  • FP4 indexer kernel: pip install "eulerstack[fp4]" (torchao ≥ 0.17).
  • Muon optimizer: pip install git+https://github.com/KellerJordan/Muon@main. Muon is not on PyPI, and PyPI rejects git-URL direct dependencies in Requires-Dist, so it cannot ship as an extra; install it directly.

Both fall back gracefully when missing.
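A common way to implement this kind of graceful fallback is to probe for the optional package before selecting a code path. The sketch below shows the general pattern only and is not EulerStack's implementation: pick_backend and the backend labels "fp4" and "bf16" are hypothetical, while torchao is the import name of the package mentioned above.

```python
import importlib.util


def pick_backend(module_name: str, preferred: str, fallback: str) -> str:
    """Return `preferred` when `module_name` is importable, else `fallback`."""
    if importlib.util.find_spec(module_name) is not None:
        return preferred
    return fallback


# Use the FP4 path only when torchao is installed; otherwise stay on BF16.
indexer = pick_backend("torchao", preferred="fp4", fallback="bf16")
print("indexer backend:", indexer)
```

Because find_spec only looks the module up and does not import it, the probe stays cheap and has no side effects when the optional package is absent.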

Preset | V4 features | What it isolates
arch_expert_dsv4_v3fallback | V3 fine-grained MoE + V4 reserved-namespace round-trip | V4 spec persists in config.json without changing v0.1.2 training behaviour
arch_expert_mhc_moe_mini | mHC residual.type=hyper_connection + V3 MoE | mHC residual-stream amplification only
arch_expert_dsv4_subset_mini | mHC + V4 MoE (sqrt_softplus, shared, sequence_aux) + MTP head | V4 minus CSA/HCA, the most tractable V4 subset
arch_expert_dsv4_flash_mini | CSA + HCA alternation + mHC + V4 MoE + MTP + Muon hybrid + FP4 QAT | DeepSeek-V4-Flash full kitchen-sink at mini scale

Presets are starting points, not the ceiling. EulerStack can assemble models of essentially any size; the schema has no size cap.

Where EulerStack Fits (End-to-End Pipeline)

EulerStack is deliberately a narrow tool: it produces a well-formed, randomly-initialised Hugging Face model. A realistic LLM pipeline looks like this, and EulerStack owns only the first box.

  ┌──────────────┐    ┌──────────────┐    ┌───────────────┐    ┌────────────┐    ┌────────┐
  │  EulerStack  │ -> │   Pretrain   │ -> │ Post-training │ -> │  Evaluate  │ -> │  Serve │
  │  (this tool) │    │  your choice │    │  SFT / DPO /  │    │ your suite │    │  your  │
  │              │    │  of trainer  │    │   RLHF / etc. │    │            │    │  stack │
  └──────────────┘    └──────────────┘    └───────────────┘    └────────────┘    └────────┘
        ^
   YAML spec in,
  HF model out

Because the output is a standard PreTrainedModel, you can pair EulerStack with any training stack you already trust:

  • Pretraining / continued pretraining: Megatron-LM, NeMo, TorchTitan, Hugging Face Trainer, Composer, Levanter, GPT-NeoX.
  • Fine-tuning (full / LoRA / QLoRA): PEFT, TRL, Axolotl, Unsloth, LLaMA-Factory.
  • Alignment: TRL (DPO / PPO / KTO), OpenRLHF.
  • Serving: vLLM, SGLang, TGI, TensorRT-LLM, or any engine that loads transformers checkpoints.

This scope separation is intentional. Training is a fast-moving space with strong, well-maintained tools; EulerStack does not try to re-implement any of them. What it does do is give you a stable, reviewable, reproducible starting point so that every downstream step operates on an architecture whose shape is explicit and auditable.

A typical workflow:

# 1. Design and validate an architecture
eulerstack --lang en validate --preset my_model.yml --report

# 2. Export a Hugging Face model directory (random weights)
eulerstack --lang en compile --preset my_model.yml --output-dir ./my_model_hf

# 3. Hand off to your training stack of choice, e.g. with transformers Trainer:
#    model = AutoModelForCausalLM.from_pretrained("./my_model_hf")
#    trainer = Trainer(model=model, train_dataset=..., ...)
#    trainer.train()

Project Layout

eulerstack/
├── eulerstack/          # Python package
│   ├── spec/            # Schema, validation, parameter estimation, reports
│   ├── ir/              # Typed intermediate representation + normalizer
│   ├── compiler/        # IR -> runtime config / HF model directory
│   ├── components/      # Attention, Mamba, RetNet, Hyena, MoE, norms, ...
│   ├── blocks/          # Layer templates composed from components
│   ├── assembler/       # Layer-schedule materialisation
│   ├── hf/              # Hugging Face export (config.json, safetensors)
│   ├── cli/             # `eulerstack` command
│   └── i18n/            # 5-language CLI message catalog
├── configs/presets/     # 57 ready-to-compile YAML specs
├── examples/            # Runnable scripts (compile → export → load → generate)
├── tests/               # Unit + smoke tests
└── pyproject.toml

Tutorials

Full, searchable online tutorials are published at:

🌐 eulerwa.com/en/products/eulerstack/tutorials/

The offline copy under docs/tutorials/en/ mirrors the site and is the best place to start if you prefer to read locally.

Examples

See examples/ for runnable scripts. They mirror the end-to-end flow the tutorials walk through:

  • 01_compile_and_export.py: compile a preset and save as an HF model directory. (Covered in tutorial 04.)
  • 02_load_and_generate.py: load the exported model with transformers and generate. (Covered in tutorial 04.)
  • 03_architecture_evolution.py: compare several architectures side by side. (Driven from tutorial 07.)
  • 04_pretrain_tokenize.py: pre-tokenize a jsonl corpus into a packed int32 cache. (Tutorial 12 §2.)
  • 05_pretrain_train.py: universal trainer (auto-detects EulerStack / HF MoE / dense); full MoE telemetry in logs. (Tutorial 12 §3.)
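For readers unfamiliar with the term, "packing" a tokenized corpus usually means concatenating documents into one token stream and cutting it into fixed-length rows. The sketch below illustrates that idea with the standard library only; it is not the code of 04_pretrain_tokenize.py. The "text" field name, the toy character-level tokenizer, and the choice to drop the trailing partial row are all assumptions.

```python
import json
from array import array


def pack_jsonl(lines, encode, eos_id: int, seq_len: int):
    """Tokenize JSONL records and pack them into fixed-length integer rows."""
    stream = array("i")  # 'i' is a C int, 4 bytes on common platforms
    for line in lines:
        line = line.strip()
        if not line:
            continue
        text = json.loads(line)["text"]  # assumed field name
        stream.extend(encode(text))
        stream.append(eos_id)            # mark the document boundary
    n_rows = len(stream) // seq_len      # the trailing partial row is dropped
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_rows)]


def toy_encode(text: str):
    """Stand-in tokenizer: one token per character."""
    return [ord(ch) for ch in text]


corpus = ['{"text": "abc"}', '{"text": "de"}', '{"text": "f"}']
rows = pack_jsonl(corpus, toy_encode, eos_id=0, seq_len=4)
print([list(row) for row in rows])
```

A real pipeline would swap toy_encode for the tokenizer named in the spec's tokenizer_contract and write the rows to disk rather than keeping them in memory.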

Reference Projects

Once you have finished the tutorials and the toy examples above, projects/ ships an actually-runnable end-to-end pipeline so you can move from "compiled a spec" to "trained it and watched the loss fall". run_all.sh drives the whole pipeline from compile through train.

projects/01_arch_complex_rtx3090: every advanced/expert preset, sanity-trained

  • 22 presets (5 × arch_advanced_* + 13 × arch_expert_* + 4 × DeepSeek-V4 family added in v0.1.3) all scaled to ~500 M params.
  • Fits on a single RTX 3090 (24 GB): bs=4, seq=512, bf16, 10 M target-tokens per config.
  • bash run_all.sh compiles and trains the whole batch; bash run_all_v4.sh restricts to the V4 family for a quick smoke pass.
  • Designed around cuda:1 by default; override with DEVICE=cuda:0 (or whatever your setup has).
  • Proves every v1 primitive (MLA / Mamba / RetNet / Hyena / MoE / Titans / MoD / ODE / execution_modes / CSA / HCA / mHC / MTP / Muon hybrid) does forward + backward + save on affordable hardware.
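To get a feel for how short each sanity run is, the quoted settings can be turned into a step count. This is a back-of-the-envelope sketch, assuming no gradient accumulation (the source does not say whether any is used); the function name optimizer_steps is hypothetical.

```python
import math


def optimizer_steps(target_tokens: int, batch_size: int, seq_len: int,
                    grad_accum: int = 1) -> int:
    """Number of optimizer steps needed to consume `target_tokens`."""
    tokens_per_step = batch_size * seq_len * grad_accum
    return math.ceil(target_tokens / tokens_per_step)


# bs=4, seq=512, 10 M target tokens per config, as listed above.
steps = optimizer_steps(10_000_000, batch_size=4, seq_len=512)
print(steps)
```

Under that assumption each config sees 2,048 tokens per step and finishes in 4,883 steps.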

The reference bench above assumes you have worked through the core tutorials first; it is a sanity-train pipeline, not finished research. The point is that every spec EulerStack can compile actually trains on hardware you can afford.

projects/02_moe_pretrain_study: three production MoE designs head-to-head

Mixtral (4×top-2) / DeepSeek (8×top-2) / BlackMamba (Mamba+4×top-1) trained on the same 4 B FineWeb-Edu token stream at ~200 M params on RTX 5090. Reports a clean negative finding: at 4 B tokens / 200 M, router entropy stays at the ceiling regardless of MoE design, so the differentiation threshold is higher than this. Empirical motivation for the larger-budget follow-ups.
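"Router entropy at the ceiling" means the router spreads probability almost uniformly over experts, so its entropy sits near the maximum of ln(n_experts). A minimal sketch of that comparison follows; the exact entropy definition the project logs is not stated in this README, so the Shannon entropy of a single routing distribution is used here as an assumption.

```python
import math


def router_entropy(probs) -> float:
    """Shannon entropy (in nats) of one routing distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_ceiling(n_experts: int) -> float:
    """Maximum possible entropy, reached by a uniform router."""
    return math.log(n_experts)


uniform = [1 / 8] * 8          # undifferentiated router
peaked = [0.65] + [0.05] * 7   # router that has started to specialise

print(router_entropy(uniform), entropy_ceiling(8))
print(router_entropy(peaked))
```

A router whose measured entropy tracks the first line has not yet learned to prefer any expert; specialisation shows up as a drop toward values like the second.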

projects/03_moe_frontier_study: frontier MoE at 500 M / 50 B token budget

Targets the published-MoE-differentiation regime (≈5× Chinchilla, 500 M params / 50 B tokens) on RTX 5090, contrasting 8-expert vs 16-expert configurations to test whether router entropy finally separates beyond 02's negative finding.
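The "≈5× Chinchilla" figure can be reproduced with the commonly cited rule of thumb of about 20 training tokens per parameter. That ratio is an assumption brought in for this sketch, not something the README states, and chinchilla_multiple is a hypothetical helper name.

```python
def chinchilla_multiple(n_params: int, n_tokens: int,
                        tokens_per_param: int = 20) -> float:
    """Token budget as a multiple of the ~20 tokens/param rule of thumb."""
    return n_tokens / (n_params * tokens_per_param)


study_03 = chinchilla_multiple(500_000_000, 50_000_000_000)  # 500 M / 50 B
study_02 = chinchilla_multiple(200_000_000, 4_000_000_000)   # 200 M / 4 B
print(study_03, study_02)
```

Under that rule of thumb, project 03 trains at 5× the compute-optimal token count, while project 02's 200 M / 4 B setup sits at roughly 1×.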

projects/04_cortical_heterogeneous_study: heterogeneous cortical layer schedule

Mixes V3 fine-grained MoE, MLA attention, Titans memory, and MoD depth gating across distinct cortical-inspired layer roles (sensory / association / output) to study whether per-role architectural specialisation improves loss at fixed param budget.

projects/05_atlas_helix_study: DeepSeek-V4 + Titans memory ablation (v0.1.3)

Forward-looking presets that exercise the v0.1.3 V4 stack: dsv4_cortical_* (mHC + V4 MoE in association layers), dsv4_titans_memory_* (MTP head + Titans memory), csa_mod_* (CSA + Mixture-of-Depths). Configs sit under the project and are not promoted to configs/presets/ until validated.

Coming soon

  • Multi-node DDP follow-up (W-DDP-2, deferred from v0.1.2 single-machine).
  • W-V4-FP4 plugin (Tilelang FP4 indexer kernel); v0.1.3 currently ships the BF16 fallback path.

Testing

python -m pytest tests/ -v

The unit suite covers schema validation, IR normalisation, compilation, parameter estimation, report generation, and CLI behaviour for every bundled preset.

Contributing

Contributions are welcome. Please open an issue to discuss substantial changes (new mixer types, schema changes, new presets) before sending a PR. For small fixes or clarifications, a PR is fine on its own.

When adding a new component (e.g. a new mixer), the rough checklist is:

  1. Implement the block under eulerstack/components/ or eulerstack/blocks/.
  2. Register it so the schema accepts it.
  3. Add a minimal preset in configs/presets/ that exercises it.
  4. Add tests alongside the existing suite in tests/.
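Steps 1 and 2 typically follow a registry pattern: a component class is added to a name-to-class table, and the builder looks types up by the string used in the YAML. The sketch below shows that general pattern only. EulerStack's actual registration API is not shown in this README, so MIXER_REGISTRY, register_mixer, build_mixer and IdentityMixer are all hypothetical names.

```python
MIXER_REGISTRY = {}


def register_mixer(name: str):
    """Class decorator that records a mixer under its spec `type` string."""
    def decorator(cls):
        if name in MIXER_REGISTRY:
            raise ValueError(f"mixer {name!r} is already registered")
        MIXER_REGISTRY[name] = cls
        return cls
    return decorator


@register_mixer("identity")
class IdentityMixer:
    """Trivial mixer used only to exercise the registry."""

    def __init__(self, d_model: int):
        self.d_model = d_model

    def __call__(self, x):
        return x


def build_mixer(name: str, **kwargs):
    """Instantiate a registered mixer, with a clear error for unknown types."""
    try:
        cls = MIXER_REGISTRY[name]
    except KeyError:
        known = sorted(MIXER_REGISTRY)
        raise ValueError(f"unknown mixer type {name!r}; known: {known}") from None
    return cls(**kwargs)


mixer = build_mixer("identity", d_model=8)
print(type(mixer).__name__, mixer([1, 2, 3]))
```

With this shape, a validator can accept a new type string as soon as it appears in the registry, which is consistent with the claim that new mixers need no schema or compiler changes.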

License

Licensed under the Apache License, Version 2.0. See LICENSE for the full text.

Copyright © 2026 Eulerwa Inc.

Contact

Eulerwa Inc.
🌐 Website: eulerwa.com
📚 Tutorials: eulerwa.com/en/products/eulerstack/tutorials/
📧 Tech contact: tech@eulerwa.com
