Wiola13M: A 12.9M-parameter decoder-only language model featuring Gated Spiral Attention, Spiral RoPE, and Butterfly MLP.

These details have not been verified by PyPI

Project description

Wiola

Gated Spiral Attention — a small language model built for the 10–100M parameter regime
Spiral RoPE · content-adaptive attention gating · Butterfly MLP

Wiola is a decoder-only small language model whose novelty is confined to two sub-components of every layer: the attention block (Spiral RoPE plus the content-adaptive gate of Gated Spiral Attention) and the feed-forward block (Butterfly MLP). It is designed to run on a laptop, train on a single consumer GPU in hours, and be published to the Hugging Face Hub, while remaining architecturally distinct enough to serve as a real experimental baseline.

Variant	Hidden size `d`	Layers `L`	Heads `H`	MLP width `d_inner`	Params
Nano	256	6	8	512	~12.9M
Micro	384	8	12	768	~40M
Small	512	12	16	1024	~90M
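As a rough check on where Nano's ~12.9M comes from, the arithmetic below rests on assumptions that the table does not state: a 32,000-token vocabulary with tied input/output embeddings, bias-free Q/K/V/O projections (4d²), a Butterfly MLP costing 4·d·d_inner, the gate's 2·H·d_h + H² parameters with d_h = d/H, and two RMSNorm scale vectors per layer plus a final one. The helper name `approx_params` is hypothetical.

```python
def approx_params(d: int, n_layers: int, n_heads: int, d_inner: int,
                  vocab: int = 32000) -> int:
    """Rough parameter count for a Wiola-style decoder (hypothetical accounting)."""
    d_h = d // n_heads
    embed = vocab * d                         # tied: shared by embedding and LM head (assumed)
    attn = 4 * d * d                          # Q, K, V, O projections, no biases (assumed)
    mlp = 4 * d * d_inner                     # Butterfly MLP, assumed 4 d x d_inner matrices
    gate = 2 * n_heads * d_h + n_heads ** 2   # gate size as stated in the README
    norms = 2 * d                             # two RMSNorm scale vectors per layer (assumed)
    per_layer = attn + mlp + gate + norms
    return embed + n_layers * per_layer + d   # + final norm


nano = approx_params(d=256, n_layers=6, n_heads=8, d_inner=512)
print(f"Nano ≈ {nano / 1e6:.2f}M")  # Nano ≈ 12.92M
```

Under these assumptions the embedding table accounts for roughly 63% of Nano's parameters (8.19M of 12.92M). The same accounting is not guaranteed to reproduce the Micro and Small rows, which may rest on settings (vocabulary size, embedding tying) not listed in the table.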

What's novel

Spiral Rotary Positional Encoding (Spiral RoPE). Standard RoPE frequencies are perturbed by a sqrt-growing, per-dimension-pair factor so that phase trajectories fan outward instead of staying collinear. The aim is better long-range position discrimination at zero added parameters. Setting spiral_alpha=0.0 recovers standard RoPE exactly.
Gated Spiral Attention (GSA). A per-head, content-adaptive scalar gate, derived causally from a cumulative mean of the query projections, modulates attention scores before softmax. Heads that don't help learn to suppress themselves, which amounts to implicit soft head pruning without any sparsity loss. The gate adds 2·H·d_h + H² parameters (2·8·32 + 8² = 576 for Nano, with d_h = d/H = 32) and is fully KV-cache compatible.
Butterfly MLP. A multiplicative feed-forward block, SiLU(a) ⊙ b, where a and b are two linear projections of the input x, plus an intra-block bypass W_bypass·x. With d_inner = 2d it matches a GELU FFN with 4× expansion in parameter count while providing SwiGLU-class gating and steadier gradients in shallow stacks.
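The exact Spiral RoPE perturbation is defined in docs/ARCHITECTURE.md; the sketch below only illustrates the general shape of the idea. It assumes a hypothetical per-pair factor 1 + alpha·sqrt(i / n_pairs) multiplying the standard RoPE frequencies and rotates interleaved (even, odd) feature pairs. Neither choice is confirmed by the source, and the function names are illustrative. What it does preserve from the description is that alpha = 0 reduces to standard RoPE and that no parameters are added.

```python
import torch


def spiral_rope_freqs(head_dim: int, alpha: float = 0.0, base: float = 10000.0) -> torch.Tensor:
    """Per-pair angular frequencies; alpha=0.0 gives standard RoPE.

    The sqrt(i / n_pairs) factor is a HYPOTHETICAL stand-in for the
    perturbation defined in docs/ARCHITECTURE.md.
    """
    n_pairs = head_dim // 2
    i = torch.arange(n_pairs, dtype=torch.float64)
    inv_freq = base ** (-2.0 * i / head_dim)        # standard RoPE frequencies
    spiral = 1.0 + alpha * torch.sqrt(i / n_pairs)  # sqrt-growing per-pair factor (assumed form)
    return inv_freq * spiral


def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive (even, odd) feature pairs of x with shape (..., seq, head_dim)."""
    seq = x.shape[-2]
    pos = torch.arange(seq, dtype=freqs.dtype)
    angles = torch.outer(pos, freqs)                # (seq, n_pairs), in float64
    cos, sin = angles.cos().to(x.dtype), angles.sin().to(x.dtype)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


torch.manual_seed(0)
q = torch.randn(1, 8, 16, 32)                       # (batch, heads, seq, head_dim); Nano d_h = 256 / 8
q_standard = apply_rope(q, spiral_rope_freqs(32, alpha=0.0))
q_spiral = apply_rope(q, spiral_rope_freqs(32, alpha=0.5))
```

Because each pair is still rotated by an angle proportional to position, the dot product between two rotated vectors depends only on their relative offset for any alpha, which is the property that makes RoPE a relative encoding.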

See docs/ARCHITECTURE.md for the full math.
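A minimal PyTorch sketch of the Butterfly MLP follows. It assumes bias-free layers and reads "intra-block bypass" as a projection of x into the d_inner-wide space that is added before a shared output projection. That reading is our assumption, chosen because it makes the count exactly 4·d·d_inner = 8d² when d_inner = 2d, the same as a bias-free GELU FFN with a 4d hidden layer. The class and attribute names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ButterflyMLP(nn.Module):
    """out = W_out(SiLU(W_a x) * (W_b x) + W_bypass x), an ASSUMED layout."""

    def __init__(self, d: int, d_inner: int):
        super().__init__()
        self.w_a = nn.Linear(d, d_inner, bias=False)       # gate branch: a = W_a x
        self.w_b = nn.Linear(d, d_inner, bias=False)       # value branch: b = W_b x
        self.w_bypass = nn.Linear(d, d_inner, bias=False)  # intra-block linear bypass
        self.w_out = nn.Linear(d_inner, d, bias=False)     # project back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.silu(self.w_a(x)) * self.w_b(x) + self.w_bypass(x)
        return self.w_out(h)


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


d = 256  # Nano width
butterfly = ButterflyMLP(d, d_inner=2 * d)
gelu_ffn = nn.Sequential(
    nn.Linear(d, 4 * d, bias=False), nn.GELU(), nn.Linear(4 * d, d, bias=False)
)
print(n_params(butterfly), n_params(gelu_ffn))  # 524288 524288
```

The linear bypass gives the block a path whose gradient does not pass through the multiplicative gate, which is one plausible reason for the steadier-gradient behaviour described above.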

Install

# from source (recommended while pre-1.0)
git clone https://github.com/wiola-project/wiola.git
cd wiola
pip install -e .

# with training / hub extras
pip install -e ".[train,hub]"

From PyPI (distributed as wiola13m):

pip install wiola13m

Version note: the model uses the transformers Cache API and is pinned to transformers>=4.40,<4.46, the range this release is tested against.
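To confirm that an existing environment falls inside the tested range before loading the model, a standard-library check like the one below is enough. The helper names are illustrative, and the simple parser ignores pre-release tags; `packaging.specifiers.SpecifierSet(">=4.40,<4.46")` handles those properly if `packaging` is available.

```python
import itertools
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """First three numeric components, e.g. '4.44.2' -> (4, 44, 2); tags like 'rc1' are dropped."""
    nums = []
    for piece in v.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        nums.append(int(digits) if digits else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def in_tested_range(v: str) -> bool:
    """True if v satisfies transformers>=4.40,<4.46."""
    return (4, 40, 0) <= release_tuple(v) < (4, 46, 0)


try:
    installed = version("transformers")
    print(installed, "OK" if in_tested_range(installed) else "outside the tested range")
except PackageNotFoundError:
    print("transformers is not installed")
```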

Quickstart

import torch
from wiola13m import WiolaConfig, WiolaForCausalLM

model = WiolaForCausalLM(WiolaConfig())          # Wiola Nano, random init
ids = torch.randint(0, 32000, (1, 16))

out = model(input_ids=ids, labels=ids)           # forward + LM loss
out.loss.backward()                              # gradients flow

model.eval()
gen = model.generate(ids[:, :4], max_new_tokens=20, do_sample=False)

Or run the bundled smoke test:

python scripts/quickstart.py

Train on TinyStories

# 1) get a 32k tokenizer (reuse a LLaMA tokenizer, or train your own)
python examples/create_tokenizer.py reuse --source meta-llama/Llama-2-7b-hf --out ./wiola-tokenizer

# 2) pre-train Nano (~2h/epoch on an RTX 3090)
python examples/train_tinystories.py \
    --tokenizer ./wiola-tokenizer \
    --output-dir ./wiola-nano-tinystories \
    --max-steps 20000

# 3) generate
python examples/generate.py --model ./wiola-nano-tinystories --prompt "Once upon a time"

Publish to the Hugging Face Hub

Wiola ships with auto_map support, so anyone can load your model with trust_remote_code=True, without installing this package:

huggingface-cli login
python examples/push_to_hub.py --model-dir ./wiola-nano-tinystories --repo-id your-name/wiola-nano

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("your-name/wiola-nano", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("your-name/wiola-nano")

If the wiola package is installed, the "wiola" architecture is auto-registered and you don't even need trust_remote_code=True.
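For context, transformers resolves remote code by reading an `auto_map` entry in the repo's config.json, which points each Auto class at a `module.ClassName` in Python files uploaded next to the weights. The fragment below is a sketch of what that entry plausibly looks like for Wiola. It assumes the module and class names from the project layout and a model_type of "wiola"; check the config.json that push_to_hub.py actually writes. The helper `remote_files` is hypothetical.

```python
import json

# ASSUMED auto_map entry; module/class names taken from the project layout.
config_fragment = {
    "model_type": "wiola",
    "architectures": ["WiolaForCausalLM"],
    "auto_map": {
        "AutoConfig": "configuration_wiola.WiolaConfig",
        "AutoModelForCausalLM": "modeling_wiola.WiolaForCausalLM",
    },
}


def remote_files(cfg: dict) -> set:
    """Python files the Hub repo must contain for trust_remote_code loading."""
    return {ref.split(".")[0] + ".py" for ref in cfg["auto_map"].values()}


print(json.dumps(config_fragment, indent=2))
print(sorted(remote_files(config_fragment)))  # ['configuration_wiola.py', 'modeling_wiola.py']
```

The local auto-registration is presumably the transformers registry calls `AutoConfig.register("wiola", WiolaConfig)` and `AutoModelForCausalLM.register(WiolaConfig, WiolaForCausalLM)` made on import; that is an assumption about what `__init__.py` does, not something stated here.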

Design decision: gate input

The design doc's figure feeds the gate from post-RoPE queries, while the prose describes the gate as content-adaptive. Wiola defaults to computing the gate from the pre-RoPE query projections (gate_pre_rope=True): these depend only on token content, not on position, which matches the prose and keeps the gate numerically stable. Set gate_pre_rope=False to match the figure instead. Both options are causal and KV-cache safe.
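The sketch below illustrates why a gate built from a running mean of queries is both causal and KV-cache friendly: the mean at position t uses only queries up to t, and during decoding it can be updated from a running sum instead of being recomputed. The gate's internals are assumptions chosen only so the parameter count equals the stated 2·H·d_h + H² (a per-head RMS scale and projection vector plus an H×H head-mixing matrix). Multiplying the pre-softmax scores by the gate is likewise one plausible reading of "modulates attention scores"; docs/ARCHITECTURE.md has the real definition. All names are hypothetical, and RoPE is omitted for brevity.

```python
import torch
import torch.nn as nn


class CausalHeadGate(nn.Module):
    """HYPOTHETICAL per-head gate from a causal running mean of queries.

    Parameters: scale (H, d_h) + proj (H, d_h) + head mix (H, H) = 2*H*d_h + H**2.
    """

    def __init__(self, n_heads: int, head_dim: int, eps: float = 1e-6):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(n_heads, head_dim))
        self.proj = nn.Parameter(torch.randn(n_heads, head_dim) / head_dim ** 0.5)
        self.mix = nn.Parameter(torch.eye(n_heads))
        self.eps = eps

    def gate_from_mean(self, q_mean: torch.Tensor) -> torch.Tensor:
        """q_mean: (B, H, T, d_h) running means -> gates (B, H, T) in (0, 1)."""
        normed = q_mean * q_mean.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        z = (normed * self.scale[:, None, :] * self.proj[:, None, :]).sum(-1)  # (B, H, T)
        z = torch.einsum("gh,bht->bgt", self.mix, z)  # let heads inform each other
        return torch.sigmoid(z)

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        """q: pre-RoPE queries (B, H, T, d_h); position t only sees positions <= t."""
        T = q.shape[2]
        counts = torch.arange(1, T + 1, device=q.device, dtype=q.dtype).view(1, 1, T, 1)
        return self.gate_from_mean(q.cumsum(dim=2) / counts)

    def step(self, q_t: torch.Tensor, running_sum: torch.Tensor, n_seen: int):
        """One decoding step: q_t is (B, H, 1, d_h); returns the gate and updated sum."""
        running_sum = running_sum + q_t
        return self.gate_from_mean(running_sum / (n_seen + 1)), running_sum


def gated_causal_attention(q, k, v, gate):
    """Causal attention whose scores are scaled by the gate before softmax (assumed form)."""
    T = q.shape[2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores * gate.unsqueeze(-1)  # one scalar per head and query position
    future = torch.ones(T, T, dtype=torch.bool, device=q.device).triu(1)
    return scores.masked_fill(future, float("-inf")).softmax(-1) @ v


torch.manual_seed(0)
B, H, T, d_h = 2, 8, 10, 32  # Nano: H = 8, d_h = 256 / 8
gate_mod = CausalHeadGate(H, d_h)
q, k, v = (torch.randn(B, H, T, d_h) for _ in range(3))
g_full = gate_mod(q)
out = gated_causal_attention(q, k, v, g_full)
```

Feeding the step-by-step gates from `step` into decoding gives the same values as the full-sequence `forward`, which is the same equivalence the test suite checks for the real model.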

Tests

pip install -e ".[dev]"
pytest

The suite verifies output shapes, weight tying, strict causality (no future-token leakage), exact equivalence between cached step-by-step decoding and a single full-sequence forward (with and without the gate), save/reload round-trips, and greedy/sampling/batched/beam generation.
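The causality property can be illustrated with a generic perturbation check: change every token after position t and confirm that outputs up to t are unchanged. This is a sketch of the idea, not the project's test code. The helper names are hypothetical, and it exercises toy layers built on `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+) rather than Wiola itself.

```python
import torch
import torch.nn.functional as F


def leaks_future(layer, x: torch.Tensor, t: int, atol: float = 1e-6) -> bool:
    """Perturb tokens after position t; a causal layer's outputs up to t must not change."""
    x2 = x.clone()
    x2[:, t + 1:] = torch.randn_like(x2[:, t + 1:])
    return not torch.allclose(layer(x)[:, : t + 1], layer(x2)[:, : t + 1], atol=atol)


def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    # Toy single-head layer (query = key = value = x) with PyTorch's causal mask.
    h = x.unsqueeze(1)  # (B, 1, T, d)
    return F.scaled_dot_product_attention(h, h, h, is_causal=True).squeeze(1)


def bidirectional_self_attention(x: torch.Tensor) -> torch.Tensor:
    # Same toy layer without the mask: every position sees the whole sequence.
    h = x.unsqueeze(1)
    return F.scaled_dot_product_attention(h, h, h, is_causal=False).squeeze(1)


torch.manual_seed(0)
x = torch.randn(2, 12, 16)  # (batch, seq, width)
print(leaks_future(causal_self_attention, x, t=5))         # False
print(leaks_future(bidirectional_self_attention, x, t=5))  # True
```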

Project layout

wiola/
├── src/wiola/
│   ├── configuration_wiola.py   # WiolaConfig
│   ├── modeling_wiola.py        # Spiral RoPE, GSA, Butterfly MLP, decoder, CausalLM
│   └── __init__.py              # Auto* registration
├── examples/                    # train / generate / tokenizer / push_to_hub
├── scripts/quickstart.py
├── tests/
└── docs/ARCHITECTURE.md

Citation

If you use Wiola, please cite it (see CITATION.cff).

License

Apache 2.0 — see LICENSE.

Project details

Release history

This version: 1.0.0 (Jul 3, 2026)

Download files

Source Distribution

wiola13m-1.0.0.tar.gz (24.3 kB)

Uploaded Jul 3, 2026 Source

Built Distribution

wiola13m-1.0.0-py3-none-any.whl (20.0 kB)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file wiola13m-1.0.0.tar.gz.

File metadata

Download URL: wiola13m-1.0.0.tar.gz
Upload date: Jul 3, 2026
Size: 24.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.13

File hashes

Hashes for wiola13m-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4c6e43a9d68da5044241d826bbb565208068f022c6d28779742e98b512f34da6`
MD5	`7e4d182bf7030f1a3a355d8351f413a5`
BLAKE2b-256	`9047b56ec999351c945448c25893544896abe278a282b14c786f52b6ef5a948b`
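To check a downloaded archive against the SHA256 digests listed here, hashing it locally with Python's standard `hashlib` is enough. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    return sha256_of(path) == expected_sha256.lower()


# Digest copied from the table above for the source distribution.
SDIST_SHA256 = "4c6e43a9d68da5044241d826bbb565208068f022c6d28779742e98b512f34da6"

if __name__ == "__main__":
    archive = Path("wiola13m-1.0.0.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, SDIST_SHA256) else "MISMATCH")
```

pip can enforce the same check at install time through hash-checking mode. Use a requirements line such as `wiola13m==1.0.0 --hash=sha256:<sdist digest> --hash=sha256:<wheel digest>` and install with `pip install --require-hashes -r requirements.txt`. Listing both digests matters because pip may select either file.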

File details

Details for the file wiola13m-1.0.0-py3-none-any.whl.

File metadata

Download URL: wiola13m-1.0.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 20.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.13

File hashes

Hashes for wiola13m-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9e7677d2ad49377d0d42ea13c01214b84269b45c0065cbb7aebfdc35ecf9e25`
MD5	`4d5e436da6a1753a3fb80cf828780d59`
BLAKE2b-256	`b1ee478610a77707efb38bcf38d53bf52b8c28dad7e6f6c35136c15948e8ec53`
