Kronecker Embeddings

Byte-level structured token representations for parameter-efficient language models. Reference implementation.

A deterministic byte-level factorization that replaces nn.Embedding with a fixed codec plus a single trainable projection. Parameters scale with the codec output dimension D = char_dim × pos_dim, not with vocabulary size. At the paper's 124M GPT-2 setting (V=50,257, d_model=768, D=4096 via pos_dim=16), the stock nn.Embedding(50257, 768) has 38.6 M trainable params; the Kronecker projection Linear(4096, 768) has 3.1 M, a ~12.3× input-side reduction (~91% saving), matching paper §6.9 Table 11. The projection's size is independent of vocabulary size, so the saving grows with the vocabulary and shrinks as D increases.
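The parameter counts above follow from simple multiplication. The sketch below reproduces them; the helper names are illustrative and are not part of the kronecker_embeddings package.

```python
CHAR_DIM = 256  # one slot per possible byte value


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Trainable weights in a standard lookup table of shape (V, d_model)."""
    return vocab_size * d_model


def kronecker_params(pos_dim: int, d_model: int, char_dim: int = CHAR_DIM) -> int:
    """Trainable weights in a bias-free projection of shape (D, d_model), D = char_dim * pos_dim."""
    return char_dim * pos_dim * d_model


baseline = embedding_params(50_257, 768)   # GPT-2 vocabulary, d_model=768
projection = kronecker_params(16, 768)     # pos_dim=16 gives D=4096
print(baseline, projection, round(baseline / projection, 2))
# prints: 38597376 3145728 12.27
```

Because `kronecker_params` takes no vocabulary argument, doubling the vocabulary doubles the baseline count while the projection count stays fixed.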

Quickstart

pip install kronecker-embeddings

import torch
from transformers import AutoTokenizer
from kronecker_embeddings import KroneckerEmbedding

tok = AutoTokenizer.from_pretrained("gpt2")
emb = KroneckerEmbedding(
    vocab_size=tok.vocab_size,
    d_model=512,
    tokenizer=tok,        # buffers built once at construction
    pos_dim=16,           # D = char_dim * pos_dim = 256 * 16 = 4096 (paper §6.9 setting)
)

ids = tok("hello world", return_tensors="pt").input_ids
out = emb(ids)            # (1, L, 512) — matches nn.Embedding.forward

That's it. Use emb anywhere you'd use nn.Embedding.

What this is

Each token has a UTF-8 byte sequence: "hello" is 5 bytes, "नमस्ते" is 18. The Kronecker codec deterministically maps that byte sequence to a fixed vector of dimension D = char_dim × pos_dim, with no learned parameters:

kappa(b) = (1/√L) · vec( Σ_{p=1..L} c_{b_p} ⊗ p_p )

where c_v is the v-th basis vector in R^256 (one-hot for byte value v) and p_p is the p-th basis vector in R^pos_dim (one-hot for byte position p). The result is z-normalized per-token, then passed through a single learned linear projection Linear(D, d_model).
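The formula can be sketched in plain Python as follows. This is a minimal illustration, not the package's implementation: the function names, the truncation of byte sequences longer than pos_dim, the row-major flattening order, and the epsilon in the z-norm are all assumptions made for the example.

```python
import math

CHAR_DIM = 256  # one slot per possible byte value


def kappa(text: str, pos_dim: int = 16) -> list[float]:
    """Sum of one-hot(byte) x one-hot(position) outer products, scaled by 1/sqrt(L)."""
    data = text.encode("utf-8")[:pos_dim]  # assumption: bytes past pos_dim are dropped
    vec = [0.0] * (CHAR_DIM * pos_dim)
    if not data:
        return vec
    scale = 1.0 / math.sqrt(len(data))
    for position, byte in enumerate(data):
        # assumption: vec() flattens the 256 x pos_dim matrix in row-major order
        vec[byte * pos_dim + position] += scale
    return vec


def z_normalize(vec: list[float], eps: float = 1e-8) -> list[float]:
    """Per-token z-norm: subtract the mean, divide by the standard deviation."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    std = math.sqrt(var + eps)  # eps is an assumed numerical guard
    return [(x - mean) / std for x in vec]


code = kappa("hello")
print(len(code), sum(1 for x in code if x != 0.0))  # vector length and number of active slots
```

Each byte occupies its own position, so a token of L bytes activates exactly L slots, and the 1/√L factor gives the raw codec vector unit length before z-normalization.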

Architecture. Fixed codec + one nn.Linear(D, d_model, bias=False), exactly as described in paper §3. No GELU, no hidden layer, no residual, no norm beyond the codec's z-norm.

Note: paper §3.4 mentions an optional self-entropy regularizer on the projection's row distribution, used in the 138M companion run (paper §6.7). It is a production add-on, not part of the canonical method, and is not shipped in this reference package.

The codec is content-aware by byte: tokens that share UTF-8 byte prefixes share codec output positions, giving the embedding a built-in orthographic structure (the §6.1–6.4 result). Text that BPE tokenizers fragment heavily (Hindi, code, rare languages) gets the same byte locality as English at no extra vocab cost.
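To make the prefix-sharing claim concrete, the sketch below computes the cosine similarity between two raw codec vectors (before z-normalization and before the learned projection). It relies on the fact that each vector has L equal-valued active slots, so the cosine reduces to the number of shared (byte, position) pairs divided by √(L₁·L₂). The helper names and the truncation at pos_dim are assumptions for illustration, not the package's API.

```python
import math


def active_slots(text: str, pos_dim: int = 16) -> set[tuple[int, int]]:
    """(byte value, position) pairs that the codec would switch on for this string."""
    data = text.encode("utf-8")[:pos_dim]  # assumption: bytes past pos_dim are dropped
    return {(byte, position) for position, byte in enumerate(data)}


def codec_cosine(a: str, b: str, pos_dim: int = 16) -> float:
    """Cosine similarity of the raw (pre-z-norm) codec vectors of two strings."""
    slots_a, slots_b = active_slots(a, pos_dim), active_slots(b, pos_dim)
    if not slots_a or not slots_b:
        return 0.0
    return len(slots_a & slots_b) / math.sqrt(len(slots_a) * len(slots_b))


print(round(codec_cosine("cat", "cats"), 3))  # shared 3-byte prefix
print(round(codec_cosine("cat", "dog"), 3))   # no shared (byte, position) pair
# prints: 0.866 then 0.0
```

Only bytes that match at the same position count, so the similarity reflects shared prefixes rather than shared letters in general.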

Why use this

Parameter efficiency. Embedding params scale with D (pos_dim=16 → D=4096 in the paper's 124M experiments; pos_dim=32 → D=8192 in production), not vocab size. The bigger your tokenizer, the bigger the win.
Orthographic structure built in. Spelling variants, plural/singular pairs, and case variants land near each other in input-embedding space by construction. See probes/cross_model_probe.py for the full §6.1–6.4 numbers across 6 public LMs.
Forced-OOV at inference. Any byte sequence — including strings that don't have a single vocab token — produces a valid embedding. See examples/03_inference_with_oov.py.
Drop-in compatibility. Same forward(input_ids) -> (..., d_model) contract as nn.Embedding. State dict round-trips with a single projection.weight tensor; the byte buffers are reconstructed from the tokenizer at load time.
Deployment. No vocab-resize migration when adding/removing tokens — the codec handles arbitrary byte sequences. The byte buffer is ~0.9 MB for a 50K-vocab tokenizer at pos_dim=16, ~4.5 MB for a 131K-vocab tokenizer at pos_dim=32.

Operational variants — `mode="dynamic"` vs `mode="cached"`

Mode	Memory	Forward compute	Use when
`"dynamic"` (default)	`V × (pos_dim + 2)` bytes (~0.9 MB at V=50K, `pos_dim=16`; ~4.5 MB at V=131K production, `pos_dim=32`)	~1-4 ms / micro-batch (scatter_add)	Frontier-scale training or memory-tight inference. The default.
`"cached"`	`V × D × 4` bytes (gigabytes at frontier scale)	~0 (just `index_select`)	Small vocab (V ≲ 50K) or you have RAM to burn.
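The memory figures in the table can be checked with the formulas it gives. The helper names below are illustrative, and V=131,072 is an assumed exact value for the "131K" production vocabulary.

```python
def dynamic_buffer_bytes(vocab_size: int, pos_dim: int) -> int:
    """mode="dynamic": V x (pos_dim + 2) bytes of byte buffers."""
    return vocab_size * (pos_dim + 2)


def cached_table_bytes(vocab_size: int, pos_dim: int,
                       char_dim: int = 256, bytes_per_float: int = 4) -> int:
    """mode="cached": V x D x 4 bytes for a precomputed float32 codec table."""
    return vocab_size * char_dim * pos_dim * bytes_per_float


for vocab_size, pos_dim in [(50_257, 16), (131_072, 32)]:  # 131_072 is an assumption for "131K"
    dynamic = dynamic_buffer_bytes(vocab_size, pos_dim)
    cached = cached_table_bytes(vocab_size, pos_dim)
    print(f"V={vocab_size} pos_dim={pos_dim}: dynamic={dynamic / 1e6:.1f} MB, cached={cached / 1e9:.2f} GB")
```

Under these assumptions the cached table is roughly 0.8 GB at the GPT-2 setting and about 4.3 GB at the larger one, against under 5 MB for the dynamic buffers in both cases.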

Both produce bit-identical outputs (verified by tests/test_embedding.py).

Switch via the mode= kwarg:

emb = KroneckerEmbedding(..., mode="cached")

Examples

examples/01_minimal_swap.py — drop-in replacement for nn.Embedding inside a toy encoder.

examples/02_nanogpt_integration.py — patching nanoGPT's transformer.wte to use Kronecker. (A full training fork will live in the forthcoming kronecker-nanogpt repo.)

examples/03_inference_with_oov.py — forced-OOV inference: override a position's bytes with a string that has no single-token equivalent.

Probes

The four probe scripts from the paper:

probes/cross_model_probe.py — §6.1–6.4 embedding-geometry probe over arbitrary HF models. No checkpoint needed.
probes/layered_probe.py — §6.8 layered probe. Requires trained checkpoints (see probes/README.md).
probes/robustness_probe.py — §6.11 110-prompt spelling-robustness probe.
probes/generation_probe.py — §6.12 20-prompt generation probe with forced-OOV substitution.

Reference outputs from the paper's runs are in probes/data/ — reviewers can inspect them directly without re-running.

Citation

@article{shravan2026kronecker,
  title={Kronecker Embeddings: Byte-Level Structured Token Representations
         for Parameter-Efficient Language Models},
  author={Shravan, Rohan},
  journal={arXiv preprint arXiv:2605.29459},
  year={2026},
  url={https://github.com/theschoolofai/kronecker-embeddings}
}

License

Contact

Issues / PRs: github.com/theschoolofai/kronecker-embeddings/issues
Email: rshravan@theschoolofai.in

Project details

Release history

0.1.1 (this version): May 29, 2026
0.1.0: May 26, 2026

Download files


Source Distribution

kronecker_embeddings-0.1.1.tar.gz (27.9 kB)

Uploaded May 29, 2026 Source

Built Distribution


kronecker_embeddings-0.1.1-py3-none-any.whl (21.9 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file kronecker_embeddings-0.1.1.tar.gz.

File metadata

Download URL: kronecker_embeddings-0.1.1.tar.gz
Upload date: May 29, 2026
Size: 27.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for kronecker_embeddings-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`4cebc169ef10c63a5a6097ac6fe9b839ca0a49be764dda7d7b08d10aa5fe0c0d`
MD5	`c824038776e625f79f3823761bfa4022`
BLAKE2b-256	`28cedc7b342476c0d890a578320a30d47d23b7f64009eac0da56a16d527c0e2d`


File details

Details for the file kronecker_embeddings-0.1.1-py3-none-any.whl.

File metadata

Download URL: kronecker_embeddings-0.1.1-py3-none-any.whl
Upload date: May 29, 2026
Size: 21.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for kronecker_embeddings-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`21c0f1c40dc1f8fca72b8b570c57f61ea852b6b63dffa500671499e493bd0ef9`
MD5	`15050046d346d51787612fb4eb4e8655`
BLAKE2b-256	`f40d51d3efb7d4b0acc4afbc887a147d2666fb795f3b2f77b73b349491dbca59`

