Feed bytes to a transformer with ZERO learned input parameters: the HSL byte-signal substrate replaces the embedding layer (no tokenizer, no embedding table, no learned projection).

hsl-embedding-zero

Feed bytes to a transformer with ZERO learned input parameters.

pip install hsl-embedding-zero

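import torch
from hsl_embedding_zero import ZeroInput

# Illustrative input (integer dtype assumed): a [B, L] batch of byte values in 0..255, here B=2, L=64
byte_ids = torch.randint(0, 256, (2, 64))

door = ZeroInput(K=8, dim=512)        # 0 learned parameters, no tokenizer, no vocab
slots = door(byte_ids)                # [B, L] bytes -> [B, L//8, 512] attention slots
stream = door.stream(byte_ids)        # per-byte path for AR output streams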

The idea

Raw bytes fed directly into a transformer are known to fail — that is why learned embeddings exist: something has to lift discrete symbols into a usable geometry.

This package tests the alternative: let a frozen signal representation do that lifting. The HSL substrate (MIT, pip install hsl-embedding) maps every byte to 27 exact channels — change-rate (Gray-code Δ), Δ², boundary, an exact 8-point Fourier transform, and phase — grounded in a lossless codec. If that representation already does the embedding's job, the learned front door should be removable:

bytes → HSL features (frozen 4.6 KB LUT) → fixed zero-pad → transformer

Channels enter unmixed — every feature keeps a fixed address (dim 0–7 is always Δ, 17–24 always Fourier, …). The first learned combination happens inside attention, where it is trainable and inspectable, not at the door where it would be blind.
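A minimal sketch of the "frozen features, then fixed zero-pad" path, in plain Python so it runs without the package. It assumes each slot is the concatenation of K per-byte feature vectors of 27 channels, padded with zeros up to dim; the actual channel layout inside a slot is not spelled out above. `placeholder_features` is a hypothetical stand-in and does not compute real HSL features.

```python
FEATURES_PER_BYTE = 27  # channel count stated in the text


def placeholder_features(byte_value):
    """Hypothetical stand-in for the frozen HSL lookup: 27 floats per byte.

    The real table ships in hsl-embedding; these values are NOT HSL features.
    """
    return [float((byte_value >> (i % 8)) & 1) for i in range(FEATURES_PER_BYTE)]


def pack_slots(data, K=8, dim=512):
    """Group K bytes per slot, concatenate their features, zero-pad to dim."""
    width = K * FEATURES_PER_BYTE
    if width > dim:
        raise ValueError(f"K={K} needs {width} dims, but dim={dim}")
    slots = []
    for start in range(0, len(data) - K + 1, K):
        slot = []
        for byte_value in data[start:start + K]:
            slot.extend(placeholder_features(byte_value))
        slot.extend([0.0] * (dim - width))  # fixed zero-pad, nothing learned
        slots.append(slot)
    return slots


data = b"hello, byte-native world! 012345"
slots = pack_slots(data, K=8, dim=512)
print(len(slots), len(slots[0]))
```

Because the padding is fixed and nothing is mixed, a given feature always lands at the same index of the slot, which is the "fixed address" property described above.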

Measured (the table is the claim)

Same lean decoder body (dim 512 / 8 layers-class), same 3-modality byte mix (text / video windows / audio-caption windows), same fixed 3000-step budget, same seed. Capacity-matched arms via hsl_embedding.ablation. Lower bits/byte = better.

input front door	text bpb	caption bpb	audio→caption binding gap	learned input params
zero (this package)	2.483	1.503	+0.063	0
learned projection on HSL features	2.457	1.329	+0.057	~125k
plain learned byte embedding (standard)	2.848	2.532	+0.080	~132k
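For readers who want to compare their own runs against the table, here is a small sketch of the usual bits-per-byte conversion. It assumes the metric is the standard one (summed cross-entropy in nats, converted to bits and divided by the number of bytes scored); the text above does not spell out the exact evaluation code, and `bits_per_byte` is an illustrative name.

```python
import math


def bits_per_byte(total_nll_nats, num_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * num_bytes)


# Sanity check: a uniform guess over 256 byte values costs ln(256) nats
# per byte, which is 8 bits per byte.
uniform_bpb = bits_per_byte(math.log(256) * 1000, 1000)
print(round(uniform_bpb, 6))
```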

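It works. With nothing to train at the door, the model trains normally and lands about 1.1% behind a learned input projection on text bpb (2.483 vs 2.457); on caption bpb the learned projection keeps a larger lead (1.329 vs 1.503). The signal already carries most of what the learned door would otherwise have to learn.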
At equal budget, the standard learned-byte-embedding arm measured 2.848 text bpb; the substrate path reaches 2.483 with zero trained input parameters. We read this not as a victory over embeddings but as a possibility: the lifting that embeddings are trained to do can come from an exact, frozen signal description instead.
Binding gap = extra caption bits/byte when the in-window audio is swapped for a wrong one (cross-modal grounding measure). The zero door preserves it.
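The relative gaps between arms can be recomputed directly from the K=8 table. The sketch below uses only the published numbers; `relative_gap` and `binding_gap` are illustrative helper names, and the binding gap is taken to be the swapped-audio caption bpb minus the matched-audio caption bpb, as the definition above suggests.

```python
def relative_gap(value, reference):
    """Return how far value sits above reference, in percent."""
    return 100.0 * (value - reference) / reference


def binding_gap(bpb_swapped_audio, bpb_matched_audio):
    """Extra caption bits/byte paid when the in-window audio is the wrong one."""
    return bpb_swapped_audio - bpb_matched_audio


# Numbers copied from the K=8 table.
text_bpb = {"zero": 2.483, "learned_projection": 2.457, "byte_embedding": 2.848}
caption_bpb = {"zero": 1.503, "learned_projection": 1.329, "byte_embedding": 2.532}

zero_vs_proj_text = relative_gap(text_bpb["zero"], text_bpb["learned_projection"])
zero_vs_proj_caption = relative_gap(caption_bpb["zero"], caption_bpb["learned_projection"])
embed_vs_zero_text = relative_gap(text_bpb["byte_embedding"], text_bpb["zero"])

print(f"zero vs learned projection, text:    {zero_vs_proj_text:.2f}%")
print(f"zero vs learned projection, caption: {zero_vs_proj_caption:.2f}%")
print(f"byte embedding vs zero, text:        {embed_vs_zero_text:.2f}%")
```

A positive binding gap means the model does worse with the wrong audio, so it is using the audio; a gap near zero would mean the captions are being predicted without it.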

Sequence halving holds quality. With K=16 (16 bytes per attention slot — half the prefix positions, attention cost /4 on the input side):

K=16 front door	text bpb	caption bpb	binding gap
zero	2.4815	1.4965	+0.042
learned projection	2.4650	1.5398	+0.031

At K=16 the two doors are close on every metric (text within 0.7%, with the zero door ahead on caption bpb), and the zero door takes K up to 18 at dim 512 without adding a single parameter (consistent with 18 × 27 = 486 feature dims fitting in 512), while a learned projection grows with K. (Trade-off, honestly: binding softens for both doors at K=16 vs K=8; fine-grained cross-modal alignment prefers smaller slots.)
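The arithmetic behind the K knob can be sketched in a few lines. This assumes a slot holds K × 27 feature dims (which matches the stated limit of K=18 at dim 512) and that attention cost scales with the square of the number of positions; the helper names are illustrative, not part of the package.

```python
FEATURES_PER_BYTE = 27  # channel count stated in the text


def max_k(dim):
    """Largest K whose K * 27 features still fit in a slot of width dim."""
    return dim // FEATURES_PER_BYTE


def input_positions(num_bytes, K):
    """Number of attention slots produced from num_bytes input bytes."""
    return num_bytes // K


def attention_cost_ratio(k_small, k_large):
    """Quadratic attention cost at k_large relative to k_small."""
    return (k_small / k_large) ** 2


print(max_k(512))                                            # 18
print(input_positions(4096, 8), input_positions(4096, 16))   # 512 256
print(attention_cost_ratio(8, 16))                           # 0.25
```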

The point

Didn't we all want this direction — more capability per FLOP and per watt, not less? A byte front end with nothing to train, nothing to store beyond a 4.6 KB table, no tokenizer pass, and sequence density as a free knob is one concrete step that way. This is not a claim that embeddings are beaten; it is a measured demonstration that a possibility now exists, small enough for anyone to verify on one consumer GPU.

Honest limits

Fixed small budget (3000 steps), lean ~25M body, one consumer GPU; seed-0 table (multi-seed run in progress — numbers will be appended, not replaced). A learned embedding may close the gap with a longer schedule. The claim is not "embeddings are obsolete"; it is: on this substrate, the learned front door is measurably unnecessary, and a standard learned byte embedding does not reach the substrate's quality at equal budget. Reproduce or refute: the ablation kit ships in hsl_embedding.ablation (hsl / learned / random / permuted, capacity-matched).

Why this matters

0 learned input parameters — vs ~38M for a GPT-2-class token embedding table.
No tokenizer — any modality that is bytes (text, audio, raster, video windows) goes through the same door; this is the input layer of the byte-native multimodal HoLo line of work (59M, 3-stage curriculum, weights public).
Deterministic & inspectable — the representation cannot drift, leak, or overfit; what enters the model is an exact, invertible signal description.
K is free — packing density (sequence length vs slot width) becomes a pure architecture knob, not a new parameter budget.
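The parameter counts quoted above follow from the size of an embedding table, vocab_size × dim. The sketch below assumes "GPT-2-class" means the smallest GPT-2 configuration (vocabulary 50257, width 768) and that the plain byte embedding is a 256 × 512 table; the second figure is only roughly consistent with the ~132k in the results table, which may count other terms as well.

```python
def embedding_table_params(vocab_size, dim):
    """Number of learned weights in a vocab_size x dim embedding table."""
    return vocab_size * dim


gpt2_small = embedding_table_params(50257, 768)  # assumed GPT-2 small shape
byte_table = embedding_table_params(256, 512)    # assumed plain byte embedding
zero_door = 0                                    # nothing learned at the input

print(f"GPT-2 small token table: {gpt2_small:,}")
print(f"byte embedding table:    {byte_table:,}")
print(f"zero door:               {zero_door}")
```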

API

call	shape	learned params
`ZeroInput(K, dim)(ids)`	`[B, L] → [B, L//K, dim]`	0
`ZeroInput(K, dim).stream(ids)`	`[B, L] → [B, L, dim]`	0
`ZeroInput(K, dim).features(ids)`	`[B, L] → [B, L, 27]` (raw substrate)	0
`zero_input(b"raw bytes")`	one-call convenience	0

Cite

Jinhyun Woo, HoLo: A Feasibility Study of Change-Rate-Based Multimodal Unification — DOI: 10.5281/zenodo.20581805. A release DOI for this package is minted via the GitHub–Zenodo integration (see repository sidebar).

Release history

0.1.2 (Jun 11, 2026)
0.1.1 (Jun 11, 2026, this version)
0.1.0 (Jun 11, 2026)

Download files

Source Distribution

hsl_embedding_zero-0.1.1.tar.gz (7.2 kB)

Uploaded Jun 11, 2026 Source

Built Distribution

hsl_embedding_zero-0.1.1-py3-none-any.whl (6.9 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file hsl_embedding_zero-0.1.1.tar.gz.

File metadata

Download URL: hsl_embedding_zero-0.1.1.tar.gz
Upload date: Jun 11, 2026
Size: 7.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for hsl_embedding_zero-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`85c6ed136283a33bb614879e7bc30e619e73705ad952271299ef3cfa2464e5c4`
MD5	`363fc12b5c68d5013752d290bcb32805`
BLAKE2b-256	`98ac140346a110ad2166a9885a49d184bf5268110f3eb9824cc6c3bc0ee9cc17`

File details

Details for the file hsl_embedding_zero-0.1.1-py3-none-any.whl.

File metadata

Download URL: hsl_embedding_zero-0.1.1-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 6.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for hsl_embedding_zero-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f0da781e93c0f57bd15a69bf6c98df425bc7762fb0d3db3d0cf9c07bf55af86`
MD5	`dae92ac15b76ac46d433802b071d3976`
BLAKE2b-256	`5f2eee87e3263b3fba09c3fce10fac238b44dec42029a0c5596ab5dc1ab97948`
