HSL (Holistic Signal Language): a non-learned, byte-level signal encoder for PyTorch — change-rate features, no tokenizer, losslessly invertible.

HSL — Holistic Signal Language

A non-learned, byte-level signal encoder for PyTorch. Instead of splitting text into tokens, it reads raw bytes holistically as a signal: change rate (Δ, XOR-delta), 2nd-order change (Δ²), boundary, Fourier bands, and exact complex phase, plus the raw bits in the legacy layout. That makes 21 dimensions per byte by default (29 with the raw bits), losslessly invertible: one modality-agnostic input layer for text, image, audio, video, or any other byte stream.

Everything is information — a fluctuation between 0 and 1. HSL doesn't ask what a token means; it measures how the signal changes, with exact formulas, so the same representation works under every modality.

```python
import hsl_embedding as hsl

feats, phase = hsl.embed(b"hello")          # -> Tensor [L, 21], Tensor [L]
emb = hsl.Embedding()                        # an nn.Module with no parameters (used like nn.Embedding)
feats = emb("강아지".encode())               # -> [L, 21]
assert hsl.decode(hsl.encode(b"hello")) == b"hello"   # lossless, by construction
```

Install

```bash
pip install hsl-embedding      # distribution name; import as `import hsl_embedding as hsl`
# deps: numpy, torch
```

Why not just `nn.Embedding`?

They solve different problems. This is not a performance claim but a guide to when to use which.

	`torch.nn.Embedding`	`hsl.Embedding`
what it is	a learned lookup table (trainable params)	an exact formula (zero params, deterministic)
input	a token id (`int`)	raw `bytes`
needs	a tokenizer + fixed vocab + training data	nothing — works on any bytes, day one
dimensions	opaque, learned	named & interpretable (Δ / Δ² / boundary / Fourier / phase)
modality	one tokenizer per modality (text ≠ image ≠ audio)	one substrate for all (byte-native)
invertible	no	yes (`decode(encode(x)) == x`)
new scripts / formats	breaks / out-of-vocab	just bytes — never breaks

They compose. HSL is an input substrate, not a replacement for learned representations: nn.Embedding learns what tokens mean; HSL gives exact structural signal for free. Stack learned layers on top of HSL features.

Reach for HSL when you want: tokenizer-free input · one model across modalities · structure/change-aware features · exact reconstruction · small-data or from-scratch training · interpretable input channels.

What each channel captures (and where it's good)

HSL is built from exact formulas, each chosen to carry information that a plain learned embedding tends to throw away. The default is 21-D, the pure change-rate substrate; the table below has one row per channel:

channel (dims)	exact formula	captures	especially good for
Δ `dxor` 0–7 (8)	`XOR(bitₜ, bitₜ₋₁)`, from origin 0	change / transitions: where the signal flips	edges, topic/region shifts, the modality-shared "rate of change". Measured: shift-detection AUC 0.725, vs. 0.698 for content.
Δ² `d2xor` 0–7 (8)	`XOR(Δₜ, Δₜ₋₁)`	acceleration of change (2nd order): a partial-derivative boundary	sharp boundaries / corners / onsets; where the rate of change itself jumps (segment cuts, audio attacks, image corners)
boundary (1)	`|Δ| + 0.5·|Δ²| + 0.25·HF`	transition-energy peaks	tokenizer-free segmentation: natural byte/word/chunk cuts without decoding
Fourier low/high (2)	per-byte 8-bit rFFT amplitude bands	frequency / texture / periodicity	smooth vs. busy, periodic vs. random: audio timbre, image texture, repetitive vs. novel content
phase cos/sin (2)	exact phasor `z = e^{iθ}, θ = 2π·byte/256`	cyclic relation / angle: exact `cos(θᵢ−θⱼ)`	affect / mood and relative/positional structure. Measured: phase variation tracks the audio affect line at 0.912, better than loudness alone.
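The Δ, Δ² and boundary rows can be sketched in plain Python. This is a minimal re-derivation from the formulas in the table, not hsl_embedding's code: treating |Δ| as the fraction of flipped bits, reading the 8 per-bit channels MSB-first, and starting Δ² from origin 0 are all assumptions, and the HF term is left as an optional input. `xor_delta`, `bits`, `flip_rate` and `boundary` are hypothetical names.

```python
def xor_delta(values):
    """Δ: XOR each value with its predecessor; the first value's predecessor is origin 0."""
    out, prev = [], 0
    for v in values:
        out.append(v ^ prev)
        prev = v
    return out


def bits(v: int) -> list:
    """The 8 per-bit channels of one Δ (or Δ²) value, MSB first (bit order assumed)."""
    return [(v >> (7 - k)) & 1 for k in range(8)]


def flip_rate(v: int) -> float:
    """Fraction of the 8 bits that are set; used here as |Δ| (an assumption)."""
    return bin(v).count("1") / 8


def boundary(delta, delta2, hf=None):
    """|Δ| + 0.5·|Δ²| + 0.25·HF per byte; HF defaults to 0 when not supplied."""
    if hf is None:
        hf = [0.0] * len(delta)
    return [flip_rate(d) + 0.5 * flip_rate(d2) + 0.25 * h
            for d, d2, h in zip(delta, delta2, hf)]


data = b"aab"
d1 = xor_delta(data)     # iterating bytes yields ints: [97, 0, 3]
d2 = xor_delta(d1)       # Δ² is the same XOR-delta applied to Δ: [97, 97, 3]
print(boundary(d1, d2))  # [0.5625, 0.1875, 0.375]
```

The repeated pair `aa` gives Δ = 0 at the second byte, so that position gets the lowest boundary score of the three.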

The point: a single learned vector blurs all of this together. HSL keeps change (Δ), curvature (Δ²), spectrum (Fourier), and phase as separate, exact, interpretable channels — and adds them only where a modality needs them.
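The Fourier and phase rows can be sketched the same way. The phase part follows the stated formula directly: with z = e^{iθ}, the real part of z_i·conj(z_j) is cos(θᵢ−θⱼ). The Fourier part is only a guess at the band split: which rFFT bins count as "low" and "high" is not stated, so averaging bins 1–2 and 3–4 (skipping the DC bin) is an assumption, as are the names `rfft_amplitudes`, `fourier_bands`, `phasor` and `phase_similarity`.

```python
import cmath
import math


def bits(v: int) -> list:
    """The 8 bits of a byte, MSB first (bit order assumed)."""
    return [(v >> (7 - k)) & 1 for k in range(8)]


def rfft_amplitudes(x) -> list:
    """|X_k| for k = 0..n//2 via a direct DFT (fine for n = 8)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def fourier_bands(byte: int) -> tuple:
    """Hypothetical split: (mean of bins 1-2, mean of bins 3-4) of the byte's 8-bit spectrum."""
    amp = rfft_amplitudes(bits(byte))
    return (amp[1] + amp[2]) / 2, (amp[3] + amp[4]) / 2


def phasor(byte: int) -> complex:
    """z = e^{iθ} with θ = 2π·byte/256; (z.real, z.imag) are the cos/sin channels."""
    return cmath.exp(2j * math.pi * byte / 256)


def phase_similarity(a: int, b: int) -> float:
    """cos(θa − θb), read off as the real part of z_a · conj(z_b)."""
    return (phasor(a) * phasor(b).conjugate()).real


print(fourier_bands(0xF0))       # 11110000: slow pattern, the low band dominates
print(fourier_bands(0xAA))       # 10101010: alternating, energy sits in the top bin
print(phase_similarity(0, 128))  # half a turn apart, approximately -1.0
```

Keeping phase as a unit phasor rather than a raw byte value is what makes the relation cyclic: bytes 255 and 0 end up adjacent on the circle.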

Legacy 29-D: include_bits=True prepends the 8 raw byte bits. They're redundant (Δ-from-origin-0 already encodes the bytes losslessly), included only to match the original trained HoLo model.

Lossless by construction

The features are grounded in a lossless codec, so the substrate is byte-exact:

```python
frame = hsl.encode(b"any bytes \x00\xff")
hsl.decode(frame) == b"any bytes \x00\xff"     # True
```

Δ-from-origin-0 is the codec's XOR-delta, so it already encodes the bytes losslessly — which is why the raw bits channel is redundant and can be dropped.
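The claim that Δ-from-origin-0 is already lossless can be checked with a running XOR. The sketch below covers only that XOR-delta step; it is not hsl.encode / hsl.decode, whose frame format is not described here, and `xor_delta_encode` / `xor_delta_decode` are hypothetical names.

```python
def xor_delta_encode(data: bytes) -> bytes:
    """Δ from origin 0: output[t] = data[t] XOR data[t-1], with data[-1] taken as 0."""
    prev = 0
    out = bytearray()
    for b in data:
        out.append(b ^ prev)
        prev = b
    return bytes(out)


def xor_delta_decode(delta: bytes) -> bytes:
    """Invert Δ with a running XOR: data[t] = delta[t] XOR data[t-1]."""
    prev = 0
    out = bytearray()
    for d in delta:
        prev ^= d          # prev now holds data[t]
        out.append(prev)
    return bytes(out)


payload = b"any bytes \x00\xff"
assert xor_delta_decode(xor_delta_encode(payload)) == payload
```

Decoding needs only the previous output byte, which is why a separate raw-bit channel adds no information once Δ is present.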

21-D (default) vs 29-D (legacy)

```python
hsl.embed(data)                      # 21-D  (default; pure change-rate, no redundant bits)
hsl.embed(data, include_bits=True)   # 29-D  (also prepend the 8 raw bits — original HoLo model)
hsl.Embedding(include_bits=True).out_dim   # 29
```
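The 21 and 29 follow from the channel widths in the table (8 + 8 + 1 + 2 + 2 = 21, plus 8 raw bits = 29). If the columns are laid out in table order, with the raw bits first when include_bits=True, named column slices can be computed as below. That column order and the keys `bits`, `boundary`, `fourier` and `phase` are assumptions, not documented API.

```python
# (name, width) per channel, in table order; the column order is an assumption.
CHANNELS_21 = [("dxor", 8), ("d2xor", 8), ("boundary", 1), ("fourier", 2), ("phase", 2)]
CHANNELS_29 = [("bits", 8)] + CHANNELS_21   # include_bits=True prepends the raw bits


def column_slices(channels):
    """Map each channel name to its column slice; also return the total width."""
    start, slices = 0, {}
    for name, width in channels:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


slices_21, dim_21 = column_slices(CHANNELS_21)
slices_29, dim_29 = column_slices(CHANNELS_29)
print(dim_21, dim_29)       # 21 29
print(slices_21["phase"])   # slice(19, 21, None)
```

Given a [L, D] tensor, `feats[:, slices_21["dxor"]]` would then select the eight Δ columns, provided the layout assumption holds.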

Batch

```python
emb = hsl.Embedding()
feats, phase, mask = emb.pack([b"a", b"abcdef"], max_len=8)   # [B, L, D], [B, L], [B, L]
```
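What a packed batch holds can be pictured with a plain-Python padding sketch: each byte string is cut or padded to max_len, and a mask records which positions are real. Whether pack truncates longer inputs, which padding value it uses, and whether its mask marks real or padded positions are not documented here; `pad_batch` is a hypothetical helper and those choices are assumptions.

```python
from typing import List, Tuple


def pad_batch(batch: List[bytes], max_len: int,
              pad: int = 0) -> Tuple[List[List[int]], List[List[bool]]]:
    """Truncate or right-pad each byte string to max_len; mask is True where a real byte sits."""
    rows, mask = [], []
    for seq in batch:
        seq = seq[:max_len]
        n_pad = max_len - len(seq)
        rows.append(list(seq) + [pad] * n_pad)
        mask.append([True] * len(seq) + [False] * n_pad)
    return rows, mask


rows, mask = pad_batch([b"a", b"abcdef"], max_len=8)
print(rows[0])   # [97, 0, 0, 0, 0, 0, 0, 0]
print(mask[1])   # [True, True, True, True, True, True, False, False]
```

The mask matters because a padding value of 0 looks exactly like a real 0x00 byte, which a byte-level encoder treats as ordinary data.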

Examples

```bash
python examples/quickstart.py        # bytes in, features out; named channels
python examples/roundtrip_all.py     # text / image / audio / video -> embed -> EXACT reconstruction
python examples/vs_nn_embedding.py   # nn.Embedding vs hsl.Embedding — when to use which
python examples/benchmark_vs_nn.py   # honest capability + speed comparison
```

Output of roundtrip_all.py, showing one modality-agnostic encoder that is lossless by construction:

```
modality              bytes     feat shape   reconstruction
----------------------------------------------------------------
text  (utf-8)            98       (98, 21)   EXACT ✓
image (RGB u8)         3072     (3072, 21)   EXACT ✓
audio (PCM i16)        8000     (8000, 21)   EXACT ✓
video (6 frames)       4608     (4608, 21)   EXACT ✓
```
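Every modality in the table reaches HSL as a flat byte string. The table does not state the underlying shapes; 32×32 RGB, 4,000 mono 16-bit samples and six 16×16 RGB frames are assumptions chosen only because they match the listed byte counts. The sketch shows that serializing such arrays to bytes, and reading them back, is itself exact.

```python
import struct

# Image: 32 x 32 pixels x 3 channels of uint8 -> 3072 bytes (shape assumed).
image = bytes((3 * x + 5 * y + 7 * c) % 256
              for y in range(32) for x in range(32) for c in range(3))

# Audio: 4000 mono int16 samples, little-endian PCM -> 8000 bytes (length assumed).
samples = [int(8000 * ((t % 40) - 20) / 20) for t in range(4000)]   # crude sawtooth
audio = struct.pack(f"<{len(samples)}h", *samples)

# Video: 6 frames of 16 x 16 RGB uint8 -> 4608 bytes (shape assumed).
video = b"".join(
    bytes((f * 40 + x + y) % 256 for y in range(16) for x in range(16) for _ in range(3))
    for f in range(6)
)

print(len(image), len(audio), len(video))   # 3072 8000 4608

# Deserializing the PCM bytes recovers the samples exactly.
assert list(struct.unpack(f"<{len(audio) // 2}h", audio)) == samples
```

Because each modality is already an exact byte sequence, a single byte-level codec can cover all four rows without any modality-specific tokenizer.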

Scope (honest)

HSL is a non-learned input substrate: a feasibility demonstration from an independent, single-GPU project, not a benchmark-beating system. It supplies exact structural signal; the meaning still has to come from a model stacked on top. See the paper and live demo:

📄 Paper: A Feasibility Study of Change-Rate-Based Multimodal Unification (Zenodo)
🌐 Live demo: https://holo-demo-p5txmh4dda-as.a.run.app
💻 HoLo project: https://github.com/Woojiggun/holo-hsl

License & citation

MIT License — © 2026 Jinhyun Woo (ggunio5782@gmail.com). Free to use, modify, and distribute, including for commercial use — the only condition is that the copyright notice and attribution to Jinhyun Woo are kept. See LICENSE.

```bibtex
@software{woo_hsl_2026,
  author = {Jinhyun Woo},
  title  = {HSL: a byte-native, modality-agnostic signal embedding},
  year   = {2026},
  doi    = {10.5281/zenodo.20581805},
  url    = {https://github.com/Woojiggun/holo-hsl}
}
```

Release history

0.5.1 (Jun 10, 2026) · 0.5.0 (Jun 10, 2026) · 0.4.0 (Jun 10, 2026) · 0.3.0 (Jun 9, 2026) · 0.2.0 (Jun 9, 2026) · 0.1.1 (Jun 8, 2026) · 0.1.0 (Jun 8, 2026; this version)

Download files

Source distribution: hsl_embedding-0.1.0.tar.gz (12.4 kB), uploaded Jun 8, 2026 via twine/6.2.0 CPython/3.11.7 (Trusted Publishing: no)

Algorithm	Hash digest
SHA256	`2fa1658a42e27f2296c4e49c8f06fc1aaf7235b514fcc8c1b67a9bad1c381de7`
MD5	`e646755756f399132f7aed6d43ba7788`
BLAKE2b-256	`64ea5e24726fd3da46baf220362a5b7831f0926a070258771955eac2d89531ab`

Built distribution: hsl_embedding-0.1.0-py3-none-any.whl (8.7 kB, Python 3), uploaded Jun 8, 2026 via twine/6.2.0 CPython/3.11.7 (Trusted Publishing: no)

Algorithm	Hash digest
SHA256	`aa3b48afc0264ce347813c011987c488bf32a6538f28bd8af8648a1f7395ec5d`
MD5	`3bd1d810e545bafad06ea8de6f96124c`
BLAKE2b-256	`851247d53e26c72ecf34af3712222afcd51862cc8b09c7faa8c1fca8ed74f467`
