# HilbertSFC
Ultra-fast 2D & 3D Hilbert space-filling curve encode/decode kernels for Python.
*Animated renderings of 2D Hilbert curves (nbits 1–5) and 3D Hilbert curves (nbits 1–4).*

✨ **New in v0.3.0:** PyTorch API + GPU-accelerated kernels with Triton!

✨ **New in v0.4.0:** Morton/z-order curves!
This library is performance-first and implemented entirely in Python. It provides fast Hilbert encode/decode kernels for both CPU and GPU, with convenient high-level APIs for NumPy and PyTorch, low-level kernel accessors, and clean integration with `torch.compile` for fusion with surrounding code. For completeness, it also includes Morton/z-order curve kernels.
The hot kernels are JIT-compiled with Numba (CPU) and Triton (GPU) and tuned for:
- Branchless, fully unrolled inner loops
- Small, L1-cache-friendly lookup tables (LUTs)
- Reduced dependency chains for better ILP and MLP (e.g. state-independent lookups)
- Multi-threading for batch processing
- SIMD via LLVM vector intrinsics (CPU)
- Reduced register pressure (GPU)
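To make the FSM/LUT formulation concrete, here is a hedged pure-Python sketch of a 2D Hilbert encoder driven by a single packed lookup table: each step consumes one bit of `x` and `y`, and both the emitted index digit and the next FSM state come from one table load, with no branches in the loop body. The table layout and the name `hilbert_encode_2d_ref` are illustrative only, not the library's actual kernels.

```python
# Sketch of an FSM/LUT 2D Hilbert encoder (illustrative; hilbertsfc's
# real kernels use their own table layouts, unrolling, and batching).
#
# State = one of 4 curve orientations. Each table entry packs the next
# state (high bits) and the emitted 2-bit Hilbert digit (low bits), so
# the loop body is pure shifts, masks, and one table load.
LUT = bytes([
    4, 1, 11, 2,    # state 0: entries indexed by quadrant (x_bit<<1 | y_bit)
    0, 15, 5, 6,    # state 1
    10, 9, 3, 12,   # state 2
    14, 7, 13, 8,   # state 3
])

def hilbert_encode_2d_ref(x: int, y: int, nbits: int) -> int:
    state = 0
    index = 0
    for i in range(nbits - 1, -1, -1):          # consume bits MSB-first
        q = (((x >> i) & 1) << 1) | ((y >> i) & 1)
        packed = LUT[(state << 2) | q]
        index = (index << 2) | (packed & 3)     # append 2 index bits
        state = packed >> 2                     # FSM transition
    return index
```

The production kernels fully unroll this loop for a fixed `nbits` and process many points per call, which is where the table's L1-residency and the short dependency chain pay off.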
## Performance

### CPU - Numba
HilbertSFC is orders of magnitude faster than existing Python implementations, and outperforms the fast_hilbert Rust crate by roughly 8× single-threaded. In fact, HilbertSFC takes only ~6 CPU cycles per point for 2D encode/decode of 32-bit coordinates.
2D Points - Random, nbits=32, size=5,000,000
| Implementation | ns/pt (enc) | ns/pt (dec) | Mpts/s (enc) | Mpts/s (dec) |
|---|---|---|---|---|
| hilbertsfc (multi-threaded) | 0.41 | 0.48 | 2410.39 | 2084.98 |
| hilbertsfc (Python) | 1.38 | 1.59 | 726.68 | 629.52 |
| fast_hilbert (Rust) | 12.24 | 12.03 | 81.67 | 83.11 |
| hilbert_2d (Rust) | 121.23 | 101.34 | 8.25 | 9.87 |
| hilbert-bytes (Python) | 2997.51 | 2642.86 | 0.334 | 0.378 |
| numpy-hilbert-curve (Python) | 7606.88 | 5075.58 | 0.131 | 0.197 |
| hilbertcurve (Python) | 14355.76 | 10411.20 | 0.0697 | 0.0961 |
System info: Intel Core Ultra 7 258V, Ubuntu 24.04.4, Python 3.12.12, Numba 0.63.1
Additional benchmarks and details are available in benchmark-cpu.md.
For a deep dive into how the HilbertSFC kernels are derived and why the implementation maps well to modern CPUs (FSM/LUT formulation, dependency chains, ILP/MLP, unrolling, constant folding, vectorization, gathers), see the performance deep dive notebook.
### GPU (CUDA/ROCm) - Torch/Triton
HilbertSFC achieves very high throughput on modern GPUs, reaching up to ~143 billion points per second for 3D encode of 32-bit coordinates (nbits=21) on an NVIDIA Blackwell B200. At size=64Mi, compared to an eager PyTorch implementation of the Skilling algorithm, it is roughly 3100× faster for 3D encode and 2300× faster for 3D decode.
2D and 3D Points - Random, nbits=32 (2D), nbits=21 (3D), size=64Mi (2^26), throughput in Mpts/s
| Implementation | Mode | 2D enc | 2D dec | 3D enc | 3D dec |
|---|---|---|---|---|---|
| HilbertSFC | triton | 225234 | 238367 | 143405 | 147926 |
| HilbertSFC | eager | 5668 | 5324 | 2745 | 2886 |
| Skilling (Pointcept) | eager | 37.9 | 48.4 | 46.4 | 63.1 |
System info: NVIDIA Blackwell B200, Ubuntu 24.04.4, Python 3.12.3, PyTorch 2.11.0, CUDA 13.0, Triton 3.6.0
*Throughput comparison for 3D Hilbert encode/decode on B200 (`nbits=21`).*
See benchmark-gpu.md for more details and additional GPU benchmarks.
## Quickstart

### Installation

Install the base package with either pip or uv.

With pip:

```shell
pip install hilbertsfc
```

Or with uv:

```shell
uv add hilbertsfc
```
### PyTorch support

To enable the optional PyTorch extension, install with the `torch` extra:

```shell
pip install hilbertsfc[torch]
```
> [!NOTE]
> By default, installing `hilbertsfc[torch]` pulls in a platform-default PyTorch build:
>
> - Windows: CPU-only
> - Linux: CUDA-enabled
>
> If you need a specific PyTorch, CUDA, or ROCm version, follow the official PyTorch installation instructions first, then install `hilbertsfc[torch]` as shown above.
## Usage
Space-filling curves such as Hilbert and Morton map multi-dimensional integer coordinates onto a single scalar index while preserving spatial locality. hilbertsfc provides simple encode/decode APIs for Python scalars, NumPy arrays, and PyTorch tensors.
### Python scalars

Use `hilbert_encode_2d` and `hilbert_decode_2d` directly on Python integers:

```python
from hilbertsfc import hilbert_decode_2d, hilbert_encode_2d

index = hilbert_encode_2d(17, 23, nbits=10)
x, y = hilbert_decode_2d(index, nbits=10)
```
`nbits` controls the coordinate domain `[0, 2**nbits)` on each axis. It is optional, but when you know the coordinate range ahead of time, passing a tighter value can improve performance and allow narrower output dtypes.
The 3D API follows the same pattern via `hilbert_encode_3d` and `hilbert_decode_3d`. The Morton/z-order functions mirror these names: `morton_encode_2d`, `morton_decode_2d`, `morton_encode_3d`, and `morton_decode_3d`.
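For intuition about what the Morton functions compute, here is a hedged reference sketch of 2D Morton encoding via the classic magic-mask "bit spreading" trick. The names and the axis-to-bit-position convention (x on even bits) are assumptions for illustration and may differ from hilbertsfc's kernels.

```python
def _spread_bits_2d(v: int) -> int:
    # Spread the low 16 bits of v onto the even bit positions:
    # 0b0000_abcd -> 0b0a0b_0c0d
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton_encode_2d_ref(x: int, y: int) -> int:
    # Interleave: x bits land on even positions, y bits on odd positions.
    return _spread_bits_2d(x) | (_spread_bits_2d(y) << 1)
```

Unlike Hilbert encoding, Morton needs no state: each output bit depends on exactly one input bit, which is why it is cheaper but preserves locality less well.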
### NumPy arrays

The same functions also accept NumPy integer arrays, preserving shape and supporting efficient batch encode/decode.

```python
import numpy as np

from hilbertsfc import hilbert_encode_2d

xs = np.array([0, 1, 2, 3], dtype=np.uint32)
ys = np.array([3, 2, 1, 0], dtype=np.uint32)
indices = hilbert_encode_2d(xs, ys, nbits=2)
```
This is the preferred path for high-throughput workloads on CPU; pass `parallel=True` to accelerate it further with multi-threading.
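As a rough illustration of why the array path pays off, a whole batch can be encoded with a handful of bitwise operations that each touch the full array at once. This hedged sketch applies the Morton bit-spreading trick to NumPy arrays; the function name is hypothetical, and hilbertsfc's Numba kernels instead run LUT-driven Hilbert loops with threading and SIMD.

```python
import numpy as np

def morton_encode_2d_batch(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    # Vectorized bit spreading: every operation below processes the
    # entire batch, so the Python-level overhead is paid once, not per point.
    def spread(v):
        v = v.astype(np.uint64) & np.uint64(0xFFFF)
        v = (v | (v << np.uint64(8))) & np.uint64(0x00FF00FF)
        v = (v | (v << np.uint64(4))) & np.uint64(0x0F0F0F0F)
        v = (v | (v << np.uint64(2))) & np.uint64(0x33333333)
        v = (v | (v << np.uint64(1))) & np.uint64(0x55555555)
        return v
    return spread(xs) | (spread(ys) << np.uint64(1))
```

Compiled kernels go further by fusing these passes into one loop, but the principle is the same: amortize interpreter overhead across the batch.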
### PyTorch tensors

The `hilbertsfc.torch` frontend works with PyTorch tensors on CPU and accelerator devices. On CUDA/ROCm, contiguous tensors take the Triton path when available; otherwise execution falls back to the Torch backend.

```python
import torch

from hilbertsfc.torch import hilbert_decode_2d, hilbert_encode_2d

device = "cuda" if torch.cuda.is_available() else "cpu"
nbits = 10
xs = torch.randint(0, 2**nbits, (4096,), dtype=torch.int32, device=device)
ys = torch.randint(0, 2**nbits, (4096,), dtype=torch.int32, device=device)

indices = hilbert_encode_2d(xs, ys, nbits=nbits)
xs2, ys2 = hilbert_decode_2d(indices, nbits=nbits)
```
HilbertSFC also supports `torch.compile`. Before entering a compiled region, call `precache_compile_luts(...)` so LUT materialization happens outside the compiled graph.
## Learn more

For more details and advanced usage, see benchmark-cpu.md, benchmark-gpu.md, and the performance deep dive notebook.
## Download files
### File details: hilbertsfc-0.4.0.tar.gz

File metadata:

- Download URL: hilbertsfc-0.4.0.tar.gz
- Size: 316.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 (Ubuntu 24.04, CI)

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0b90923a9062e3e75afeeb0572cf8f6468fe5860128c923e1dc9a49857efce39` |
| MD5 | `2e118fcd6cb6104ae8e0651970fb350e` |
| BLAKE2b-256 | `e735e69cbae0a68e252a6b340aee52bfca6a13b34b3e07e8d3f3a0e1ec1c8272` |
### File details: hilbertsfc-0.4.0-py3-none-any.whl

File metadata:

- Download URL: hilbertsfc-0.4.0-py3-none-any.whl
- Size: 357.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 (Ubuntu 24.04, CI)

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0029465b3ef58e2ad1d006c0881f9b02d7d56fe6a8ca4add69c87f90a9db769b` |
| MD5 | `389d058b8a71a39a652b0e87c9eb0e1f` |
| BLAKE2b-256 | `6e4e81dc8be6becbf149d69b859f69325a502a11d66f0a4c46f76bd94e881c83` |