Skip to main content

Portable mixed-precision math, linear-algebra, & retrieval library with 2000+ SIMD kernels for x86, Arm, RISC-V, LoongArch, Power, & WebAssembly

Project description

NumKong for Python

NumKong for Python is the main high-level SDK in the project. It targets the gap between numpy and low-level native kernels: you keep buffer-protocol interoperability and shape-aware outputs, but you stop giving up mixed precision, widened accumulators, packed reuse, and backend-specific optimizations every time you leave float64. It combines NumPy-friendly buffers with native mixed-precision kernels, zero-copy tensor views, packed and symmetric matrix operations, sparse helpers, geometric mesh alignment, and MaxSim. The API feels NumPy-shaped with familiar scalar, batched, and all-pairs entrypoints, while Tensor keeps shape, dtype, and strides visible through a memoryview-backed container. Low-precision dtypes (BFloat16, Float8, Float6, packed bits) flow through the same API, and dense, packed, and symmetric kernels release the GIL around native work.

Ecosystem Comparison

Feature NumKong NumPy/SciPy PyTorch
Operation families dots, distances, binary, probability, geospatial, curved, mesh, sparse, MaxSim, elementwise, reductions, cast, trig dots, distances, elementwise, reductions, some probability via cdist dots, distances, elementwise, reductions
Precision BFloat16 through sub-byte — Float8, Float6, Int4, packed bits; automatic widening; Kahan summation; 0 ULP in Float32/Float64 Float16, partial BFloat16; no auto-widening; standard accuracy Float16, BFloat16, partial Float8; explicit AMP required; standard accuracy
Runtime SIMD dispatch auto-selects best ISA per-thread at runtime on x86, ARM, RISC-V compile-time only CPU: compile-time; CUDA: runtime
Packed matrix, GEMM-like pack once, reuse across query batches np.dot/@ — no persistent packing torch.mm — no persistent distance-oriented packing
Symmetric kernels, SYRK-like skip duplicate pairs, up to 2x speedup for self-distance pdist computes one triangle; cdist recomputes both X @ X.T recomputes both triangles
Output parameter out= Yes — all major entrypoints Yes — most ufuncs and functions; SciPy: some functions only Yes for torch.mm, torch.matmul; No for torch.cdist
Fast CPython calling convention Yes — direct METH_FASTCALL Yes — vectorcall in 2.0+ No — tensor dispatch overhead
GIL release batched, packed, and symmetric kernels some ops only most ops

Quickstart

import numpy as np
import numkong as nk

a, b = np.random.randn(1536).astype(np.float32), np.random.randn(1536).astype(np.float32)
dot = nk.dot(a, b)  # widened accumulation, not same-dtype
print(dot)

Installation

From PyPI:

python -m pip install numkong

From a local checkout:

python -m pip install .

Quick runtime check:

python -c "import numkong as nk; print(nk.get_capabilities())"

Wheel Compatibility and Building from Source

Pre-built wheels are available on PyPI for Linux (x86_64, aarch64, riscv64, plus i686, ppc64le, s390x), macOS (x86_64, arm64), and Windows (AMD64, ARM64). Python 3.9 through 3.14 is supported, including free-threading variants (3.13t, 3.14t). Every wheel is built with NK_DYNAMIC_DISPATCH=1, so a single wheel covers all CPU generations on a given architecture.

When building from source, the compiler requirements depend on the platform. On macOS x86 only AVX2 is available; on macOS ARM NEON is always present, but SME requires Apple M4+ with Xcode 16+ (AppleClang 16+). RISC-V builds require Clang and LLD because GCC lacks zvfh, zvfbfwma, and zvbb support. On Windows, MSVC 19.44+ (Visual Studio 2022 17.14+) is recommended for full AVX-512 with FP16/BF16/VNNI. Build parallelism is controlled by NK_BUILD_PARALLEL, which defaults to min(cpu_count, 4) and should be lowered in memory-constrained containers. There is no OpenMP dependency. Python-side parallelism uses concurrent.futures with GIL-free kernels.

NK_BUILD_PARALLEL=2 pip install . --no-build-isolation

Dot Products

Dot products are their own family because storage type, conjugation rules, and output widening matter.

import numpy as np
import numkong as nk

a = (np.random.randn(256) + 1j * np.random.randn(256)).astype(np.complex64)
b = (np.random.randn(256) + 1j * np.random.randn(256)).astype(np.complex64)

dot = nk.dot(a, b)   # numpy.dot(a, b)
vdot = nk.vdot(a, b) # numpy.vdot(a, b)

print(dot, vdot)

Real low-precision inputs can also be routed through explicit dtype tags when the storage buffer itself is raw bytes.

Dense Distances

The dense distance entrypoints cover sqeuclidean, euclidean, and angular. The first important difference from NumPy or SciPy is that the accumulator policy is not forced to match the storage dtype.

import numpy as np
import numkong as nk

a = np.random.randn(768).astype(np.float16)
b = np.random.randn(768).astype(np.float16)

sqeuclidean = nk.sqeuclidean(a, b)
euclidean = nk.euclidean(a, b)
angular = nk.angular(a, b)

For float16, a naive same-dtype implementation is exactly the kind of path that loses precision or widens too late. NumKong's API makes the widening policy part of the kernel contract.

Output Control: out=, dtype=, and out_dtype=

Most distance and dot-product entrypoints accept out=, dtype=, and out_dtype= keyword arguments. Passing them avoids dynamic memory allocations for temporary objects.

import numpy as np
import numkong as nk

queries = np.random.randn(100, 768).astype(np.float32)
database = np.random.randn(100, 768).astype(np.float32)

# Pre-allocated output with out=
out = nk.zeros((100,), dtype="float32")
nk.sqeuclidean(queries, database[:100], out=out)  # writes in-place, returns None

# Explicit input dtype for raw byte buffers
raw = np.frombuffer(some_bytes, dtype=np.uint16)
nk.dot(raw, raw, dtype=nk.bfloat16)  # reinterpret uint16 as bf16

# Output dtype override
nk.euclidean(queries[0], database[0], out_dtype="float32")  # accumulate in f64, downcast result

When out= is provided, the function writes results in-place and returns None. The out array must be pre-allocated with the correct shape and a supported dtype. For custom float types (bfloat16, float16, float8_e4m3, float8_e5m2, float6_e2m3, float6_e3m2), type objects are preferred over strings — they are faster to dispatch and provide IDE autocomplete:

nk.dot(a, b, dtype=nk.bfloat16) # works faster
nk.dot(a, b, dtype="bfloat16")  # works a bit slower

Set Similarity

Packed-binary metrics operate on packed bits. That is why the right NumPy equivalent uses np.packbits, not bool arrays fed to scalar Python code.

import numpy as np
import numkong as nk

a_bits = np.random.randint(0, 2, size=256, dtype=np.uint8)
b_bits = np.random.randint(0, 2, size=256, dtype=np.uint8)
a, b = np.packbits(a_bits), np.packbits(b_bits)

hamming = nk.hamming(a, b, dtype="uint1")
jaccard = nk.jaccard(a, b, dtype="uint1")

Integer set Jaccard works on sorted ascending arrays of integer identifiers. Both inputs must be sorted in ascending order for correct results.

set_a = np.array([1, 3, 5, 7, 9], dtype=np.uint32)  # must be sorted ascending
set_b = np.array([3, 5, 8, 9, 10], dtype=np.uint32)  # must be sorted ascending
jaccard_sets = nk.jaccard(set_a, set_b) # |A ∩ B| / |A ∪ B|
assert 0.0 < jaccard_sets < 1.0, "|A ∩ B| / |A ∪ B| should be in (0, 1)"

Probability Metrics

Probability divergences deserve their own section because they are not just "one more distance".

import numpy as np
import numkong as nk

p = np.array([0.2, 0.3, 0.5], dtype=np.float32)
q = np.array([0.1, 0.3, 0.6], dtype=np.float32)

kl_forward, kl_reverse = nk.kullbackleibler(p, q), nk.kullbackleibler(q, p)
assert kl_forward != kl_reverse, "KLD is asymmetric"

js_forward, js_reverse = nk.jensenshannon(p, q), nk.jensenshannon(q, p)
np.testing.assert_allclose(js_forward, js_reverse, atol=1e-6)  # JSD is symmetric

Geospatial Metrics

Geospatial kernels take four coordinate arrays. Inputs are in radians. Outputs are in meters.

import numpy as np
import numkong as nk

# Statue of Liberty (40.6892°N, 74.0445°W) → Big Ben (51.5007°N, 0.1246°W)
liberty_lat, liberty_lon = np.array([0.7101605100], dtype=np.float64), np.array([-1.2923203180], dtype=np.float64)
big_ben_lat, big_ben_lon = np.array([0.8988567821], dtype=np.float64), np.array([-0.0021746802], dtype=np.float64)

vincenty = nk.vincenty(liberty_lat, liberty_lon, big_ben_lat, big_ben_lon)    # ≈ 5,589,857 m (ellipsoidal, baseline)
haversine = nk.haversine(liberty_lat, liberty_lon, big_ben_lat, big_ben_lon)  # ≈ 5,543,723 m (spherical, ~46 km less)

# Vincenty in f32 — drifts ~2 m from f64
liberty_lat32 = liberty_lat.astype(np.float32)
liberty_lon32 = liberty_lon.astype(np.float32)
big_ben_lat32 = big_ben_lat.astype(np.float32)
big_ben_lon32 = big_ben_lon.astype(np.float32)
vincenty_f32 = nk.vincenty(liberty_lat32, liberty_lon32, big_ben_lat32, big_ben_lon32)  # ≈ 5,589,859 m (+2 m drift)

Curved Metrics

Curved-space kernels use an extra metric tensor or inverse covariance and should not be mixed into the Euclidean section.

import numpy as np
import numkong as nk

# Complex bilinear form: aᴴ M b
a = (np.ones(16) + 1j * np.zeros(16)).astype(np.complex64)
b = (np.zeros(16) + 1j * np.ones(16)).astype(np.complex64)
m = np.eye(16, dtype=np.complex64)
bilinear = nk.bilinear(a, b, m)

# Real Mahalanobis distance: √((a−b)ᵀ M⁻¹ (a−b))
x = np.ones(32, dtype=np.float32)
y = np.full(32, 2.0, dtype=np.float32)
inv_cov = np.eye(32, dtype=np.float32)
mahalanobis = nk.mahalanobis(x, y, inv_cov)

Scalar Types and Low-Precision Formats

NumKong exposes two different low-precision stories in Python. It exposes Python scalar objects for a few formats. And it exposes tensor dtypes for the broader buffer-oriented path.

The six scalar types have stable payload sizes even though Python object headers are not:

Type Bits Bytes Range Inf NaN
nk.float16 1+5+10 2 ±65504 yes yes
nk.bfloat16 1+8+7 2 ±3.4×10³⁸ yes yes
nk.float8_e4m3 1+4+3 1 ±448 no yes
nk.float8_e5m2 1+5+2 1 ±57344 yes yes
nk.float6_e2m3 1+2+3 1 ±7.5 no no
nk.float6_e3m2 1+3+2 1 ±28 no no

The Bits column shows sign + exponent + mantissa bit counts. The Bytes column is the stable payload size; float8_* and float6_* both store 1 byte because the sub-byte formats are padded to byte alignment.

The full object footprint is interpreter-dependent. Use sys.getsizeof(nk.float16(1.0)) if you need the heap footprint of the Python wrapper object itself. Use Tensor.itemsize and Tensor.nbytes for the stable payload sizes of array storage.

ml_dtypes matters here because NumKong explicitly interoperates with the formats that NumPy still does not model well. The test suite compares bfloat16, float8_e4m3, float8_e5m2, float6_e2m3, and float6_e3m2 behavior against ml_dtypes where that comparison is meaningful.

Promotion is intentional. Mixed exotic floats are routed through wider compute types rather than pretending a same-width accumulator is good enough.

ml_dtypes Interoperability

NumKong accepts ml_dtypes arrays directly — no .view(np.uint8) workaround needed:

import ml_dtypes
a = np.random.randn(100, 768).astype(np.float32).astype(ml_dtypes.bfloat16)
b = np.random.randn(100, 768).astype(np.float32).astype(ml_dtypes.bfloat16)
result = nk.cdist(a, b, "dot")  # just works

NumKong scalars also work as NumPy dtype specifiers:

arr = np.array([1.0, 2.0, 3.0], dtype=nk.bfloat16)
float(arr[0])  # → 1.0

Type name mapping between the two libraries:

ml_dtypes NumKong Status
ml_dtypes.bfloat16 nk.bfloat16 / "bfloat16" Identical format
ml_dtypes.float8_e4m3 nk.float8_e4m3 / "e4m3" Identical (IEEE E4M3)
ml_dtypes.float8_e4m3fn nk.float8_e4m3 / "e4m3" Identical (E4M3FN = no inf)
ml_dtypes.float8_e5m2 nk.float8_e5m2 / "e5m2" Identical format
ml_dtypes.float6_e2m3fn nk.float6_e2m3 / "e2m3" Identical (MX E2M3)
ml_dtypes.float6_e3m2fn nk.float6_e3m2 / "e3m2" Identical (MX E3M2)
ml_dtypes.float8_e4m3fnuz Rejected: different bias, NaN, and zero
ml_dtypes.float8_e5m2fnuz Rejected: different NaN and zero encoding
ml_dtypes.float8_e4m3b11fnuz Rejected: bias=11, incompatible encoding
ml_dtypes.float8_e8m0fnu Not supported: exponent-only MX scale format
ml_dtypes.float8_e3m4 Not supported: no NumKong kernel
ml_dtypes.float4_e2m1fn Not supported: 4-bit MX float
ml_dtypes.int4 "int4" Compatible via buffer protocol
ml_dtypes.uint4 "uint4" Compatible via buffer protocol
ml_dtypes.int2 Not supported
ml_dtypes.uint2 Not supported

Tensor Objects and Buffer Interop

Tensor is a memoryview-backed object with NumPy-like metadata. It is the central container for strided views, transpose, reshape, flatten, and axis reductions.

import numpy as np
import numkong as nk

t = nk.Tensor(np.arange(12, dtype=np.float32).reshape(3, 4))

print(t.shape, t.dtype, t.ndim, t.strides, t.itemsize, t.nbytes)
print(np.asarray(t))      # zero-copy array view when layout allows it
print(t.T.shape)          # transposed Tensor view
print(t.reshape(2, 6).shape)
print(t.flatten().shape)

# Slicing — row, column, and scalar access
row0 = t[0, :]            # first row, shape (4,)
col2 = t[:, 2]            # third column, strided view, shape (3,)
val  = t[1, 2]            # scalar element access → 6.0

# Reductions compose with sliced views
idx = col2.argmin()        # index of the minimum in the third column
mn, i0, mx, i1 = col2.minmax()

The important layout rules are:

  • Tensor preserves shape and byte strides.
  • Transpose and slicing can produce non-contiguous views.
  • General reductions accept those views.
  • Matrix-style packed kernels require row-contiguous left operands.
  • Packed and symmetric outputs require C-contiguous out buffers.

Memory Layout Requirements

API family Input requirement Output requirement
Dense distances (dot, euclidean, etc.) Rows must be contiguous (strides[last] <= itemsize). Strided rows (sliced columns) are rejected. out= can have any stride along dim 0, but inner dim must be contiguous.
cdist Same as dense distances out= must be rank-2 with shape (a.count, b.count)
Elementwise (scale, blend, fma) Arbitrary strides (strided views are supported) out= must match input shape; strides are preserved
Packed matrix (dots_packed) Left operand: rank-2, contiguous rows, no negative strides Output: C-contiguous with expected dtype
Symmetric (dots_symmetric) Contiguous rows out=: C-contiguous square matrix
Tensor reductions (sum, min, argmin, etc.) Arbitrary strides (strided views supported) N/A (returns scalar or reduced tensor)

All-Pairs APIs and cdist

cdist is the NumPy/SciPy-shaped all-pairs entrypoint. It handles rectangular matrix pairs and symmetric self-distance cases.

import numpy as np
import numkong as nk

queries = np.random.randn(100, 768).astype(np.float32)
database = np.random.randn(10_000, 768).astype(np.float32)

pairwise = nk.angular(queries, database[:100])             # rectangular broadcasted pairwise call
all_pairs = nk.cdist(queries, database, metric="angular")  # scipy.spatial.distance.cdist analogue

assert np.asarray(pairwise).shape == (100, 100)
assert np.asarray(all_pairs).shape == (100, 10_000)

The intended large-scale parallel model for packed and symmetric kernels is external partitioning with row ranges, not a hidden threads= argument.

Elementwise Operations

Elementwise arithmetic and fused operations are their own family. They share the tensor infrastructure but should not be collapsed into the reduction or matrix sections.

import numpy as np
import numkong as nk

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32)[::-1].copy()

scaled = nk.scale(a, alpha=2.0, beta=1.0)     # 2 * a + 1
blended = nk.blend(a, b, alpha=0.25, beta=0.75)
fused = nk.fma(a, b, a, alpha=1.0, beta=1.0)  # a * b + a

assert np.asarray(scaled).shape == (8,)
assert np.asarray(fused).shape == (8,)

Moments Reductions

Moments reductions return (sum, sum_of_squares). The key property is that NumKong does not force you into same-storage accumulation.

import numpy as np
import numkong as nk

x = np.full(4096, 255, dtype=np.uint8)

nk_sum, nk_sumsq = nk.moments(nk.Tensor(x))
naive_sum = np.sum(x, dtype=np.uint8)      # overflows immediately
naive_sumsq = np.sum(x * x, dtype=np.uint8) # also overflows

print(nk_sum, nk_sumsq, naive_sum, naive_sumsq)
assert nk_sum > int(naive_sum)
assert nk_sumsq > int(naive_sumsq)

Same-width accumulation is a bad default for low-precision storage.

Min/Max Reductions

Min/max reductions are in a separate section because they cover strided reduction cases. NumKong provides SIMD-accelerated strided reductions that are not common in other libraries.

import numpy as np
import numkong as nk

matrix = nk.Tensor(np.array([
    [ 3.0,  0.0, 7.0],
    [ 1.0,  2.0, 5.0],
    [ 4.0, -1.0, 6.0],
], dtype=np.float32))

second_column = matrix[:, 1]  # strided view into a row-major Nx3 tensor

idx = second_column.argmin()
mn, i0, mx, i1 = second_column.minmax()

assert idx == 2
assert int(i0) == 2
assert float(np.asarray(mn)) == -1.0

Fresh measurement for the rewritten docs: on an Apple M2 Pro, np.argmin(matrix[:, 1]) on a row-major 2,000,000 x 3 float32 array took about 1.63 ms median. The equivalent NumKong Tensor(... )[:, 1].argmin() took about 0.67 ms median. That is about 2.45x faster on this strided reduction case.

Sparse Operations and Intersections

Sparse helpers cover both sorted-index intersections and weighted sparse dot products.

import numpy as np
import numkong as nk

idx_a, idx_b = np.array([1, 3, 5, 7], dtype=np.uint32), np.array([3, 4, 5, 8], dtype=np.uint32)
intersection_size = nk.intersect(idx_a, idx_b) # len(np.intersect1d(idx_a, idx_b))
assert intersection_size == 2, "indices 3 and 5"

val_a, val_b = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32), np.array([5.0, 6.0, 7.0, 8.0], dtype=np.float32)
sparse_dot = nk.sparse_dot(idx_a, val_a, idx_b, val_b)
assert sparse_dot > 0, "weighted dot over shared indices"

Packed Matrix Kernels for GEMM-Like Workloads

Packed matrix kernels are the right tool when the right-hand side is reused across many query batches. This is the GEMM-like story.

import numpy as np
import numkong as nk

left = np.random.randn(128, 768).astype(np.float32)
right = np.random.randn(10_000, 768).astype(np.float32)

right_packed = nk.dots_pack(right, dtype="float32")  # pack once, reuse many times
scores = nk.dots_packed(left, right_packed)          # equivalent to left @ right.T

assert scores.shape == (128, 10_000)
assert right_packed.nbytes == nk.PackedMatrix.packed_size(10_000, 768, dtype="float32")

Important runtime rules from the current implementation:

  • a must be rank-2
  • a must have contiguous rows
  • negative strides are rejected for these matrix kernels
  • out, when provided, must be C-contiguous with the expected dtype
  • start_row and end_row split the left operand rows

The arithmetic advantages are:

  • one-time packing of B
  • one-time internal layout conversion and depth padding
  • norm reuse for angulars_packed and euclideans_packed
  • no repeated scan of the original right-hand-side layout

Packing itself does not require aligned caller buffers. The packed object owns its internal payload and handles the layout under the hood.

Tensor @ PackedMatrix is also supported and maps to the same packed dot-product path.

Symmetric Kernels for SYRK-Like Workloads

Symmetric kernels solve a different problem from packed cross-matrix kernels. They compute self-similarity or self-distance matrices. This is the SYRK-like story.

import numpy as np
import numkong as nk

vectors = np.random.randn(1024, 768).astype(np.float32)
out = nk.zeros((1024, 1024), dtype="float64")

nk.dots_symmetric(vectors, out=out, start_row=0, end_row=256)
nk.dots_symmetric(vectors, out=out, start_row=256, end_row=512)

assert out.shape == (1024, 1024)

This family has different economics from packed GEMM-like work. It avoids duplicate (i, j) and (j, i) evaluations. It is naturally partitioned by row windows of one square output.

angulars_symmetric and euclideans_symmetric also benefit from reuse of dot-product-derived work inside the symmetric sweep. That is why these APIs are faster than a nested Python loop over angular(a[i], a[j]).

Geometric Mesh Alignment

Mesh alignment returns a structured result object. The current implementation exposes rotation, scale, rmsd, a_centroid, and b_centroid.

import numpy as np
import numkong as nk

source = np.array(
    [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]],
    dtype=np.float32,
)

result = nk.kabsch(source, source.copy())
assert np.asarray(result.rotation).shape == (3, 3)
assert float(np.asarray(result.scale)) == 1.0

# Umeyama with known 2x scaling
target = source * 2.0
result = nk.umeyama(source, target)
assert float(np.asarray(result.rmsd)) < 1e-6, "umeyama should recover exact alignment"
assert abs(float(np.asarray(result.scale)) - 2.0) < 0.01, "umeyama should recover 2x scale"

That field-level check is the right style for this API family. It tells the reader exactly what the result object owns.

MaxSim and ColBERT-Style Late Interaction

MaxSim is the late-interaction primitive used by systems such as ColBERT. It is not generic matrix multiplication.

import numpy as np
import numkong as nk

queries = np.random.randn(32, 128).astype(np.float32)
documents = np.random.randn(192, 128).astype(np.float32)

q = nk.maxsim_pack(queries, dtype="float32")
d = nk.maxsim_pack(documents, dtype="float32")
score = nk.maxsim_packed(q, d)

assert np.isfinite(score)
assert q.nbytes == nk.MaxSimPackedMatrix.packed_size(32, 128, dtype="float32")

Capabilities, GIL Behavior, and Parallel Partitioning

Capability detection is explicit:

import numkong as nk

caps = nk.get_capabilities()
print({k: v for k, v in caps.items() if v})

The current implementation releases the GIL around the native dense metric calls and around the packed and symmetric matrix kernels. The repository also has threading tests for packed and symmetric row-range partitioning.

GEMM-like packed work and SYRK-like symmetric work should be documented differently:

import concurrent.futures
import numpy as np
import numkong as nk

left = np.random.randn(4096, 768).astype(np.float32)
right = np.random.randn(8192, 768).astype(np.float32)
packed = nk.dots_pack(right, dtype="float32")
out = nk.zeros((4096, 8192), dtype="float64")  # out must be pre-allocated with correct shape and dtype

def packed_chunk(start, end):
    nk.dots_packed(left, packed, out=out, start_row=start, end_row=end) # split left rows against one shared packed RHS

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for start in range(0, 4096, 1024):
        pool.submit(packed_chunk, start, min(start + 1024, 4096))
import concurrent.futures
import numpy as np
import numkong as nk

vectors = np.random.randn(4096, 768).astype(np.float32)
out = nk.zeros((4096, 4096), dtype="float64")  # out must be pre-allocated with correct shape and dtype

def symmetric_chunk(start, end):
    nk.dots_symmetric(vectors, out=out, start_row=start, end_row=end) # split row windows of one square output

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for start in range(0, 4096, 1024):
        pool.submit(symmetric_chunk, start, min(start + 1024, 4096))

OpenMP and other native schedulers still matter in lower layers. For Python, the intended user-facing story is external partitioning around the GIL-free kernels you actually use.

Addressing External Memory

NumKong implements the Python buffer protocol for zero-copy interop with NumPy, PyTorch, and other buffer-aware libraries. Two additional primitives cover pointer-level workflows: data_ptr reads the integer address out of any Tensor, and from_pointer() wraps any integer address back into one.

data_ptr returns the raw address, suitable for passing into ctypes, CUDA, or any FFI boundary. from_pointer(address, shape, dtype, *, strides=None, owner=None) creates a non-owning Tensor view. The optional owner keeps the source object alive for the lifetime of the view.

import numpy as np
import numkong as nk

# Round-trip through an integer address
matrix = nk.zeros((3, 4), dtype='float32')
address = matrix.data_ptr
matrix_view = nk.from_pointer(address, (3, 4), 'float32', owner=matrix)

# Wrap a NumPy array with zero copies
embeddings = np.random.randn(1024).astype(np.float32)
embeddings_view = nk.from_pointer(embeddings.ctypes.data, (1024,), 'float32', owner=embeddings)
nk.dot(embeddings, embeddings_view)  # same underlying data

PyTorch tensors already implement the buffer protocol, so most functions accept them directly. For explicit pointer-level control, or to go the other direction, the same primitives apply:

import torch

query = torch.randn(512)
nk.dot(query, query)  # buffer protocol, zero copy

# Explicit pointer wrap
query_view = nk.from_pointer(query.data_ptr(), tuple(query.shape), 'float32', owner=query)

# NumKong → PyTorch: 1D via buffer protocol, N-D via numpy bridge
flat = torch.frombuffer(memoryview(nk_tensor), dtype=torch.float32)
shaped = torch.as_tensor(np.asarray(nk_tensor))

CUDA unified memory, pinned buffers, and mmap'd files all work the same way — any CPU-accessible pointer is valid.

import ctypes, mmap

# CUDA unified memory (ensure CPU accessibility first)
cudart = ctypes.CDLL("libcudart.so")
unified_ptr = ctypes.c_void_p()
cudart.cudaMallocManaged(ctypes.byref(unified_ptr), 4096, 1)
cudart.cudaDeviceSynchronize()
unified = nk.from_pointer(unified_ptr.value, (1024,), 'float32')

# Memory-mapped file
with open("data.bin", "r+b") as f:
    mapping = mmap.mmap(f.fileno(), 0)
    mapped = nk.from_pointer(ctypes.addressof(
        ctypes.c_char.from_buffer(mapping)),
        (1024,), 'float32', owner=mapping)

NumKong: Mixed Precision for All

Portable mixed-precision math, linear-algebra, & retrieval library with 2'000+ SIMD kernels for x86, Arm, RISC-V, LoongArch, Power, & WebAssembly, leveraging rare algebraic transforms with both 1D & 2D registers like AMX & SME, covering 15+ numeric types from 4-bit integers & 6-bit floats to 128-bit complex numbers, validated against 118-bit extended-precision baselines with saturation, casting, & rounding edge-case coverage, in a 5-100x smaller binary than other BLAS-like alternatives, co-designed with Tensor abstractions in C++, Python, Rust, JavaScript, GoLang, & Swift.

NumKong banner

Latency, Throughput, & Numerical Stability

Most libraries return dot products in the same type as the input — Float16 × Float16 → Float16, Int8 × Int8 → Int8. This leads to quiet overflow: a 2048-dimensional i8 dot product can reach ±10 million, but i8 maxes out at 127. NumKong promotes to wider accumulators — Float16 → Float32, BFloat16 → Float32, Int8 → Int32, Float32 → Float64 — so results stay in range.

Single 2048-d dot product on Intel Sapphire Rapids, single-threaded. Each cell shows gso/s, mean relative error vs higher-precision reference. gso/s = Giga Scalar Operations per Second — a more suitable name than GFLOP/s when counting both integer and floating-point work. NumPy 2.4, PyTorch 2.10, JAX 0.9.

Input NumPy + OpenBLAS PyTorch + MKL JAX NumKong
░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░
f64 2.0 gso/s, 1e-15 err 0.6 gso/s, 1e-15 err 0.4 gso/s, 1e-14 err 5.8 gso/s, 1e-16 err
f32 1.5 gso/s, 2e-6 err 0.6 gso/s, 2e-6 err 0.4 gso/s, 5e-6 err 7.1 gso/s, 2e-7 err
bf16 0.5 gso/s, 1.9% err 0.5 gso/s, 1.9% err 9.7 gso/s, 1.8% err
f16 0.2 gso/s, 0.25% err 0.5 gso/s, 0.25% err 0.4 gso/s, 0.25% err 11.5 gso/s, 0.24% err
e5m2 0.7 gso/s, 4.6% err 0.5 gso/s, 4.6% err 7.1 gso/s, 0% err
i8 1.1 gso/s, overflow 0.5 gso/s, overflow 0.5 gso/s, overflow 14.8 gso/s, 0% err

A fair objection: PyTorch and JAX are designed for throughput, not single-call latency. They lower execution graphs through XLA or vendored BLAS libraries like Intel MKL and Nvidia cuBLAS. So here's the same comparison on a throughput-oriented workload — matrix multiplication:

Matrix multiplication (2048 × 2048) × (2048 × 2048) on Intel Sapphire Rapids, single-threaded. gso/s = Giga Scalar Operations per Second, same format. NumPy 2.4, PyTorch 2.10, JAX 0.9, same versions.

Input NumPy + OpenBLAS PyTorch + MKL JAX NumKong
░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░
f64 65.5 gso/s, 1e-15 err 68.2 gso/s, 1e-15 err ~14.3 gso/s, 1e-15 err 8.6 gso/s, 1e-16 err
f32 140 gso/s, 9e-7 err 145 gso/s, 1e-6 err ~60.5 gso/s, 1e-6 err 37.7 gso/s, 4e-7 err
bf16 851 gso/s, 1.8% err ~25.8 gso/s, 3.4% err 458 gso/s, 3.6% err
f16 0.3 gso/s, 0.25% err 140 gso/s, 0.37% err ~26.1 gso/s, 0.35% err 103 gso/s, 0.26% err
e5m2 0.4 gso/s, 4.6% err ~26.4 gso/s, 4.6% err 398 gso/s, 0% err
i8 0.4 gso/s, overflow 50.0 gso/s, overflow ~0.0 gso/s, overflow 1279 gso/s, 0% err

For f64, compensated "Dot2" summation reduces error by 10–50× compared to naive Float64 accumulation, depending on vector length. For f32, widening to Float64 gives 5–10× lower error. The library ships as a relatively small binary:

Package Size Parallelism & Memory Available For
PyTorch + MKL 705 MB Vector & Tile SIMD, OpenMP Threads, Hidden Allocs Python, C++, Java
JAX + jaxlib 357 MB Vector SIMD, XLA Threads, Hidden Allocs Python
NumPy + OpenBLAS 30 MB Vector SIMD, Built-in Threads, Hidden Allocs Python
mathjs 9 MB No SIMD, No Threads, Many Allocs JS
NumKong 5 MB Vector & Tile SIMD, Your Threads, Your Allocs 7 languages

Every kernel is validated against 118-bit extended-precision baselines with per-type ULP budgets across log-normal, uniform, and Cauchy input distributions. Tests check triangle inequality, Cauchy-Schwarz bounds, NaN propagation, overflow detection, and probability-simplex constraints for each ISA variant. Results are cross-validated against OpenBLAS, Intel MKL, and Apple Accelerate. A broader throughput comparison is maintained in NumWars.

Quick Start

Language Install Compatible with Guide
C / C++ CMake, headers, & prebuilt Linux, macOS, Windows, Android include/README.md
Python pip install Linux, macOS, Windows python/README.md
Rust cargo add Linux, macOS, Windows rust/README.md
JS npm install & import Node.js, Bun, Deno & browsers javascript/README.md
Swift Swift Package Manager macOS, iOS, tvOS, watchOS swift/README.md
Go go get Linux, macOS, Windows via cGo golang/README.md

What's Inside

NumKong covers 17 numeric types — from 6-bit floats to 64-bit complex numbers — across dozens of operations and 30+ SIMD backends, with hardware-aware defaults: Arm prioritizes f16, x86 prioritizes bf16.

Operations × Backend

Backend dot dots spatial spatials set sets cast reduce trig maxsim mesh
x86
Haswell
Skylake · · ·
Ice Lake · ·
Genoa · · · · ·
Sapphire · · · · · · · ·
Sapphire AMX · · · · · · · ·
Diamond · · · · · · ·
Alder Lake · · · · ·
Sierra Forest · · · · · ·
Turin · · · · · · · · · · ·
Arm
NEON ·
NEON Half · · · · ·
NEON FHM · · · · · · ·
NEON BF16 · · · · ·
NEON SDot · · · · ·
NEON FP8 · · · · · · ·
SVE · · · · · · · ·
SVE Half · · · · · · · · ·
SVE BF16 · · · · · · · · ·
SVE SDot · · · · · · · · · · ·
SVE2 · · · · · · · · · · ·
SME · · · · · · · ·
SME F64 · · · · · · · · ·
SME BI32 · · · · · · · · ·
Other
Power VSX · · · · ·
LoongArch LASX · · · · ·
RVV · ·
RVV Half · · · · · · · · ·
RVV BF16 · · · · · · · · ·
RVV BB · · · · · · · · ·
WASM V128 ·
Serial

Numeric Types × Backend

Backend f64 f32 bf16 f16 e5m2 e4m3 e3m2 e2m3 i8 u8 i4 u4 u1 f64c f32c bf16c f16c
x86
Haswell
Skylake · ·
Ice Lake · · · · · · · ·
Genoa · · · · · · · · · · · · ·
Sapphire · · · · · · · · · · ·
Sapphire AMX · · · · · · · ·
Diamond · · · · · · · · · · · · · ·
Alder Lake · · · · · · · · · ·
Sierra Forest · · · · · · · · · · · · ·
Turin · · · · · · · · · · · · · · ·
Arm
NEON · · · · · ·
NEON Half · · · · · · · · · · · · ·
NEON FHM · · · · · · · · · · · · ·
NEON BF16 · · · · · · · · · · · · ·
NEON SDot · · · · · · · ·
NEON FP8 · · · · · · · · · · · · ·
SVE · · · · · · · ·
SVE Half · · · · · · · · · · · · · · ·
SVE BF16 · · · · · · · · · · · · · · · ·
SVE SDot · · · · · · · · · · · · · · · · ·
SVE2 · · · · · · · · · · · · · · ·
SME · · · · · ·
SME F64 · · · · · · · · · · · · ·
SME BI32 · · · · · · · · · · · · · · ·
Other
Power VSX · · · · · · · · · ·
LoongArch LASX · · · · · · · · · ·
RVV · ·
RVV Half · · · · · · · · · · · · · ·
RVV BF16 · · · · · · · · · · · · · ·
RVV BB · · · · · · · · · · · · · · · ·
WASM V128 · · · ·

Language Bindings

Operation C/C++ Python Rust JavaScript Swift Go
dot
dots
spatial
spatials
set
sets ·
cast · ·
reduce · · ·
trig · · ·
geospatial ·
maxsim ·
mesh · · ·

Not every combination is implemented — only the ones that unlock real performance gains. The icelake level doesn't get a dot_bf16 variant, for example, and falls through to dot_bf16_skylake. Every operation has a serial fallback, but even types no CPU supports today get optimized via lookup tables and bit-twiddling hacks rather than scalar loops. For details on compile-time and run-time dispatch, see the contributor guide.

Design Decisions

  • Avoid loop unrolling and scalar tails.
  • Don't manage threads and be compatible with any parallelism models.
  • Don't manage memory and be compatible with arbitrary allocators & alignment.
  • Don't constrain ourselves to traditional BLAS-like Matrix Multiplication APIs.
  • Don't throw exceptions and pass values by pointers.
  • Prefer saturated arithmetic and avoid overflows, where needed.
  • Cover most modern CPUs with flexible dispatch and wait for them to converge with GPUs.

The rest of this document unpacks the functionality and the logic behind the design decisions.

Auto-Vectorization & Loop Unrolling

Most "optimized SIMD code" is a 2–4x unrolled data-parallel for-loop over f32 arrays with a serial scalar tail for the last few elements:

float boring_dot_product_f32(float const *a, float const *b, size_t n) {
    __m256 sum0 = _mm256_setzero_ps(), sum1 = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        sum0 = _mm256_fmadd_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i), sum0);
        sum1 = _mm256_fmadd_ps(_mm256_loadu_ps(a + i + 8), _mm256_loadu_ps(b + i + 8), sum1);
    }
    float result = _mm256_reduce_add_ps(_mm256_add_ps(sum0, sum1));
    for (; i < n; i++) result += a[i] * b[i]; // serial tail
    return result;
}

This kind of unrolling has been a common request for NumKong, but the library avoids it by design.

Modern CPUs already "unroll" in hardware. Out-of-order engines with reorder buffers of 320–630 entries (Zen 4: 320, Golden Cove: 512, Apple Firestorm: ~630) can keep a dozen of loop iterations in-flight simultaneously. The physical register file is much larger than the ISA-visible architectural registers — Skylake has ~180 physical integer registers behind 16 architectural GPRs, and ~168 physical vector registers behind 32 architectural ZMMs. The register renaming unit maps the same zmm0 in iteration N and iteration N+1 to different physical registers, extracting cross-iteration parallelism automatically — exactly the benefit that source-level unrolling was historically supposed to provide.

Unrolling works against NumKong's goals. Every unrolled copy is a distinct instruction in the binary. With 1,500+ kernel endpoints across 30+ backends, even 2x unrolling would inflate the .text section by megabytes — directly impacting install size for Python wheels, NPM packages, and Rust crates. Larger loop bodies also increase instruction-cache and micro-op-cache pressure; Agner Fog also recommends:

"avoid loop unrolling where possible in order to economize the use of the micro-op cache".

A loop that spills out of the uop cache falls back to the slower legacy decoder, making the "optimized" version slower than the compact original. For a header-only library, unrolling also compounds compilation time: register allocation is NP-hard (reducible to graph coloring), and unrolling multiplies the number of simultaneously live ranges the allocator must consider, increasing compile time super-linearly across every translation unit that includes the headers.

Serial tails are a correctness hazard. The leftover elements after the last full SIMD chunk run through a scalar loop that silently drops FMA fusion, compensated accumulation, and saturating arithmetic — producing results with different numerical properties than the SIMD body. NumKong often uses masked loads instead (_mm512_maskz_loadu_ps on AVX-512, predicated svld1_f32 on SVE), processing every element through the same arithmetic path regardless of alignment. It's not exactly orthogonal to loop-unrolling, but makes a different kernel layout more compatible.

The gains come from elsewhere. On Intel Sapphire Rapids, NumKong was benchmarked against auto-vectorized code compiled with GCC 12. GCC handles single-precision float well, but struggles with _Float16 and other mixed-precision paths:

Kind GCC 12 f32 GCC 12 f16 NumKong f16 f16 improvement
Inner Product 3,810 K/s 192 K/s 5,990 K/s 31 x
Cosine Distance 3,280 K/s 336 K/s 6,880 K/s 20 x
Euclidean Distance ² 4,620 K/s 147 K/s 5,320 K/s 36 x
Jensen-Shannon Divergence 1,180 K/s 18 K/s 2,140 K/s 118 x

NumKong's f16 kernels are faster than GCC's f32 output — not because of unrolling, but because they use F16C conversion instructions, widening FMA pipelines, and compensated accumulation that compilers do not synthesize from a plain for loop. The same story repeats for bf16, e4m3, i8, and i4: these types require algorithmic transformations — lookup tables, algebraic domain shifts, asymmetric VNNI tricks — that live beyond the reach of auto-vectorization.

Parallelism & Multi-Threading

BLAS libraries traditionally manage their own thread pools. OpenBLAS spawns threads controlled by OPENBLAS_NUM_THREADS, Intel MKL forks its own OpenMP runtime via MKL_NUM_THREADS, and Apple Accelerate delegates to GCD (Grand Central Dispatch). This works in isolation — but the moment your application adds its own parallelism (joblib, std::thread, Tokio, GCD, OpenMP), you get thread oversubscription: MKL spawns 8 threads inside each of your 8 joblib workers, producing 64 threads on 8 cores, thrashing caches and stalling on context switches. The Python ecosystem has built entire libraries just to work around this problem, and scikit-learn's documentation devotes a full page to managing the interaction between joblib parallelism and BLAS thread pools.

NumKong takes a different position: the numerics layer should not own threads. Modern hardware makes the "spawn N threads and split evenly" model increasingly untenable:

  • Server-grade CPUs have hundreds of cores split across sockets, chiplets, and tiles, resulting in dozens of physical NUMA domains with vastly different memory access latencies. A thread pool that ignores NUMA topology will spend more time on remote memory stalls than on arithmetic.
  • Consumer-grade CPUs pack heterogeneous Quality-of-Service core types on the same die — Intel P-cores and E-cores run at different frequencies and sometimes support different ISA extensions. A naive work-split gives equal chunks to fast and slow cores, and the whole task stalls waiting for the slowest partition.
  • Real-time operating systems in robotics and edge AI cannot afford to yield the main thread to a BLAS-managed pool. These systems need deterministic latency, not maximum throughput.

Instead, NumKong exposes row-range parameters that let the caller partition work across any threading model. For GEMM-shaped dots_packed, this is straightforward — pass a slice of A's rows and the full packed B to compute the corresponding slice of C. For SYRK-shaped dots_symmetric, explicit start_row / end_row parameters control which rows of the symmetric output matrix a given thread computes. The GIL (Global Interpreter Lock) is released around every kernel call, making NumKong compatible with concurrent.futures, multiprocessing, or any other parallelism model:

import concurrent.futures, numkong as nk, numpy as np

vectors, num_threads = np.random.randn(1000, 768).astype(np.float32), 4
output = nk.zeros((1000, 1000), dtype="float32")

def compute_slice(t):
    start = t * (len(vectors) // num_threads)
    end = start + len(vectors) // num_threads if t < num_threads - 1 else len(vectors)
    nk.dots_symmetric(vectors, out=output, start_row=start, end_row=end)

with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
    list(pool.map(compute_slice, range(num_threads)))

For users who want a ready-made low-latency thread pool without the oversubscription baggage of OpenMP, we built ForkUnion — a minimalist fork-join library for C, C++, and Rust that avoids mutexes, CAS atomics, and dynamic allocations on the critical path, with optional NUMA pinning on Linux.

Memory Allocation & Management

BLAS libraries typically allocate internal buffers during GEMM — OpenBLAS packs matrices into L2/L3-sized panels via per-thread buffer pools backed by mmap or shmget. This hidden allocation has caused real problems: 14 lock/unlock pairs per small GEMM call throttling 12-thread scaling to 2x, silently incorrect results from thread-unsafe allocation in np.dot, and deadlocks after fork() due to mutex state not being reset in child processes. The BLASFEO library was created specifically for embedded model-predictive control where malloc during computation is unacceptable.

NumKong never allocates memory. Following the same philosophy as Intel MKL's packed GEMM API (cblas_sgemm_pack_get_sizecblas_sgemm_packcblas_sgemm_compute), NumKong exposes typed three-phase interfaces — nk_dots_packed_size_*nk_dots_pack_*nk_dots_packed_* — where the caller owns the buffer and NumKong only fills it.

The reason GEMM libraries repack matrices at all is that every hardware target has a different preferred layout — Intel AMX expects B in a VNNI-interleaved tile format (pairs of BFloat16 values packed into DWORDs across the K dimension), while Arm SME wants column vectors for its FMOPA outer-product instructions. Since GEMM is $O(N^3)$ and repacking is $O(N^2)$, the cost is asymptotically free — but the allocation and locking overhead is not.

NumKong's nk_dots_pack_* family performs five transformations beyond simple reordering:

  • Type pre-conversion — mini-floats (E4M3, BFloat16, etc.) are upcast to the compute type once during packing, not on every GEMM call. This amortizes the conversion cost across all rows of A that will be multiplied against the packed B.
  • SIMD depth padding — rows are zero-padded to the SIMD vector width (16 for AVX-512 Float32, 64 for AVX-512 Int8), allowing inner loops to load without boundary checks.
  • Per-column norm precomputation — squared norms ($|b_j|^2$) are computed and stored alongside the packed data, so distance kernels (angulars_packed, euclideans_packed) can reuse them without a separate pass.
  • ISA-specific tile layout — AMX packing interleaves BFloat16 pairs into 16×32 tiles matching TDPBF16PS expectations; SME packing arranges vectors at SVE granularity for FMOPA outer products; generic backends use simple column-major with depth padding.
  • Power-of-2 stride breaking — when the padded row stride is a power of 2, one extra SIMD step of padding is added. Power-of-2 strides cause cache set aliasing where consecutive rows map to the same cache sets, effectively shrinking usable L1/L2 capacity — stride-256 traversals can be ~10x slower than stride-257.
import numkong as nk, numpy as np

right_matrix = np.random.randn(1000, 768).astype(np.float16)
right_packed = nk.dots_pack(right_matrix, dtype=nk.float16)                        # pack once
for query_batch in stream: results = nk.dots_packed(query_batch, right_packed)    # reuse many times

Why Not Just GEMM? The Evolution of Matrix Multiplication APIs

The classic BLAS GEMM computes $C = \alpha A B + \beta C$ for Float32/Float64 matrices. It covers many use cases, but LLM inference, vector search, and quantum simulation expose three ways in which the traditional interface falls short.

Frozen weights justify separating packing from computation. During LLM inference, a very large share of GEMM calls use a static weight matrix — weights don't change after loading. This makes offline repacking a one-time cost amortized over the entire serving lifetime: NVIDIA's TurboMind explicitly splits GEMM into offline weight packing (hardware-aware layout conversion) and online mixed-precision computation, and Intel MKL's packed GEMM API exposes the same two-phase pattern. NumKong's nk_dots_pack_*nk_dots_packed_* path follows this philosophy — pack the weight matrix once, reuse it across all queries.

Mixed precision demands more than an epilogue addition. Modern transformer layers operate in a precision sandwich: weights stored in BFloat16/Float8, GEMM accumulated in Float32, output downcast back to BFloat16 for the next layer. Between GEMM calls, LayerNorm or RMSNorm re-normalizes hidden states, so the next layer is often much closer to an angular or normalized similarity computation than to a plain raw dot product. nGPT takes this to its logical conclusion: all vectors live on the unit hypersphere, and every matrix-vector product is a pure angular distance. This means many "GEMM" workloads in production are semantically closer to many-to-many angular distance computation — which is exactly what NumKong's angulars_packed and euclideans_packed kernels compute directly, fusing norm handling and type conversion into a single pass.

The GEMM-for-distances trick has real costs. A common shortcut in vector search is to decompose pairwise Euclidean distance as $|a - b|^2 = |a|^2 + |b|^2 - 2 \langle a, b \rangle$, precompute norms, and call sgemm for the inner-product matrix. Both FAISS and scikit-learn use this approach — and both document its limitations. Scikit-learn's docs warn of "catastrophic cancellation" in the subtraction; this has caused real bugs with ~37% error on near-identical Float32 vectors. The $O(N^2)$ postprocessing pass (adding norms, square roots, divisions) is not free either — NVIDIA's RAFT measured a 20–25% speedup from fusing it into the GEMM epilogue. Even FAISS switches to direct SIMD when the query count drops below 20. The standard BLAS interface was never designed for sub-byte types either — no vendor supports Int4, and sub-byte types cannot even be strided without bit-level repacking.

Some operations need more than GEMM + postprocessing. NumKong implements several GEMM-shaped operations where the "epilogue" is too complex for a simple addition:

  • Bilinear forms ($a^T C b$) in quantum computing compute a scalar expectation value — the naive approach materializes an $N$-dimensional intermediate vector $Cb$, but NumKong's typed nk_bilinear_* kernels stream through rows of $C$ with nested compensated dot products, never allocating beyond registers. For complex-valued quantum states, where the intermediate would be a 2N-element complex vector, the savings double.
  • MaxSim scoring for ColBERT-style late-interaction retrieval computes $\sum_i \min_j \text{angular}(q_i, d_j)$ — a sum-of-min-distances across token pairs. A GEMM would produce the full $M \times N$ similarity matrix, but NumKong's typed nk_maxsim_packed_* kernels fuse a coarse Int8-quantized screening with full-precision angular refinement on winning pairs only, packing both query and document matrices to use all 4 SME tiles as accumulators. PLAID and maxsim-cpu have independently shown that dedicated MaxSim kernels can outperform the GEMM decomposition by 5–10x.

NumKong treats these as first-class operations — dots_packed, euclideans_packed, angulars_packed, typed nk_bilinear_* kernels, and typed nk_maxsim_packed_* kernels — rather than decomposing everything into GEMM + postprocessing.

Precision by Design: Saturation, Rounding, & Float6 Over Float8

Floating-point arithmetic on computers is not associative: $(a + b) + c \neq a + (b + c)$ in general, and upcasting to wider types is not always sufficient. NumKong makes operation-specific decisions about where to spend precision and where to economize, rather than applying one rule uniformly.

Saturation depends on the operation. A reduction over a 4 GB array of i8 values contains ~4 billion elements — but Int32 wrapping overflow occurs after just ~17 million Int8 summands ($127 \times 16.9\text{M} > 2^{31}$). Reductions in NumKong use saturating arithmetic because the input can be arbitrarily long. Matrix multiplications don't need saturation because GEMM depth rarely exceeds tens of thousands — well within Int32 range. x86 provides no saturating 32-bit SIMD add (only byte/word variants), so NumKong implements saturation via overflow detection with XOR-based unsigned comparison on platforms that lack native support.

Square roots & special math ops are platform-specific. Angular distance requires $1/\sqrt{|a|^2 \cdot |b|^2}$ — but the cost of computing this normalization varies dramatically across hardware. x86 VSQRTPS takes ~12 cycles, followed by VDIVPS at ~11 cycles — totalling ~23 cycles for a precise 1/sqrt(x). The VRSQRT14PS alternative starts with a 14-bit estimate in ~4 cycles, then one Newton-Raphson iteration ($y = y \cdot (1.5 - 0.5 x y^2)$, ~4 more cycles) reaches full Float32 precision — roughly 3x faster. ARM's FRSQRTE provides only ~8 bits, requiring two Newton-Raphson iterations to match. NumKong selects the iteration count per platform so the final ULP bound is consistent across ISAs, rather than exposing different precision to different users.

E2M3 and E3M2 can outperform E4M3 and E5M2. 6-bit MX formats can be scaled to exact integers, enabling integer accumulation that avoids E5M2's catastrophic cancellation risk. This works because E2M3's narrower exponent range means every representable value maps to an integer after a fixed shift — no rounding, no cancellation. See Mini-Floats for a worked example.

Every such decision — saturation thresholds, Newton-Raphson iteration counts, integer vs floating-point paths — is documented per operation and per type in the module-specific READMEs.

Calling Convention & Error Handling

NumKong never throws exceptions, never sets errno, and never calls setjmp/longjmpexceptions bloat call sites with unwind tables and are invisible to C, Python, Rust, Swift, Go, and JavaScript FFI; errno is thread-local state whose storage model varies across C runtimes. Instead, every function takes inputs as const pointers, writes outputs through caller-provided pointers, and returns void:

void nk_dot_f32(nk_f32_t const *a, nk_f32_t const *b, nk_size_t n, nk_f64_t *result);
void nk_dot_bf16(nk_bf16_t const *a, nk_bf16_t const *b, nk_size_t n, nk_f32_t *result);

Pointers eliminate implicit casts for types with platform-dependent storage — this is why they matter for half-precision types. nk_f16_t and nk_bf16_t resolve to native __fp16 / __bf16 when available but fall back to unsigned short otherwise — if passed by value, the compiler would silently apply integer promotion instead of preserving the bit pattern. Passing by pointer keeps the representation opaque: kernels read raw and convert explicitly when needed, so the same binary works regardless of whether the compiler understands _Float16.

The only place that requires error signaling is dynamic dispatch — looking up the best kernel for the current CPU at runtime. When no kernel matches, the dispatcher sets the capabilities mask to zero and fills the function pointer with a family-specific error stub such as nk_error_dense_ from c/dispatch.h and c/numkong.c that writes 0xFF into the output — NaN for floats, −1 for signed integers, TYPE_MAX for unsigned.

Compile-Time and Run-Time Dispatch

NumKong provides two dispatch mechanisms. Compile-time dispatch selects the fastest kernel supported by the target platform at build time — thinner binaries, no indirection overhead, but requires knowing your deployment hardware. Run-time dispatch compiles every supported kernel into the binary and picks the best one on the target machine via nk_capabilities() — one pointer indirection per call, but a single binary runs everywhere. The run-time path is common in DBMS products (ClickHouse), web browsers (Chromium), and other upstream projects that ship to heterogeneous fleets.

All kernel names follow the pattern nk_{operation}_{type}_{backend}. If you need to resolve the best kernel manually, use nk_find_kernel_punned with a nk_kernel_kind_t, nk_dtype_t, and a viable capabilities mask:

nk_metric_dense_punned_t angular = 0;
nk_capability_t used = nk_cap_serial_k;
nk_find_kernel_punned(
    nk_kernel_angular_k, nk_f32_k,            // what functionality? for which input type?
    nk_capabilities(),                        // which capabilities are viable?
    (nk_kernel_punned_t *)&angular, &used);   // the kernel found and capabilties used!

The first call to nk_capabilities() initializes the dispatch table; all subsequent calls are lock-free.

Numeric Types

Float64 & Float32: IEEE Precision

Float64 — NumKong uses compensated summation that tracks numerical errors separately. On serial paths, we use Neumaier's algorithm (1974), an improvement over Kahan-Babuška that correctly handles cases where added terms are larger than the running sum, achieving $O(1)$ error growth instead of $O(n)$. On SIMD paths with FMA support, we implement the Dot2 algorithm (Ogita-Rump-Oishi, 2005), maintaining separate error compensators for both multiplication and accumulation via TwoProd and TwoSum operations. The accuracy differences are visible in the benchmark tables above — compensated Float64 suits scientific computing where numerical stability matters more than raw speed.

Float32 — SIMD implementations load Float32 values, upcast to Float64 for full-precision multiplication and accumulation, then downcast only during finalization. This avoids catastrophic cancellation at minimal cost since modern CPUs have dedicated Float64 vector units operating at nearly the same throughput as Float32. The same compensated accumulation strategy applies to Mahalanobis distance, bilinear forms, and KL/JS divergences.

// Dot2 TwoProd: Capture multiplication rounding error
h = a * b;
r = fma(a, b, -h);  // Extracts rounding error

// Dot2 TwoSum: Capture addition rounding error
t = sum + product;
e = (sum - t) + product;  // Compensator term

BFloat16 & Float16: Half Precision

BFloat16 — not an IEEE 754 standard type, but widely adopted for AI workloads. BFloat16 shares Float32's 8-bit exponent but truncates the mantissa to 7 bits, prioritizing dynamic range over precision (±3.4×10³⁸ with coarser granularity). On old CPUs, upcasting BFloat16 to Float32 requires just an unpack and left-shift by 16 bits (essentially free); on newer CPUs, both Arm and x86 provide widening mixed-precision dot products via DPBF16PS (AVX-512 on Genoa/Sapphire Rapids) and BFDOT (NEON on ARMv8.6-A Graviton 3+). NumKong's Float8 types (E4M3/E5M2) upcast to BFloat16 before using DPBF16PS, creating a three-tier precision hierarchy: Float8 for storage, BFloat16 for compute, Float32 for accumulation.

Float16 — IEEE 754 half-precision with 1 sign bit, 5 exponent bits (bias=15), and 10 mantissa bits, giving a range of ±65504. Float16 prioritizes precision over range (10 vs 7 mantissa bits), making it better suited for values near zero and gradients during training. On x86, older CPUs use F16C extensions (Ivy Bridge+) for fast Float16 → Float32 conversion; Sapphire Rapids+ adds native AVX-512-FP16 with dedicated Float16 arithmetic. On Arm, ARMv8.4-A adds FMLAL/FMLAL2 instructions for fused Float16 → Float32 widening multiply-accumulate, reducing the total latency from 7 cycles to 4 cycles and achieving 20–48% speedup over the separate convert-then-FMA path.

Platform BFloat16 Path Elem/Op Float16 Path Elem/Op
x86
Sapphire Rapids (2023) ↓ Genoa 32 ↓ Skylake 16
Genoa (2022) VDPBF16PS widening dot 32 ↓ Skylake 16
Skylake (2015) SLLI + VFMADD 16 VCVTPH2PS + VFMADD 16
Haswell (2013) SLLI + VFMADD 8 VCVTPH2PS + VFMADD 8
Arm
Graviton 3 (2021) SVBFDOT widening dot 4–32 SVCVTSVFMLA 4–32
Apple M2+ (2022) BFDOT widening dot 8 ↓ FP16FML 8
Apple M1 (2020) ↓ NEON 8 FMLAL widening FMA 8
Graviton 2 (2019) ↓ NEON 8 FCVTL + FMLA 4
Graviton 1 (2018) SHLL + FMLA 8 bit-manip → FMLA 8

BFloat16 shares Float32's 8-bit exponent, so upcasting is a 16-bit left shift (SLLI on x86, SHLL on Arm) that zero-pads the truncated mantissa — essentially free. Float16 has a different exponent width (5 vs 8 bits), requiring a dedicated convert: VCVTPH2PS (x86 F16C) or FCVTL (Arm NEON). Widening dot products (VDPBF16PS, BFDOT, FMLAL) fuse the conversion and multiply-accumulate into one instruction. Sapphire Rapids has native VFMADDPH for Float16 arithmetic, but NumKong does not use it for general dot products — Float16 accumulation loses precision. It is only used for mini-float (E2M3/E3M2) paths where periodic flush-to-Float32 windows keep error bounded.

Mini-Floats: E4M3, E5M2, E3M2, & E2M3

Format Bits Range NumKong Promotion Rules Support in GPUs
E5M2FN 8 ±57344 BFloat16 → Float32 H100+, MI300+
E4M3FN 8 ±448 BFloat16 → Float32 H100+, MI300+
E3M2FN 6 → 8 ±28 BFloat16 & Float16 → Float32,
Int16 → Int32
only block-scaled
E2M3FN 6 → 8 ±7.5 BFloat16 & Float16 → Float32,
Int8 → Int32
only block-scaled
Block-scaled NVFP4 4 ±6 B200+
Block-scaled MXFP4 / E2M1 4 ±6 B200+, MI325+

Block scaling. NumKong does not implement block-scaled variants (MXFP4, NVFP4, or block-scaled E3M2/E2M3). Block scaling couples elements through a shared exponent per block, introducing structural bias into a fundamentally uniform operation. NumKong treats each element independently; block-scaled inputs should be dequantized before processing.

FNUZ variants. AMD MI300 (CDNA 3) uses FNUZ encoding (negative-zero-is-NaN) rather than the OCP standard. MI350+ and NVIDIA H100/B200 both use OCP-standard E4M3FN/E5M2FN. NumKong follows the OCP convention; FNUZ inputs require conversion before processing.

8-bit floats (E4M3 & E5M2) follow the OCP FP8 standard. E4M3FN (no infinities, NaN only) is preferred for training where precision near zero matters; E5M2FN (with infinities) provides wider dynamic range for inference. On x86 Genoa/Sapphire Rapids, E4M3/E5M2 values upcast to BFloat16 via lookup tables, then use native DPBF16PS for 2-per-lane dot products accumulating to Float32. On Arm Graviton 3+, the same BFloat16 upcast happens via NEON table lookups, then BFDOT instructions complete the computation.

6-bit floats (E3M2 & E2M3) follow the OCP MX v1.0 standard. Their smaller range allows scaling to exact integers that fit in i8/i16, enabling integer VPDPBUSD/SDOT accumulation instead of the floating-point pipeline. Float16 can also serve as an accumulator, accurately representing ~50 products of E3M2FN pairs or ~20 products of E2M3FN pairs before overflow. On Arm, NEON FHM extensions bring widening FMLAL dot-products for Float16 — both faster and more widely available than BFDOT for BFloat16.

E4M3 and E5M2 cannot use the integer path. E4M3 scaled by 16 reaches 7,680 — too large for Int8, barely fitting Int16 with a 128-entry table. E5M2's range (±57,344) makes the scaled product exceed Int32 entirely. Without the integer path, E5M2 falls back to Float32 accumulation — where its 2-bit mantissa (only 4 values per binade) creates a catastrophic cancellation risk that E2M3's integer path avoids completely:

i = 0 i = 1 i = 2 i = 3 i = 4 i = 5 i = 6
aᵢ 0.00122 20480 −0.00122 1.5 −3072 −640 0.00146
bᵢ −40 320 −1280 −7.63e⁻⁵ 0.000427 10240 −4.58e⁻⁵
aᵢ·bᵢ −0.04883 6553600 1.5625 −0.000114 −1.3125 −6553600 ≈ 0

Why Float32 accumulation fails here. The accurate sum of these 7 products is ≈ 0.201. A vfmaq_f32 call accumulates 4 lanes at a time; the first batch already carries values around ±6.5 M. At that magnitude the Float32 ULP is 0.5 — so the small meaningful terms (−0.049, 1.563, −1.313, −0.0001) are all below one ULP and get absorbed during lane reduction. The large terms then cancel exactly to zero, and the information is gone. Final Float32 result: 0.0 instead of 0.201.

Int8 & Int4: Integer Types

Both signed and unsigned 8-bit and 4-bit integers are supported with Int32 accumulation to prevent overflow. A notable optimization is the VNNI algebraic transform: on Ice Lake+ with AVX-512 VNNI, the native DPBUSD instruction is asymmetric (unsigned × signed → signed), but NumKong uses it for both Int8×Int8 and UInt8×UInt8. For signed Int8×Int8, we convert the signed operand to unsigned via XOR with 0x80, compute DPBUSD(a⊕0x80, b) = (a+128)×b, then subtract a correction term 128×sum(b) to recover the true result. For unsigned UInt8×UInt8, we XOR the second operand to make it signed, compute DPBUSD(a, b⊕0x80) = a×(b-128), then add correction 128×sum(a) via the fast SAD instruction.

Int4 values pack two nibbles per byte, requiring bitmask extraction: low nibbles (byte & 0x0F) and high nibbles (byte >> 4). For signed Int4, the transformation (nibble ⊕ 8) - 8 maps the unsigned range [0,15] to signed range [−8,7]. Separate accumulators for low and high nibbles avoid expensive nibble-interleaving and allow SIMD lanes to work in parallel.

// Asymmetric transform for i8×i8 using DPBUSD (unsigned×signed)
a_unsigned = a XOR 0x80;           // Convert signed→unsigned
result = DPBUSD(a_unsigned, b);    // Computes (a+128)×b
correction = 128 * sum(b);         // Parallel on different port
final = result - correction;       // True a×b value

Binary: Packed Bits

The u1x8 type packs 8 binary values per byte, enabling Hamming distance and Jaccard similarity via population-count instructions. On x86, VPOPCNTDQ (Ice Lake+) counts set bits in 512-bit registers directly; on Arm, CNT (NEON) operates on 8-bit lanes with a horizontal add. Results accumulate into u32 — sufficient for vectors up to 4 billion bits. Binary representations are the most compact option for locality-sensitive hashing and binary neural network inference.

Complex Types

NumKong supports four complex types — f16c, bf16c, f32c, and f64c — stored as interleaved real/imaginary pairs. Complex types are essential in quantum simulation (state vectors, density matrices), signal processing (FFT coefficients, filter design), and electromagnetic modeling. The dot operation computes the unconjugated dot product $\sum a_k b_k$, while vdot computes the conjugated inner product $\sum \bar{a}_k b_k$ standard in physics and signal processing.

For complex dot products, NumKong defers sign flips until after the accumulation loop: instead of using separate FMA and FMS (fused multiply-subtract) instructions for the real component, we compute $a_r b_r + a_i b_i$ treating all products as positive, then apply a single bitwise XOR with 0x80000000 to flip the sign bits. This avoids execution port contention between FMA and FMS, letting dual FMA units stay occupied.

for (...) { // Complex multiply optimization: XOR sign flip after the loop
    sum_real = fma(a, b, sum_real);   // No sign flip in loop
    sum_imag = fma(a, b_swapped, sum_imag);
}
sum_real = xor(sum_real, 0x80000000);  // Single XOR after loop

Reading Materials

Beyond the READMEs in this repository, there are several standalone articles covering different evolution steps and features of this library.

License

Feel free to use the project under Apache 2.0 or the Three-clause BSD license at your preference.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

numkong-7.3.0.tar.gz (1.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

numkong-7.3.0-cp314-cp314t-win_arm64.whl (499.2 kB view details)

Uploaded CPython 3.14tWindows ARM64

numkong-7.3.0-cp314-cp314t-win_amd64.whl (579.5 kB view details)

Uploaded CPython 3.14tWindows x86-64

numkong-7.3.0-cp314-cp314t-musllinux_1_2_x86_64.whl (10.8 MB view details)

Uploaded CPython 3.14tmusllinux: musl 1.2+ x86-64

numkong-7.3.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (11.0 MB view details)

Uploaded CPython 3.14tmanylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp314-cp314t-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14tmacOS 11.0+ ARM64

numkong-7.3.0-cp314-cp314t-macosx_10_15_x86_64.whl (755.0 kB view details)

Uploaded CPython 3.14tmacOS 10.15+ x86-64

numkong-7.3.0-cp314-cp314-win_arm64.whl (498.4 kB view details)

Uploaded CPython 3.14Windows ARM64

numkong-7.3.0-cp314-cp314-win_amd64.whl (576.3 kB view details)

Uploaded CPython 3.14Windows x86-64

numkong-7.3.0-cp314-cp314-musllinux_1_2_x86_64.whl (10.7 MB view details)

Uploaded CPython 3.14musllinux: musl 1.2+ x86-64

numkong-7.3.0-cp314-cp314-musllinux_1_2_s390x.whl (2.9 MB view details)

Uploaded CPython 3.14musllinux: musl 1.2+ s390x

numkong-7.3.0-cp314-cp314-musllinux_1_2_ppc64le.whl (3.3 MB view details)

Uploaded CPython 3.14musllinux: musl 1.2+ ppc64le

numkong-7.3.0-cp314-cp314-musllinux_1_2_i686.whl (2.9 MB view details)

Uploaded CPython 3.14musllinux: musl 1.2+ i686

numkong-7.3.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (10.9 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl (3.2 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.17+ s390xmanylinux: glibc 2.28+ s390x

numkong-7.3.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl (3.3 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.17+ ppc64lemanylinux: glibc 2.28+ ppc64le

numkong-7.3.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl (2.8 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.28+ i686manylinux: glibc 2.5+ i686

numkong-7.3.0-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

numkong-7.3.0-cp314-cp314-macosx_10_15_x86_64.whl (753.1 kB view details)

Uploaded CPython 3.14macOS 10.15+ x86-64

numkong-7.3.0-cp313-cp313t-win_arm64.whl (475.9 kB view details)

Uploaded CPython 3.13tWindows ARM64

numkong-7.3.0-cp313-cp313t-win_amd64.whl (559.9 kB view details)

Uploaded CPython 3.13tWindows x86-64

numkong-7.3.0-cp313-cp313t-musllinux_1_2_x86_64.whl (10.8 MB view details)

Uploaded CPython 3.13tmusllinux: musl 1.2+ x86-64

numkong-7.3.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (11.0 MB view details)

Uploaded CPython 3.13tmanylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp313-cp313t-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13tmacOS 11.0+ ARM64

numkong-7.3.0-cp313-cp313t-macosx_10_13_x86_64.whl (754.9 kB view details)

Uploaded CPython 3.13tmacOS 10.13+ x86-64

numkong-7.3.0-cp313-cp313-win_arm64.whl (475.4 kB view details)

Uploaded CPython 3.13Windows ARM64

numkong-7.3.0-cp313-cp313-win_amd64.whl (557.4 kB view details)

Uploaded CPython 3.13Windows x86-64

numkong-7.3.0-cp313-cp313-musllinux_1_2_x86_64.whl (10.7 MB view details)

Uploaded CPython 3.13musllinux: musl 1.2+ x86-64

numkong-7.3.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (10.9 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

numkong-7.3.0-cp313-cp313-macosx_10_13_x86_64.whl (752.9 kB view details)

Uploaded CPython 3.13macOS 10.13+ x86-64

numkong-7.3.0-cp312-cp312-win_arm64.whl (475.3 kB view details)

Uploaded CPython 3.12Windows ARM64

numkong-7.3.0-cp312-cp312-win_amd64.whl (557.4 kB view details)

Uploaded CPython 3.12Windows x86-64

numkong-7.3.0-cp312-cp312-musllinux_1_2_x86_64.whl (10.7 MB view details)

Uploaded CPython 3.12musllinux: musl 1.2+ x86-64

numkong-7.3.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (10.9 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

numkong-7.3.0-cp312-cp312-macosx_10_13_x86_64.whl (752.9 kB view details)

Uploaded CPython 3.12macOS 10.13+ x86-64

numkong-7.3.0-cp311-cp311-win_arm64.whl (475.2 kB view details)

Uploaded CPython 3.11Windows ARM64

numkong-7.3.0-cp311-cp311-win_amd64.whl (556.9 kB view details)

Uploaded CPython 3.11Windows x86-64

numkong-7.3.0-cp311-cp311-musllinux_1_2_x86_64.whl (10.7 MB view details)

Uploaded CPython 3.11musllinux: musl 1.2+ x86-64

numkong-7.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (11.0 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

numkong-7.3.0-cp311-cp311-macosx_10_9_x86_64.whl (758.1 kB view details)

Uploaded CPython 3.11macOS 10.9+ x86-64

numkong-7.3.0-cp310-cp310-win_arm64.whl (475.3 kB view details)

Uploaded CPython 3.10Windows ARM64

numkong-7.3.0-cp310-cp310-win_amd64.whl (557.0 kB view details)

Uploaded CPython 3.10Windows x86-64

numkong-7.3.0-cp310-cp310-musllinux_1_2_x86_64.whl (10.7 MB view details)

Uploaded CPython 3.10musllinux: musl 1.2+ x86-64

numkong-7.3.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (10.9 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

numkong-7.3.0-cp310-cp310-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

numkong-7.3.0-cp310-cp310-macosx_10_9_x86_64.whl (758.2 kB view details)

Uploaded CPython 3.10macOS 10.9+ x86-64

File details

Details for the file numkong-7.3.0.tar.gz.

File metadata

  • Download URL: numkong-7.3.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0.tar.gz
Algorithm Hash digest
SHA256 c1c92bf190a5282d34fdff68be12f9ab283c830755ffeb4dbcd5809913156907
MD5 8e4654900d91456bc58eb064aa5a7953
BLAKE2b-256 90979e37c3d28bb151c20e8e0f21a3003c5010a5396d8eb5047bfa08663a95e8

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0.tar.gz:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp314-cp314t-win_arm64.whl
  • Upload date:
  • Size: 499.2 kB
  • Tags: CPython 3.14t, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-win_arm64.whl
Algorithm Hash digest
SHA256 7a8887e3aca95f1de913f31f3bb5ffcbf99ff3874ad27ce2b18dab676e38e4ec
MD5 08f6bdc8b7607a14311a2fc918edce59
BLAKE2b-256 9f6d059f58cec4a8f5f497d5297e4160c65c5049f2bcfcb4c6d4c70f4d791aa0

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp314-cp314t-win_amd64.whl
  • Upload date:
  • Size: 579.5 kB
  • Tags: CPython 3.14t, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-win_amd64.whl
Algorithm Hash digest
SHA256 d6805d3206e9ef9c6a094b1b39fea785a339e07d58b29eb127904dbbf2b8fd97
MD5 d2c75f023c43c56089b8ad4da2e51ceb
BLAKE2b-256 ba8ead093eae49b159d393ccbf38dbe09d067b85fa4352bfa8c7367b7734fe73

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 08183ff41e67de12e2253d30f25e1e79ae88960fcd296d3ca1ebd7759ed6d53d
MD5 e4e837b40cc9c52e182555f24e74a742
BLAKE2b-256 2173597698d0bc580d0c18b80d554cff8937d81377884645f1a7698cdc791a7b

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4840a500b9a2df9587a91ebe6a656d5a3f3e1def98de6b7af04cd09fa4993bdf
MD5 b3bd30d161e2ce6ebb325f8303af9e39
BLAKE2b-256 33358bd4d2f1d06a4df955645e728679a1f80ae0f98c0797d245278844684560

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1693e240a3c205c5086f2d8b4ad2786332ba9dc066a91a1099e9c233f2b736da
MD5 f2ca78839ae2596de69caa509909ac5a
BLAKE2b-256 f3970fe4068d5ccde3cf2c13fdbaea52f5aa0d32ced42946894eee08f74c7771

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314t-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314t-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 0e5d8016276baad422e002440ead5aa13464fd592270f6f1f19de5a4b01ead7c
MD5 d0e35e1463e9f53d78153cfed1d4499d
BLAKE2b-256 39b9c6d7040361f95d53e4619755265e97a2cab95d596278d2d685de336ee53e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314t-macosx_10_15_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp314-cp314-win_arm64.whl
  • Upload date:
  • Size: 498.4 kB
  • Tags: CPython 3.14, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp314-cp314-win_arm64.whl
Algorithm Hash digest
SHA256 412f10ad94f7486e805ff18868eb794ac32ce68b9cb4ae2a7169699a9e6eff97
MD5 d1b01404f80a3221108117a503b0c213
BLAKE2b-256 b2c14b46384c52c1083ec9115d6f7ecd80b9637637fdbc53c5b492c32444cd8b

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 576.3 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 a6dcdeaa45b529ed289f0cf0db5b6c2630a45fb485e4fecb8f017a368ae126ec
MD5 b5d4d94eadbe5625e5121bc0e3f74279
BLAKE2b-256 839ec9be573c9ed11e0bb44e8289f25b20c22af9b60bb45c869b3215f5248f1b

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 6e4bc9b5cd5558fd5890307374f77f196bd047e2814a255d2742473291ffa879
MD5 84fa08c7e98b09d90ef7f996d47c746a
BLAKE2b-256 d0cf14c428188979bf9ea9e493d8020f25b6414244cee3c0d64f577832fb4f44

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-musllinux_1_2_s390x.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-musllinux_1_2_s390x.whl
Algorithm Hash digest
SHA256 9608d71773465127c3437787abcfd779b8993e2aaeba0e667d1ccc74685bd5ed
MD5 ad6cef14b8aca85829e093709f28a580
BLAKE2b-256 d30507af71013537707e6b8a39ed2fd8f0908fd29b612378387c8b0753de6274

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-musllinux_1_2_s390x.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-musllinux_1_2_ppc64le.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-musllinux_1_2_ppc64le.whl
Algorithm Hash digest
SHA256 2869e9fea599cc361ca91331a1f05f0529c14e4465bf37cfc11f35d4a54b3fdf
MD5 2c4b27a9012d03210ecc5915d4b0c438
BLAKE2b-256 111b1bd6b5dc027b4c5f57011a147c5ec148e63440028890271966c6af7ca462

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-musllinux_1_2_ppc64le.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-musllinux_1_2_i686.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 a74c15ba9dc4efc2581a069d67e706d627806b7516f42f5c62944fc83d77bc29
MD5 adbbc49e02153dfe6e3a0a466477a5fd
BLAKE2b-256 dd2dbec18b7bc5ee2bccc04d34037cc2730d676386eaaeeb73e22e7f880649fa

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-musllinux_1_2_i686.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c667af2e4e85b37aaf39622193eb89e8ea1043a2adec71f4e1c897bead9191dc
MD5 c48fea4b41c705249b81322da4ba436f
BLAKE2b-256 360038f2fbfbac077730092ae39439a887fb4185f800722d82ee6584f6500c39

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl
Algorithm Hash digest
SHA256 f539dcf507f9fb1a0deec5d7bc3a8877416833aef090085f55592e46a833cad4
MD5 2f5fdeff6598ac4602a0410fd2d77c7d
BLAKE2b-256 6819050b2d10006c675bbed929d84c296cbbf011cc42f6c19e795b466501967e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 992ec99d8b3f0a78dd4051bfc1378a3ab5de3d97e4bff7c13da74968f3bc018a
MD5 8166bbe8ddde9845c0f8991ed824ee39
BLAKE2b-256 5ac0c895d1a0622edba71562d9aa0a3728752ad29960ab1fd9bd3d0c4bfdab61

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl
Algorithm Hash digest
SHA256 fe9cff146fa35345d9930f578751bffb50ec5bed6483a8d5a85ebae7777aedc5
MD5 ed5ee7e9dc1ed3f03d33faebf55b4dac
BLAKE2b-256 6b1d6414de3d9040d27af3c94cb46be24ac024df75b14af52bf1c1c1f7429c73

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ab04061d55bf84e9f6018a50960069d3e6795acd8478813caf150415615eafa1
MD5 45f872fe2bbdfa4c3e77870d391ace76
BLAKE2b-256 011238984b45e4cac77be12d96dbe8eb89af947220cc6bd789da21fbbe1109f9

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp314-cp314-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp314-cp314-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 e14383b87acec13447d27be0537cf2fdcfbf3340992a4c7801d365ffeb887424
MD5 3b7918e9721978f274b03df3d77d4991
BLAKE2b-256 069d95032b5c2fe79b686daa936a7a45e4b39610c763dcb9890742083efc542c

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp314-cp314-macosx_10_15_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp313-cp313t-win_arm64.whl
  • Upload date:
  • Size: 475.9 kB
  • Tags: CPython 3.13t, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-win_arm64.whl
Algorithm Hash digest
SHA256 23366534c323c498699431901cc8ce18d1ca7f542cbecdb51da97bdb4a7a33a6
MD5 e92e2abcee3b49437de8a9631303ce27
BLAKE2b-256 ab2973fe6b825b47b45c8822048c30a2c7cbb96fa276091fffc806f6c3dbddfb

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp313-cp313t-win_amd64.whl
  • Upload date:
  • Size: 559.9 kB
  • Tags: CPython 3.13t, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-win_amd64.whl
Algorithm Hash digest
SHA256 0d0bef62158ab226e6810e7ee5887bbccc4c39a89ddfdbfc4cdc21ce94ecfd41
MD5 bb36b0c5bea731636686e1f4c81d9457
BLAKE2b-256 6db2330514a250fa443c365af548031801ae7eba09a3b58e895499dc0444a096

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 b25709d9fb3c5c23ddae627eb0d725b0339d3a46de5eaa81d63c6a4202b6f2c9
MD5 abb12b760b62c5d3df2b33294cfb0428
BLAKE2b-256 4a55cb234d1e711c0ae7613564d4d9ca62e549952d436dd949318154347a0086

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b241fc2782e3f82be571a7a3cb6556ceda6159f7311af9c1d0feb50330de1a88
MD5 655a9d76958df196a3da702c7e19fcae
BLAKE2b-256 318987ed9a1f7d471ebeecf70aac3acc388f56efe0acd286016e57905ca998d8

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e45025f5e6db761e468970ec8757b5d24efc2de202a9785a1dd2bbfb928d85a7
MD5 50bcf41d4aec180f8423e16390813cf7
BLAKE2b-256 73272513bad301cf8d43c5c4e7c7b2c573b6cdc3c8fe8cae56bc267ba57bfdac

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313t-macosx_10_13_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313t-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 417bcbbc9296f268ec5247de5cbe8fb343420ef41235e738a0dabc6013c6335c
MD5 6a2ebe702bad5ecc6ea838df21de688d
BLAKE2b-256 d809a2cc2521ecb7a18e52892a236fd9e5469d5f7aa64ae11d40ef0ee13db2ab

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313t-macosx_10_13_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp313-cp313-win_arm64.whl
  • Upload date:
  • Size: 475.4 kB
  • Tags: CPython 3.13, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp313-cp313-win_arm64.whl
Algorithm Hash digest
SHA256 ae042266784a23c747f91a5408f147524ab23f547bc534c974bbac99d36ed29f
MD5 cef3f48f5f1a10a33a9da814f4e4ef13
BLAKE2b-256 50f64c00a274d03b6f1603dd6ca042296845efd0517c93aebfb610e794e77b54

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 557.4 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 68d07ffe5a0a8f8a54bed1366e0c3ab112e04212b6cd7394888374c334bdddbd
MD5 f3b88c5477d150e89999a2b2f660fbe0
BLAKE2b-256 4fa4f4be235e3f57cbbae75ada7cbe8fbcfc8a01da46f1a6766d9c7360619746

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 3b4a4e4ae09c73c284cadc6b06d87598446bd2d3d39449343cfea0dffeca9749
MD5 e633ef3bf47b1ed8ab09aaa5366ee129
BLAKE2b-256 9f08b9a3010ba97d5c32355511e8055fcb09c1f34fc279b0c6cbc6b6e6fa7ff6

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 df69e606dee1e6dd8d149c5f23e6f78b29c920850f66aed40719832cbca9f8e7
MD5 ad3387ddf49fa915174ceb80d7c59da8
BLAKE2b-256 2a49efcbd0d8bbf5e0a827009fefd1855d924952df3da69b134c138475dd20ae

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bea3e3ba291aea7ca335dfc6190f6e7913c87f2c47fe1b8097b2917509d52afc
MD5 7dfba07eafe7e45edf7318d9e958d949
BLAKE2b-256 15e7d2b64b10f07fb7080f4c56f486f442cbbc5d2bd4dd71037efe6c4fad3653

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp313-cp313-macosx_10_13_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 8043d15093787be65e9465345dabf06b64d15316a52d0066b2602ad1662f8d92
MD5 34a483655b1dfad6ff050b38a176a92c
BLAKE2b-256 08eb69d73072d5034f5f08a03f6cbe885d422b538132c154140217ac61972c1f

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp313-cp313-macosx_10_13_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp312-cp312-win_arm64.whl
  • Upload date:
  • Size: 475.3 kB
  • Tags: CPython 3.12, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp312-cp312-win_arm64.whl
Algorithm Hash digest
SHA256 78df6f16a903739c07b52b4ecaa604c17607b42fc0c6bbad96ab7b901b62955f
MD5 3e718e96e9e8165f8fa4d3bad1d19a96
BLAKE2b-256 e1a1b98e96d7303590e034f08336c69919059fa2f42ae582238f2df11dc2fa07

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 557.4 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 053b60f14e67e8753518238ed41943ccfe4961404d57b97636de990bd4ecd5d1
MD5 8c795025b71ea6d291abf1fe59ab602c
BLAKE2b-256 13681e07ef25bd59c97b4452ba304ca53a6f57d068fed6032209d0d2be3be335

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 595000807fc1dd1f8564aec3726f03493439847b29cebc03a7c7b63cd435d24c
MD5 1c83692d269a2d629f2092d37f5bcb91
BLAKE2b-256 494c6a4a74cf14f2a9142d8e6b0ef77791ef8923f05e932088f721e3d88fcc22

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d64147b2ce1104ba856991dbc7ef31683576c4c379f3f3f5fbe1d38bf0db458e
MD5 c76977484c1f7db631db0dd955c35ded
BLAKE2b-256 b337ad249cc893d69ed8dfe6404aee1ab214df8303517b404fc39b1549b78b79

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5d82904092e52e82f45c0f2f0472050d38539e3ec2e68f2fddd484d9788f802e
MD5 27d3ebbd01dfc28d51a5bb6b0b4e19c1
BLAKE2b-256 26617c8a411971634c5f832d4de60e5734181512733ed312630a9a7c0864c998

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp312-cp312-macosx_10_13_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 099e5ad0ed9bc00881f2bc5bfe2a411432c26b7f56282d081100495dc0fae7a4
MD5 36ed4087611826244c48d44afdac8c45
BLAKE2b-256 ef02572a189014e34215336f1dddf8f390fc2ef777206201aa34ec38c805285a

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp312-cp312-macosx_10_13_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp311-cp311-win_arm64.whl
  • Upload date:
  • Size: 475.2 kB
  • Tags: CPython 3.11, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp311-cp311-win_arm64.whl
Algorithm Hash digest
SHA256 01b24b0ecf6adb366ea99035e5c49257cd6f8b0ac9d7ade0231950703aaacc92
MD5 1a93d9e021c8f78b59a104536f2739bf
BLAKE2b-256 d0c564c0b4f71b88045e153f0c027472915864cb8e137545dc931d17ee03b3ee

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 556.9 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 6af1b3a799dd6134167e09a4ee5fafe668f10a9dbeb60599439352994fe173b7
MD5 50db7a8337f5e3fe54a8fc01b3975304
BLAKE2b-256 fea1cf905199c305b8ab4dda4ed793cf96da7096333bb9f7548385e40dc69f6e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 90ec7949f1ac23d2a0fed82728fcd3a2c918956635fc508518c0becfe9dbdca5
MD5 a0dcbacf5e4a637099cf6676c2efb6b5
BLAKE2b-256 52074c7b6b289c4b970ee9dd82e2393f123a8f164835cca6143ca65c39efcd77

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2aee2c869a8181a2bf4b072795f61cb2cf48ffab4783d8013a64e9fdb09ac42e
MD5 36d0704c2d8f663b45dadb13bdc286b1
BLAKE2b-256 5cc0aabcc5753278b7221f64c4ce8c400694025be22e993f1b5f1f602fc7433e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e770acc6a548e01bf7517d623f1e667cccb608e880ec1875728601c6ed9d4972
MD5 bb925ea3ba398b937ffefdf7ce4683fe
BLAKE2b-256 9cc898310002d1fceb3b3bcbea26c0c1e87c0dea4fdce1761e1bef8794c26d40

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp311-cp311-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bfc2a53c8497b53156e0daa67949c6597d9a1e38892332f12e8cfc310949b719
MD5 db2c65d40ec9ac59f5065deede3d624d
BLAKE2b-256 57ed35414bc56b82949814cba9b330d2a04f195c733379202ee1763d0731c087

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp311-cp311-macosx_10_9_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-win_arm64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp310-cp310-win_arm64.whl
  • Upload date:
  • Size: 475.3 kB
  • Tags: CPython 3.10, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp310-cp310-win_arm64.whl
Algorithm Hash digest
SHA256 c7b78cbedafe3697198bf6374fb98a33af64982f3556fef4b9f2388776217b7e
MD5 a6427c260ccace9dff9bd509810d2176
BLAKE2b-256 be4ef3e7ade03334c7739a7d284f697e54e95c579d9f96b64593202bea168f5e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-win_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: numkong-7.3.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 557.0 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numkong-7.3.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 8e85fdcee698f6cb0ed4b04de683deaeedf943ea6e472f2315ee3ef2e3a9b412
MD5 66e01d9f537108c97c83f762905622ae
BLAKE2b-256 eb760fd8b9a039fa2fa03f62d8bdc833512f2956cc62e47cdc781f78bd0047ed

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-win_amd64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 2b6bc9ef77b1a1e8bbed50126de605409421e2dc4b92bb22bdb654167ba86432
MD5 6e0a1b6ab4997f79519213a90c23252e
BLAKE2b-256 4e40b8934b64d98a91ff31f5bd4491e8bd96aac034c47df514fe86b8317ffcfd

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-musllinux_1_2_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a46410637c9864ecd03f31457a38a8c1b7ba0cccd046c56a3a25af3dcc5dde85
MD5 a3fff95224d14c99ed5342241fe2d8e1
BLAKE2b-256 7e822eac4f5b3a5f5fc6d46ec99620643a5fdf0bceb87e3cc40824c0368a6331

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 040b60c6e61723390e9186fcb3d677004c6ce5060ec208ffe1c411ce190e9373
MD5 5d32661db8cc6513986a1d9aeac9ae27
BLAKE2b-256 77363b013fc55d96816f7e9790206a6024c1c637a301b26413288a25e3a3cf3a

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numkong-7.3.0-cp310-cp310-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for numkong-7.3.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b5d25cdc8ee5358a2c652ad777e1556f942b957f84e9823926896de297bcb0c5
MD5 b7f3bf33400f0e7a81be3e23a8be6583
BLAKE2b-256 c513e6944a683ca3424d1c52d66cfbdff616cbbd67b7d73ac87704a827131a9e

See more details on using hashes here.

Provenance

The following attestation bundles were made for numkong-7.3.0-cp310-cp310-macosx_10_9_x86_64.whl:

Publisher: release-python.yml on ashvardanian/NumKong

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page