iscc-usearch
Scalable approximate nearest neighbor search for variable-length binary bit-vectors.
iscc-usearch extends USearch with capabilities
purpose-built for ISCC (ISO 24138) content fingerprints: indexing
binary vectors of mixed bit-lengths in a single index, and scaling beyond available RAM through
transparent sharding.
Why not plain USearch?
USearch is a fast, general-purpose vector index -- but it assumes all vectors have the same dimensionality, and a single index must fit in memory for writes. ISCC codes break both assumptions:
- Variable-length codes. An ISCC content fingerprint can be 64, 128, or 256 bits depending on resolution. Shorter codes are prefixes of longer ones -- a design shared with Matryoshka Representation Learning. A useful index must store and compare all resolutions together.
- Large-scale collections. Real-world content registries grow to hundreds of millions of fingerprints. Write throughput in HNSW graphs degrades as the graph grows, and the full graph must be loaded into RAM for inserts.
iscc-usearch solves both problems with two core additions:
Normalized Prefix Hamming Distance (NPHD) compares only the bits that both vectors share and
normalizes the result to [0.0, 1.0]. A 64-bit query can find its nearest neighbors among
256-bit vectors -- distances remain comparable across resolutions.
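To make the metric concrete, here is a minimal NumPy sketch of the idea described above. It is an illustration, not the library's implementation: the function name `nphd` is hypothetical, vectors are assumed to be byte-aligned `uint8` arrays, and dividing by the prefix length in bits is an assumption about how the result is normalized.

```python
import numpy as np


def nphd(a: np.ndarray, b: np.ndarray) -> float:
    """Hamming distance over the shared prefix, scaled to [0.0, 1.0] (illustrative)."""
    n = min(len(a), len(b))  # length of the common prefix in bytes
    if n == 0:
        raise ValueError("vectors must be non-empty")
    diff = np.bitwise_xor(a[:n], b[:n])  # set bits mark positions that differ
    differing_bits = int(np.unpackbits(diff).sum())
    return differing_bits / (n * 8)  # assumed normalization: prefix length in bits


short = np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8)
long_ = np.array(
    [255, 128, 64, 32, 16, 8, 4, 2, 1, 0, 255, 128, 64, 32, 16, 8], dtype=np.uint8
)

# The 64-bit vector is an exact prefix of the 128-bit one
print(nphd(short, long_))  # 0.0

# Flip one bit in the short vector: 1 differing bit out of 64 shared bits
flipped = short.copy()
flipped[0] ^= 0b00000001
print(nphd(flipped, long_))  # 0.015625
```

Because only the shared prefix is compared, the extra 64 bits of the longer vector never contribute to the distance.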
Transparent sharding keeps a single active shard in RAM for writes while completed shards are memory-mapped for reads. This maintains consistent insert throughput regardless of index size and keeps the memory footprint bounded.
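The sharding idea can be pictured with a toy model. The sketch below is purely conceptual and makes several assumptions: `ToyShardedIndex`, `shard_size`, and the helper `nphd` are hypothetical names, sealed shards are plain tuples standing in for memory-mapped files, and search is brute force rather than HNSW.

```python
import numpy as np


def nphd(a, b):
    # Illustrative prefix Hamming distance, normalized by the shared bit count
    n = min(len(a), len(b))
    return int(np.unpackbits(a[:n] ^ b[:n]).sum()) / (n * 8)


class ToyShardedIndex:
    """Toy model: one writable active shard plus read-only sealed shards."""

    def __init__(self, shard_size=2):
        self.shard_size = shard_size
        self.active = {}  # key -> vector; the only shard that accepts writes
        self.sealed = []  # frozen shards (stand-ins for memory-mapped files)

    def add(self, key, vector):
        self.active[key] = vector
        if len(self.active) >= self.shard_size:
            # Seal the full shard and start a fresh one, so the writable
            # structure never grows beyond shard_size entries
            self.sealed.append(tuple(self.active.items()))
            self.active = {}

    def search(self, query, count=1):
        # Query every shard, then merge the candidates by distance
        candidates = [(nphd(query, v), k) for shard in self.sealed for k, v in shard]
        candidates += [(nphd(query, v), k) for k, v in self.active.items()]
        return sorted(candidates)[:count]


index = ToyShardedIndex(shard_size=2)
for key in range(5):
    index.add(key, np.array([key] * 8, dtype=np.uint8))

print(len(index.sealed), len(index.active))  # 2 1
print(index.search(np.array([3] * 8, dtype=np.uint8), count=1))  # [(0.0, 3)]
```

The point of the model is that inserts only ever touch the small active shard, while reads fan out across all shards and merge the results.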
Installation
```bash
pip install iscc-usearch
```
Quick Start
```python
import numpy as np
from iscc_usearch import NphdIndex

index = NphdIndex(max_dim=256)

# Mix 64-bit and 128-bit vectors in the same index
index.add(1, np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8))
index.add(2, np.array([255, 128, 64, 32, 16, 8, 4, 2, 1, 0, 255, 128, 64, 32, 16, 8], dtype=np.uint8))

# Search with a 64-bit query -- NPHD compares the common prefix
query = np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8)
matches = index.search(query, count=2)

print(matches.keys)       # Nearest neighbor keys
print(matches.distances)  # NPHD distances in [0.0, 1.0]
```
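The Quick Start passes vectors as `uint8` arrays in which each byte holds 8 bits. If your fingerprints start out as bit strings or hex digests, the following NumPy-only sketch shows one way to pack them. The helper names `bits_to_vector` and `hex_to_vector` are hypothetical, and the most-significant-bit-first order is an assumption; check the documentation for the layout the index actually expects.

```python
import numpy as np


def bits_to_vector(bits: str) -> np.ndarray:
    """Pack a string of '0'/'1' characters into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit length must be a multiple of 8")
    return np.packbits(np.array([int(b) for b in bits], dtype=np.uint8))


def hex_to_vector(digest: str) -> np.ndarray:
    """Interpret a hex digest as a sequence of raw bytes."""
    return np.frombuffer(bytes.fromhex(digest), dtype=np.uint8)


print(bits_to_vector("1111111110000000"))  # [255 128]
print(hex_to_vector("ff80402010080402"))   # [255 128  64  32  16   8   4   2]
```

The second call reproduces the 64-bit vector used in the Quick Start above.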
Documentation
Full documentation: https://iscc.github.io/iscc-usearch/
- Tutorials -- Step-by-step getting started guides
- How-to Guides -- Persistence, sharding, upsert, bloom filters
- Explanation -- NPHD metric, architecture, performance
- API Reference -- Auto-generated from source
- Development -- Dev setup, testing, and contribution guidelines
License
Apache-2.0
File details
Details for the file iscc_usearch-0.1.0.tar.gz.
File metadata
- Download URL: iscc_usearch-0.1.0.tar.gz
- Upload date:
- Size: 27.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf4caabbf909fafea9a5ee65820adaa18ca3d7c65882ae886764d5c6b09160db |
| MD5 | fa1c3e815cf43ed4e00ce651460fa70d |
| BLAKE2b-256 | 935c902fc103600123bf219a2f2d412381f0271e36f223aaabb87a5f04f2bb72 |
File details
Details for the file iscc_usearch-0.1.0-py3-none-any.whl.
File metadata
- Download URL: iscc_usearch-0.1.0-py3-none-any.whl
- Upload date:
- Size: 32.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba30f7c6960f33ae222c6246a9896f12bb89db65d6b4427a6c169e76b83aa25a |
| MD5 | e9ca4bd8473c62c0605f772479f1e399 |
| BLAKE2b-256 | 5afdabeffa8a38b3ed75cfc4a0175428d0bf65e3f5ecca2dfe4e394d3c400c12 |