iscc-usearch

Scalable approximate nearest neighbor search for variable-length binary bit-vectors.

iscc-usearch extends USearch with capabilities purpose-built for ISCC (ISO 24138) content fingerprints: indexing binary vectors of mixed bit-lengths in a single index, and scaling beyond available RAM through transparent sharding.

Why not plain USearch?

USearch is a fast, general-purpose vector index -- but it assumes that all vectors share one dimensionality and that a writable index fits in memory. ISCC codes break both assumptions:

  • Variable-length codes. An ISCC content fingerprint can be 64, 128, or 256 bits depending on resolution. Shorter codes are prefixes of longer ones -- a design shared with Matryoshka Representation Learning. A useful index must store and compare all resolutions together.

  • Large-scale collections. Real-world content registries grow to hundreds of millions of fingerprints. Write throughput in HNSW graphs degrades as the graph grows, and the full graph must be loaded into RAM for inserts.

iscc-usearch solves both problems with two core additions:

Normalized Prefix Hamming Distance (NPHD) compares only the common prefix of two vectors, that is, the leading bits up to the length of the shorter one, and normalizes the Hamming distance over that prefix to [0.0, 1.0]. A 64-bit query can find its nearest neighbors among 256-bit vectors -- distances remain comparable across resolutions.
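The metric can be illustrated with a few lines of NumPy. This is a minimal sketch, not the library's implementation: the function name `nphd` is hypothetical, and it assumes byte-aligned vectors packed as `uint8` and normalization by the number of bits in the common prefix.

```python
import numpy as np


def nphd(a: np.ndarray, b: np.ndarray) -> float:
    """Illustrative NPHD over packed uint8 bit-vectors (hypothetical helper)."""
    n = min(len(a), len(b))  # common prefix length in bytes
    if n == 0:
        raise ValueError("both vectors must contain at least one byte")
    # XOR marks differing bits; unpackbits expands each byte into 8 bits to count them.
    differing = int(np.unpackbits(np.bitwise_xor(a[:n], b[:n])).sum())
    return differing / (8 * n)


short = np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8)  # 64 bits
long_ = np.array(
    [255, 128, 64, 32, 16, 8, 4, 2, 1, 0, 255, 128, 64, 32, 16, 8], dtype=np.uint8
)  # 128 bits, same first 8 bytes as `short`

flipped = short.copy()
flipped[0] ^= 0b00001111  # flip 4 of the 64 bits

print(nphd(short, long_))    # 0.0: the common 64-bit prefix is identical
print(nphd(flipped, long_))  # 0.0625: 4 differing bits out of 64
```

Because the divisor is the prefix length rather than a fixed dimension, a 4-bit difference counts the same whether the other vector is 64, 128, or 256 bits long.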

Transparent sharding keeps a single active shard in RAM for writes while completed shards are memory-mapped for reads. This maintains consistent insert throughput regardless of index size and keeps the memory footprint bounded.
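The sharding idea can be modeled with a toy brute-force index. Everything in this sketch is an assumption made for illustration: the class `ToyShardedIndex`, the `shard_capacity` parameter, and the `prefix_distance` helper are hypothetical, and sealed shards are kept as immutable tuples in memory, whereas the description above has completed shards memory-mapped from disk.

```python
import numpy as np


def prefix_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Hamming distance over the common byte prefix, normalized to [0.0, 1.0]."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both vectors must contain at least one byte")
    return int(np.unpackbits(a[:n] ^ b[:n]).sum()) / (8 * n)


class ToyShardedIndex:
    """One writable shard plus a list of sealed, read-only shards (toy model)."""

    def __init__(self, shard_capacity: int):
        self.shard_capacity = shard_capacity
        self.sealed = []  # completed shards, never written again
        self.active = []  # the only shard that accepts inserts

    def add(self, key: int, vector: np.ndarray) -> None:
        # Rotate when the active shard is full, so inserts always hit a small shard.
        if len(self.active) >= self.shard_capacity:
            self.sealed.append(tuple(self.active))
            self.active = []
        self.active.append((key, vector))

    def search(self, query: np.ndarray, count: int):
        # Query every shard and merge the results by distance.
        hits = [
            (prefix_distance(query, vector), key)
            for shard in (*self.sealed, self.active)
            for key, vector in shard
        ]
        return sorted(hits)[:count]


index = ToyShardedIndex(shard_capacity=2)
for key in range(5):
    index.add(key, np.full(8, key, dtype=np.uint8))

query = np.full(8, 3, dtype=np.uint8)
print(len(index.sealed), len(index.active))  # 2 sealed shards, 1 entry in the active shard
print(index.search(query, count=1))          # key 3 lives in a sealed shard
```

The point of the design is that the cost of an insert depends only on the size of the active shard, while reads fan out across all shards and merge their results.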

Installation

pip install iscc-usearch

Quick Start

import numpy as np
from iscc_usearch import NphdIndex

index = NphdIndex(max_dim=256)

# Mix 64-bit and 128-bit vectors in the same index
index.add(1, np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8))
index.add(2, np.array([255, 128, 64, 32, 16, 8, 4, 2, 1, 0, 255, 128, 64, 32, 16, 8], dtype=np.uint8))

# Search with a 64-bit query -- NPHD compares the common prefix
query = np.array([255, 128, 64, 32, 16, 8, 4, 2], dtype=np.uint8)
matches = index.search(query, count=2)

print(matches.keys)  # Nearest neighbor keys
print(matches.distances)  # NPHD distances in [0.0, 1.0]

Documentation

Full documentation: https://iscc.github.io/iscc-usearch/

License

Apache-2.0
