The fastest exact Hamming-distance pair-search library on PyPI. Pigeonhole prefilter, Rust core, streaming Index with sub-µs query latency.

hammingbird

The fastest exact Hamming-distance pair-search library on PyPI. It uses the same algorithm as FAISS IndexBinaryMultiHash (multi-index hashing, Norouzi et al. 2012), with a Rust core, rayon-parallel candidate generation, prefetched popcount verification, and a streaming Index with sub-microsecond query latency.

pip install hammingbird

import numpy as np
from hammingbird import find_pairs_self, Index

# 1M random 256-bit codes
A = np.random.default_rng(0).integers(0, 256, size=(1_000_000, 32), dtype=np.uint8)

# All-pairs near-duplicate search — exact, 100% recall
pairs = find_pairs_self(A, k=2)     # list of (i, j, hamming_dist)

# Streaming index — sub-microsecond per query
idx = Index(d_bytes=32, k=2)
idx.add_batch(A)
hits = idx.query(A[0])              # list of (id, hamming_dist)
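For small inputs, an O(n²) brute-force pass is a convenient way to sanity-check exact results. The sketch below is not part of hammingbird: `brute_force_pairs_self` and `POPCOUNT` are hypothetical names, and the `(i, j, hamming_dist)` tuple layout with `i < j` is an assumption based on the comment in the example above.

```python
import numpy as np

# Popcount lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(b).count("1") for b in range(256)], dtype=np.uint8)

def brute_force_pairs_self(codes: np.ndarray, k: int) -> list:
    """Return every (i, j, dist) with i < j and Hamming distance <= k.

    O(n^2) reference implementation, only intended for small n.
    """
    codes = np.ascontiguousarray(codes, dtype=np.uint8)
    out = []
    for i in range(len(codes) - 1):
        # XOR row i against all later rows, then count differing bits per row.
        diff = np.bitwise_xor(codes[i + 1:], codes[i])
        dist = POPCOUNT[diff].sum(axis=1, dtype=np.int64)
        for off in np.flatnonzero(dist <= k):
            out.append((i, i + 1 + int(off), int(dist[off])))
    return out

# Small demo: 200 random 256-bit codes plus one planted near-duplicate.
rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(200, 32), dtype=np.uint8)
small[199] = small[0]
small[199, 5] ^= 0b00000011  # flip two bits, so row 199 is at distance 2 from row 0
reference = brute_force_pairs_self(small, k=2)
```

If hammingbird is installed, comparing `set(find_pairs_self(small, k=2))` against `set(reference)` should be a reasonable check, assuming the library reports each pair once with `i < j`.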

Headline numbers (Apple M2 Pro, full reproducibility on GitHub)

| Workload | hammingbird | FAISS IndexBinaryFlat | Speedup |
|---|---|---|---|
| n=2,000,000, k=0 (exact dedup) | 0.196 s | 798.4 s | 4073× |
| n=2,000,000, k=2 (uniform random) | 0.312 s | 919.3 s | 2945× |
| n=100k Index query latency | 0.21 µs median | 194 µs median | 924× per query |
| n=100k clustered, k=4 (real-shape) | 0.040 s | 1.755 s | 44× |

Against FAISS IndexBinaryMultiHash (same algorithm) at full recall, hammingbird is 3–45× faster. Against usearch, Annoy, and FAISS-LSH on clustered data at n=1M, hammingbird is both faster and more accurate: those libraries' recall drops below 12% at scale on this workload.

Full head-to-head benchmark report (16 demo presets, every number reproducible): https://github.com/ElVec1o/hammingbird/blob/main/BENCHMARK_RESULTS.md

What it's for

  • Exact dedup at scale (k=0 — the common production workload)
  • Near-duplicate detection on perceptual hashes, SimHash, learned binary embeddings (k ≤ 8 typically)
  • Real-time content moderation: the streaming Index API delivers ~1.6M exact queries/sec per thread and releases the GIL on add_batch
  • Cross-corpus matching: find_pairs_cross(A, B, k) for "dedup incoming batch against blacklist" workloads
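The examples on this page pass codes as a `uint8` array of shape `(n, d_bytes)`. For learned binary embeddings, one common way to produce that layout is to threshold each dimension at zero and pack 8 bits per byte with `np.packbits`. This is a minimal sketch under assumptions: `binarize_and_pack` is a hypothetical helper, sign thresholding is only one possible binarization, and it is assumed that hammingbird accepts any packed array shaped like the one in the quick-start example.

```python
import numpy as np

def binarize_and_pack(embeddings: np.ndarray) -> np.ndarray:
    """Sign-binarize float embeddings and pack 8 bits into each byte."""
    bits = embeddings > 0             # (n, d) boolean matrix
    return np.packbits(bits, axis=1)  # (n, ceil(d / 8)) uint8 matrix

# 1000 synthetic 256-dimensional embeddings -> 1000 codes of 32 bytes each.
rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 256)).astype(np.float32)
codes = binarize_and_pack(emb)
```

Hamming distance only depends on which bit positions differ, so the bit order chosen by `np.packbits` does not matter as long as every code in the corpus is packed the same way.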

What it's not

  • Not a general-purpose ANN library. For approximate-recall workloads at k ≥ 16 or single-query latency budgets < 1 µs with k ≥ 8, FAISS's HNSW family or usearch may be a better fit.
  • Not algorithmically novel. The pigeonhole prefilter is published (Norouzi-Punjani-Fleet 2012). The win is implementation quality.
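The pigeonhole idea mentioned above can be illustrated in a few lines. If two codes differ in at most k bits and each code is split into k + 1 disjoint chunks, at least one chunk must be bit-for-bit identical, so exact-match hash tables on chunks yield a candidate set that cannot miss a true pair. The sketch below is the simplest variant of this idea, written for clarity rather than speed; it is not hammingbird's actual implementation, and `pigeonhole_pairs` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

def pigeonhole_pairs(codes: np.ndarray, k: int) -> list:
    """Exact all-pairs search: return sorted (i, j, dist) with i < j, dist <= k."""
    n, d_bytes = codes.shape
    if k + 1 > d_bytes:
        raise ValueError("this sketch splits at byte granularity: need k + 1 <= d_bytes")
    # k + 1 disjoint byte ranges; two codes within distance k agree on at least one.
    chunks = np.array_split(np.arange(d_bytes), k + 1)
    candidates = set()
    for cols in chunks:
        buckets = defaultdict(list)
        for i in range(n):
            buckets[codes[i, cols].tobytes()].append(i)
        for ids in buckets.values():
            # ids are appended in increasing order, so each pair has i < j.
            candidates.update(combinations(ids, 2))
    out = []
    for i, j in sorted(candidates):
        # Verify each candidate with a full popcount of the XOR.
        xor = np.bitwise_xor(codes[i], codes[j]).tobytes()
        dist = bin(int.from_bytes(xor, "big")).count("1")
        if dist <= k:
            out.append((i, j, dist))
    return out

# Demo: 200 random 256-bit codes with one planted pair at distance 2.
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(200, 32), dtype=np.uint8)
demo[199] = demo[0]
demo[199, 5] ^= 0b00000011
found = pigeonhole_pairs(demo, k=2)
```

The verification step keeps the result exact: the chunk tables only prune the search space, and every reported pair has had its full distance computed.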

License

MIT © 2026 Vico Bonfioli.

Source / issues / docs

https://github.com/ElVec1o/hammingbird

Download files

Source Distribution

hammingbird-0.5.0.tar.gz (29.7 kB)

Built Distribution

hammingbird-0.5.0-cp39-abi3-macosx_11_0_arm64.whl (681.8 kB)

CPython 3.9+, macOS 11.0+, ARM64
