The fastest exact Hamming-distance pair-search library on PyPI. Pigeonhole prefilter, Rust core, streaming Index with sub-µs query latency.
hammingbird
The fastest exact Hamming-distance pair-search library on PyPI. Same algorithm as FAISS `IndexBinaryMultiHash` (Norouzi 2012), with a Rust core, rayon-parallel candidate generation, prefetched popcount verify, and a streaming `Index` with sub-microsecond query latency.
```
pip install hammingbird
```

```python
import numpy as np
from hammingbird import find_pairs_self, Index

# 1M random 256-bit codes, packed as 32 uint8 bytes per row
A = np.random.default_rng(0).integers(0, 256, size=(1_000_000, 32), dtype=np.uint8)

# All-pairs near-duplicate search: exact, 100% recall
pairs = find_pairs_self(A, k=2)  # list of (i, j, hamming_dist)

# Streaming index: sub-microsecond per query
idx = Index(d_bytes=32, k=2)
idx.add_batch(A)
hits = idx.query(A[0])  # list of (id, hamming_dist)
```
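The example above passes codes as packed `uint8` rows, so a 256-bit code occupies 32 bytes. If your hashes start out as one 0/1 value per bit, NumPy's `packbits` produces that layout. The sketch below is illustrative only: `hamming` is a hypothetical helper, not part of hammingbird, and it assumes any consistent bit order is acceptable, which holds for Hamming distance in general but is not stated in this README.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four 256-bit codes stored unpacked: one 0/1 entry per bit
bits = rng.integers(0, 2, size=(4, 256), dtype=np.uint8)
bits[1] = bits[0]
bits[1, [3, 200]] ^= 1  # code 1 now differs from code 0 in exactly 2 bits

# Pack 8 bits per byte: shape (4, 256) -> (4, 32), dtype uint8
packed = np.packbits(bits, axis=1)

def hamming(a, b):
    """Hamming distance between two packed uint8 codes (hypothetical helper)."""
    # XOR marks the differing bits; unpackbits + sum counts them
    return int(np.unpackbits(a ^ b).sum())

d01 = hamming(packed[0], packed[1])
```

Here `d01` counts the two flipped bits, so the pair would qualify under `k=2` but not under `k=1`.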
Headline numbers (Apple M2 Pro, full reproducibility on GitHub)
| Workload | hammingbird | FAISS IndexBinaryFlat | Speedup |
|---|---|---|---|
| n=2,000,000, k=0 (exact dedup) | 0.196 s | 798.4 s | 4073× |
| n=2,000,000, k=2 (uniform random) | 0.312 s | 919.3 s | 2945× |
| n=100k Index query latency | 0.21 µs median | 194 µs median | 924× per query |
| n=100k clustered, k=4 (real-shape) | 0.040 s | 1.755 s | 44× |
vs FAISS IndexBinaryMultiHash (same algorithm) at full recall: 3–45× faster.
vs usearch / Annoy / FAISS-LSH at clustered n=1M: strict dominance —
hammingbird is both faster AND more accurate (those libraries' recall
drops below 12% at scale on this workload).
Full head-to-head benchmark report (16 demo presets, every number reproducible): https://github.com/ElVec1o/hammingbird/blob/main/BENCHMARK_RESULTS.md
What it's for
- Exact dedup at scale (k=0 — the common production workload)
- Near-duplicate detection on perceptual hashes, SimHash, learned binary embeddings (k ≤ 8 typically)
- Real-time content moderation: the streaming `Index` API delivers ~1.6M exact queries/sec per thread and releases the GIL on `add_batch`
- Cross-corpus matching: `find_pairs_cross(A, B, k)` for "dedup incoming batch against blacklist" workloads
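For the blacklist workload, a brute-force reference on a small sample is a handy way to cross-check results. This is a minimal sketch in plain NumPy, not hammingbird's implementation: `brute_force_cross` is a hypothetical helper, and the `(i, j, dist)` output shape (row in the incoming batch, row in the blacklist, distance) is an assumption modeled on `find_pairs_self`.

```python
import numpy as np

# POPCOUNT[b] = number of set bits in byte value b
POPCOUNT = np.array([bin(b).count("1") for b in range(256)], dtype=np.uint16)

def brute_force_cross(incoming, blacklist, k):
    """Hypothetical O(n*m) reference for cross-corpus matching."""
    # Broadcast XOR: (n, 1, d) ^ (1, m, d) -> (n, m, d), then sum popcounts per pair
    dists = POPCOUNT[incoming[:, None, :] ^ blacklist[None, :, :]].sum(axis=2)
    rows, cols = np.nonzero(dists <= k)
    return [(int(i), int(j), int(dists[i, j])) for i, j in zip(rows, cols)]

rng = np.random.default_rng(1)
blacklist = rng.integers(0, 256, size=(500, 32), dtype=np.uint8)
incoming = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)

incoming[7] = blacklist[123]
incoming[7, 5] ^= 0b00010000   # near-duplicate: one flipped bit
incoming[20] = blacklist[400]  # exact duplicate

matches = brute_force_cross(incoming, blacklist, k=4)
flagged = sorted({i for i, _, _ in matches})  # incoming rows to reject
```

The broadcast materializes an n × m × d array, so this only suits small samples; it is meant for validation, not production use.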
What it's not
- Not a general-purpose ANN library. For approximate-recall workloads at k ≥ 16 or single-query latency budgets < 1 µs with k ≥ 8, FAISS's HNSW family or usearch may be a better fit.
- Not algorithmically novel. The pigeonhole prefilter is published (Norouzi-Punjani-Fleet 2012). The win is implementation quality.
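The pigeonhole idea can be sketched in a few lines of Python. If two codes differ in at most k bits and each code is cut into k+1 disjoint chunks, at least one chunk must be identical in both, so bucketing by chunk value yields every true pair as a candidate; a popcount check then removes false positives. The sketch below is the simplest form of this idea under stated assumptions: byte-aligned chunks, a hypothetical `pigeonhole_pairs` helper, and no claim about how hammingbird or FAISS actually partition codes.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

def pigeonhole_pairs(codes, k):
    """Sketch: exact pairs within Hamming distance k, using k+1 byte-aligned chunks."""
    n, d_bytes = codes.shape
    assert k + 1 <= d_bytes, "this sketch only splits on byte boundaries"
    # Chunk boundaries in bytes, e.g. d_bytes=32, k=2 -> [0, 10, 21, 32]
    bounds = [d_bytes * c // (k + 1) for c in range(k + 2)]

    # Candidate generation: codes sharing any chunk exactly land in one bucket
    candidates = set()
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        buckets = defaultdict(list)
        for i in range(n):
            buckets[codes[i, lo:hi].tobytes()].append(i)
        for ids in buckets.values():
            candidates.update(combinations(ids, 2))  # ids ascend, so i < j

    # Verification: full-distance popcount on each candidate pair
    pairs = []
    for i, j in sorted(candidates):
        dist = int(np.unpackbits(codes[i] ^ codes[j]).sum())
        if dist <= k:
            pairs.append((i, j, dist))
    return pairs

rng = np.random.default_rng(0)
codes = rng.integers(0, 256, size=(200, 32), dtype=np.uint8)
codes[17] = codes[3]   # exact duplicate
codes[42] = codes[5]
codes[42, 0] ^= 1      # flip one bit in the first chunk
codes[42, 31] ^= 1     # and one in the last chunk; the middle chunk still matches

found = pigeonhole_pairs(codes, k=2)
```

The pair (5, 42) differs in two of the three chunks, yet it is still found through the untouched middle chunk, which is the guarantee that keeps recall at 100%.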
License
MIT © 2026 Vico Bonfioli.
Source / issues / docs
File details
Details for the file hammingbird-0.5.0.tar.gz.
File metadata
- Download URL: hammingbird-0.5.0.tar.gz
- Upload date:
- Size: 29.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0477136eb6fdfc8f6ff251272c404131fa789852eade7d0fbeb18003e681dec |
| MD5 | c735e0b6f7858df24e1ea419a3f02e11 |
| BLAKE2b-256 | 188393c9fe5cc5bf908719d240113745372d74581301e7213430cdee2daf3a94 |
File details
Details for the file hammingbird-0.5.0-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: hammingbird-0.5.0-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 681.8 kB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f342dd1b3f8a4de41350bb17a6b518cf9a9ecfed39768b45a7937d02233f1e36 |
| MD5 | 77034dd333e31414a8f1a4784dd384aa |
| BLAKE2b-256 | e592980b1c6f84554e9f69e4ec8f30c8153620ffd870c2e076c4b1336edbf1c7 |