A minimal, easy-to-understand FAISS-like library for vector similarity search with SIMD acceleration
Project description
mini-faiss: A Minimal Vector Similarity Search Library
A lightweight, high-performance library for similarity search in dense float32 embeddings. Built in C++ with SIMD acceleration and Python bindings via pybind11.
Key characteristics:
- Fast: SIMD-accelerated distance computation (L2 and inner product)
- Simple: ~1500 lines of clean C++ code, easy to understand and extend
- Pythonic: NumPy-friendly API, installable via pip
- Focused: Solves the core problem of similarity search without unnecessary complexity
Suitable for indexing and searching small to medium-scale datasets (up to a few million vectors on a single machine).
Installation
From Source
git clone https://github.com/yourusername/mini-faiss.git
cd mini-faiss
# Install dependencies
pip install pybind11 scikit-build-core numpy
# Build and install
pip install .
Requirements
- C++17 compiler (GCC, Clang, or MSVC)
- Python 3.8+
- NumPy 1.20+
Quick Start
import numpy as np
from mini_faiss import IndexFlatL2
# Create index for 768-dimensional vectors
d = 768
index = IndexFlatL2(d)
# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)
# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)
print(distances.shape) # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape) # (5, 10)
API Reference
IndexFlatL2
Flat index using squared Euclidean (L2) distance.
from mini_faiss import IndexFlatL2
# Create index
index = IndexFlatL2(d=768)
# Add vectors
index.add(xb) # xb: (N, 768) float32 array
# Search
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of L2 distances
# indices: (nq, k) int64 array of vector indices
Parameters:
- d (int): Vector dimension
- xb (np.ndarray): Vectors to add, shape (N, d), dtype float32
- xq (np.ndarray): Query vectors, shape (nq, d), dtype float32
- k (int): Number of nearest neighbors to return
Methods:
- add(xb): Add vectors to the index
- search(xq, k): Return the top-k nearest neighbors
- ntotal(): Get the total number of indexed vectors
- dimension(): Get the vector dimension
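Continuing the Quick Start example above, the accessor methods work like this:
index = IndexFlatL2(d=768)
index.add(xb)                # xb: (10000, 768) float32, as in Quick Start
print(index.ntotal())        # 10000 -- number of indexed vectors
print(index.dimension())     # 768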
IndexFlatIP
Flat index using inner product distance (useful for cosine similarity when vectors are normalized).
import numpy as np
from mini_faiss import IndexFlatIP
# Create index
index = IndexFlatIP(d=768)
# Normalize vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
# Normalize queries the same way
xq = np.random.randn(5, 768).astype("float32")
xq /= np.linalg.norm(xq, axis=1, keepdims=True)
# Add and search
index.add(xb)
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of inner products (higher = more similar)
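The normalization matters because, for unit-length vectors, the inner product equals cosine similarity. A quick NumPy check (with arbitrary example vectors) illustrates this:
import numpy as np
a = np.random.randn(768).astype("float32")
b = np.random.randn(768).astype("float32")
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
assert np.isclose(a @ b, cosine, atol=1e-4)  # IP of unit vectors == cosine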
Architecture
C++ Core
The library implements three main components:
- Data Structures (BaseIndex, IndexFlatL2, IndexFlatIP):
  - Flat, row-major storage of vectors in std::vector<float>
  - Metadata: dimension, vector count
  - Abstract interface for index implementations
- Distance Kernels:
  - l2_distance_batch(): Compute squared Euclidean distances
  - inner_product_batch(): Compute dot products
  - Vectorizable loops designed for SIMD auto-vectorization
- Top-K Selection:
  - select_topk_min(): Keep the k smallest distances (for L2)
  - select_topk_max(): Keep the k largest scores (for inner product)
  - Heap-based selection per query
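A pure-Python sketch of how these three components fit together, assuming only NumPy (illustrative only; the real implementation is the C++ core described above):
import numpy as np

class FlatL2Sketch:
    """Minimal Python mirror of the C++ flat L2 index (not the real API)."""

    def __init__(self, d):
        self.d = d
        self.data = np.empty((0, d), dtype=np.float32)  # row-major storage

    def add(self, xb):
        self.data = np.vstack([self.data, xb.astype(np.float32)])

    def search(self, xq, k):
        # Distance kernel: batch squared L2 against all stored vectors
        dists = ((xq[:, None, :] - self.data[None, :, :]) ** 2).sum(-1)
        # Top-k selection: k smallest distances per query, sorted ascending
        idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
        rows = np.arange(len(xq))[:, None]
        order = np.argsort(dists[rows, idx], axis=1)
        idx = idx[rows, order]
        return dists[rows, idx], idx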
Memory Layout
Vectors are stored in row-major layout:
data = [v_0[0], v_0[1], ..., v_0[d-1],
v_1[0], v_1[1], ..., v_1[d-1],
...
v_{N-1}[0], ..., v_{N-1}[d-1]]
Access: data[i*d + j] = j-th dimension of i-th vector.
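A small NumPy check of the same layout (the arrays here are just examples):
import numpy as np
N, d = 4, 3
xb = np.arange(N * d, dtype=np.float32).reshape(N, d)  # (N, d), C-contiguous
flat = xb.ravel()                   # row-major flattening, as in the C++ buffer
i, j = 2, 1
assert flat[i * d + j] == xb[i, j]  # data[i*d + j] == j-th dim of i-th vector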
Distance Computation
L2 (squared Euclidean):
||q - db||^2 = ||q||^2 - 2*q·db + ||db||^2
Precomputes database norms for efficiency.
Inner Product:
score = q · db
Simple dot product, maximized for similarity.
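A NumPy sketch of both computations (the Python function names here are illustrative, not the library's C++ symbols):
import numpy as np

def l2_sq_batch(xq, xb, xb_sq_norms):
    """||q - b||^2 = ||q||^2 - 2*q.b + ||b||^2, with ||b||^2 precomputed."""
    q_sq = (xq ** 2).sum(axis=1, keepdims=True)      # (nq, 1)
    return q_sq - 2.0 * (xq @ xb.T) + xb_sq_norms    # (nq, N)

def inner_product_batch(xq, xb):
    """Plain dot products; higher score = more similar."""
    return xq @ xb.T                                  # (nq, N)

xb = np.random.randn(1000, 64).astype("float32")
xq = np.random.randn(8, 64).astype("float32")
xb_sq_norms = (xb ** 2).sum(axis=1)  # cached once, reused for every query
d2 = l2_sq_batch(xq, xb, xb_sq_norms)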
Performance
Benchmarks on typical hardware (Intel i7, single core):
| Database Size | Dimension | mini-faiss | NumPy | Speedup |
|---|---|---|---|---|
| 10,000 | 128 | 0.0012s | 0.0085s | ~7x |
| 100,000 | 128 | 0.012s | 0.086s | ~7x |
| 100,000 | 768 | 0.067s | 0.48s | ~7x |
Performance depends on:
- Hardware (CPU model, cache size, SIMD capabilities)
- Compiler optimization flags
- Vector dimension and dataset size
To build with maximum optimization:
# Handled automatically by scikit-build-core; to build manually:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3 -march=native" .
cmake --build . --config Release
Use Cases
- Dense retrieval: Find similar documents/passages
- Image search: Index embeddings from image models
- Recommendation systems: Compute nearest neighbors for recommendations
- Clustering: Find nearest neighbors for clustering algorithms
- Prototyping: Quick iteration before moving to FAISS/specialized tools
Limitations
- Fixed dimension: All vectors in an index must have the same dimension
- Append-only: No deletion or update of indexed vectors
- Brute force search: Linear scan of all vectors (no approximate methods)
- Single machine: No distributed indexing
- Memory: Limited by RAM (typical: ~1 million 768-dim float32 vectors ≈ 3 GB; see the estimate below)
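The 3 GB figure follows directly from the flat float32 storage, as a quick back-of-envelope helper shows (the function is just an illustration):
def flat_index_bytes(n_vectors: int, d: int) -> int:
    """Approximate RAM for a flat float32 index (vector data only)."""
    return n_vectors * d * 4  # 4 bytes per float32 component

print(flat_index_bytes(1_000_000, 768) / 1e9)  # ~3.07 GB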
Design Decisions
Why Heap-Based Top-K?
For small k (typical case: k << N), a max-heap approach is simple and efficient:
- Time: O(N log k) per query
- Space: O(k) temporary storage
- Alternative: std::partial_sort has similar complexity
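A Python sketch of the same idea using the standard-library heap (the helper name mirrors, but is not, the C++ select_topk_min()):
import heapq

def topk_min_sketch(distances, k):
    """Keep the k smallest (distance, index) pairs in O(N log k) time.

    heapq is a min-heap, so distances are negated: the root is then the
    largest kept distance, evictable in O(log k) when a better one arrives.
    """
    heap = []  # holds (-distance, index); heap[0] is the worst kept entry
    for idx, dist in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-neg, i) for neg, i in heap)  # ascending distance

print(topk_min_sketch([0.9, 0.1, 0.5, 0.3, 0.7], k=2))  # [(0.1, 1), (0.3, 3)]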
Why No Quantization?
V1 goal: simplicity and correctness. Quantization adds complexity:
- Product quantization (PQ)
- Scalar quantization
These can be added in future versions without changing the API.
Why Row-Major Layout?
- Aligns with NumPy's default (C-contiguous)
- Natural for dense vector operations
- Cache-friendly for batch distance computation
Testing
Run the test suite:
pip install pytest
pytest tests/
Tests verify:
- Correctness vs. NumPy reference implementations
- Input validation and error handling
- Output shapes and dtypes
- Edge cases (empty index, k > N, etc.)
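For example, the correctness check against a NumPy reference might look like this sketch (the test body is illustrative, assuming the Quick Start API):
import numpy as np
from mini_faiss import IndexFlatL2

def test_matches_numpy_reference():
    rng = np.random.default_rng(0)
    xb = rng.standard_normal((1000, 32)).astype("float32")
    xq = rng.standard_normal((4, 32)).astype("float32")
    index = IndexFlatL2(32)
    index.add(xb)
    dist, _ = index.search(xq, k=5)
    # Brute-force reference: full (nq, N) distance matrix, k smallest per row
    ref = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    ref_k = np.sort(ref, axis=1)[:, :5]
    assert np.allclose(np.sort(dist, axis=1), ref_k, atol=1e-3)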
Examples
See examples/:
- basic_usage.py: Core functionality demo
- benchmark.py: Performance comparison with NumPy
Run examples:
python examples/basic_usage.py
python examples/benchmark.py
Future Enhancements
Possible extensions (outside v1 scope):
- Index Types:
  - Inverted file (IVF) for coarse-grained partitioning
  - Product quantization (PQ)
  - Hierarchical Navigable Small Worlds (HNSW)
- Operations:
  - Batch addition with progress tracking
  - Vector deletion / index rebuild
  - Index serialization (save/load)
- Features:
  - GPU acceleration (CUDA, Metal)
  - Explicit vector IDs (not just implicit 0..N-1)
  - Custom distance functions
- Optimization:
  - Cache-oblivious algorithms
  - SIMD assembly kernels
  - Multi-threaded search
Debugging
Enable Debug Output
Compile with debug symbols:
cmake -DCMAKE_BUILD_TYPE=Debug .
Common Issues
ImportError: cannot import name 'IndexFlatL2'
- Ensure the library built correctly: pip install -v .
- Check that the Python version matches the build: python --version
Dimension mismatch error
- Verify that query and database vector shapes match: assert xq.shape[1] == index.dimension()
Memory error with large datasets
- Check available RAM
- Reduce dataset size or use smaller dimension
- Consider adding index types with compression in future
Contributing
Contributions welcome! Areas:
- Performance improvements
- Additional index types
- GPU acceleration
- Better documentation
- Bug reports and fixes
License
MIT License. See LICENSE file.
References
- FAISS - Large-scale similarity search
- pybind11 - C++/Python bindings
- NumPy - Numerical computing in Python
Citation
If you use mini-faiss in research, cite as:
@software{mini_faiss_2024,
  title={mini-faiss: A Minimal Vector Similarity Search Library},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/mini-faiss}
}
Last updated: 2024
Download files
File details
Details for the file mini_faiss-1.0.0-cp314-cp314-macosx_15_0_arm64.whl.
File metadata
- Download URL: mini_faiss-1.0.0-cp314-cp314-macosx_15_0_arm64.whl
- Size: 77.8 kB
- Tags: CPython 3.14, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 278dbf69688d7857bc16ff7879b166f14a6b8fc0c285b3d033b4bd65d453b28d |
| MD5 | 0576a057c2a904d652ecfb0eb7bb977b |
| BLAKE2b-256 | 1264e59ca1780641bea3092e10a3423fdeb1618d59a48948403e50c883eb5d36 |