A minimal, easy-to-understand FAISS-like library for vector similarity search with SIMD acceleration
Project description
mini-faiss: A Minimal Vector Similarity Search Library
A lightweight, high-performance library for similarity search in dense float32 embeddings. Built in C++ with SIMD acceleration and Python bindings via pybind11.
Key characteristics:
- Fast: SIMD-accelerated distance computation (L2 and inner product)
- Simple: ~1500 lines of clean C++ code, easy to understand and extend
- Pythonic: NumPy-friendly API, installable via pip
- Focused: Solves the core problem of similarity search without unnecessary complexity
Suitable for indexing and searching small to medium-scale datasets (up to a few million vectors on a single machine).
Installation
From Source
git clone https://github.com/yourusername/mini-faiss.git
cd mini-faiss
# Install dependencies
pip install pybind11 scikit-build-core numpy
# Build and install
pip install .
Requirements
- C++17 compiler (GCC, Clang, or MSVC)
- Python 3.8+
- NumPy 1.20+
Quick Start
import numpy as np
from mini_faiss import IndexFlatL2
# Create index for 768-dimensional vectors
d = 768
index = IndexFlatL2(d)
# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)
# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)
print(distances.shape) # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape) # (5, 10)
API Reference
IndexFlatL2
Flat index using squared Euclidean (L2) distance.
from mini_faiss import IndexFlatL2
# Create index
index = IndexFlatL2(d=768)
# Add vectors
index.add(xb) # xb: (N, 768) float32 array
# Search
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of L2 distances
# indices: (nq, k) int64 array of vector indices
Parameters:
- d (int): Vector dimension
- xb (np.ndarray): Vectors to add, shape (N, d), dtype float32
- xq (np.ndarray): Query vectors, shape (nq, d), dtype float32
- k (int): Number of nearest neighbors to return
Methods:
- add(xb): Add vectors to the index
- search(xq, k): Return the top-k nearest neighbors
- ntotal(): Get the total number of indexed vectors
- dimension(): Get the vector dimension
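Continuing the Quick Start example above, the accessor methods work like this:
index = IndexFlatL2(d=768)
index.add(xb)                # xb: (10000, 768) float32, as in Quick Start
print(index.ntotal())        # 10000 -- number of indexed vectors
print(index.dimension())     # 768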
IndexFlatIP
Flat index using inner product distance (useful for cosine similarity when vectors are normalized).
import numpy as np
from mini_faiss import IndexFlatIP
# Create index
index = IndexFlatIP(d=768)
# Normalize vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
# Normalize queries the same way
xq = np.random.randn(5, 768).astype("float32")
xq /= np.linalg.norm(xq, axis=1, keepdims=True)
# Add and search
index.add(xb)
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of inner products (higher = more similar)
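The normalization matters because, for unit-length vectors, the inner product equals cosine similarity. A quick NumPy check (with arbitrary example vectors) illustrates this:
import numpy as np
a = np.random.randn(768).astype("float32")
b = np.random.randn(768).astype("float32")
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
assert np.isclose(a @ b, cosine, atol=1e-4)  # IP of unit vectors == cosine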
Architecture
C++ Core
The library implements three main components:
- Data Structures (BaseIndex, IndexFlatL2, IndexFlatIP):
  - Flat, row-major storage of vectors in std::vector<float>
  - Metadata: dimension, vector count
  - Abstract interface for index implementations
- Distance Kernels:
  - l2_distance_batch(): Compute squared Euclidean distances
  - inner_product_batch(): Compute dot products
  - Vectorizable loops designed for SIMD auto-vectorization
- Top-K Selection:
  - select_topk_min(): Keep the k smallest distances (for L2)
  - select_topk_max(): Keep the k largest scores (for inner product)
  - Heap-based selection per query
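A pure-Python sketch of how these three components fit together, assuming only NumPy (illustrative only; the real implementation is the C++ core described above):
import numpy as np

class FlatL2Sketch:
    """Minimal Python mirror of the C++ flat L2 index (not the real API)."""

    def __init__(self, d):
        self.d = d
        self.data = np.empty((0, d), dtype=np.float32)  # row-major storage

    def add(self, xb):
        self.data = np.vstack([self.data, xb.astype(np.float32)])

    def search(self, xq, k):
        # Distance kernel: batch squared L2 against all stored vectors
        dists = ((xq[:, None, :] - self.data[None, :, :]) ** 2).sum(-1)
        # Top-k selection: k smallest distances per query, sorted ascending
        idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
        rows = np.arange(len(xq))[:, None]
        order = np.argsort(dists[rows, idx], axis=1)
        idx = idx[rows, order]
        return dists[rows, idx], idx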
Memory Layout
Vectors are stored in row-major layout:
data = [v_0[0], v_0[1], ..., v_0[d-1],
v_1[0], v_1[1], ..., v_1[d-1],
...
v_{N-1}[0], ..., v_{N-1}[d-1]]
Access: data[i*d + j] = j-th dimension of i-th vector.
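A small NumPy check of the same layout (the arrays here are just examples):
import numpy as np
N, d = 4, 3
xb = np.arange(N * d, dtype=np.float32).reshape(N, d)  # (N, d), C-contiguous
flat = xb.ravel()                   # row-major flattening, as in the C++ buffer
i, j = 2, 1
assert flat[i * d + j] == xb[i, j]  # data[i*d + j] == j-th dim of i-th vector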
Distance Computation
L2 (squared Euclidean):
||q - db||^2 = ||q||^2 - 2*q·db + ||db||^2
Precomputes database norms for efficiency.
Inner Product:
score = q · db
Simple dot product, maximized for similarity.
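A NumPy sketch of both computations (the Python function names here are illustrative, not the library's C++ symbols):
import numpy as np

def l2_sq_batch(xq, xb, xb_sq_norms):
    """||q - b||^2 = ||q||^2 - 2*q.b + ||b||^2, with ||b||^2 precomputed."""
    q_sq = (xq ** 2).sum(axis=1, keepdims=True)      # (nq, 1)
    return q_sq - 2.0 * (xq @ xb.T) + xb_sq_norms    # (nq, N)

def inner_product_batch(xq, xb):
    """Plain dot products; higher score = more similar."""
    return xq @ xb.T                                  # (nq, N)

xb = np.random.randn(1000, 64).astype("float32")
xq = np.random.randn(8, 64).astype("float32")
xb_sq_norms = (xb ** 2).sum(axis=1)  # cached once, reused for every query
d2 = l2_sq_batch(xq, xb, xb_sq_norms)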
Performance
Benchmarks on typical hardware (Intel i7, single core):
| Database Size | Dimension | mini-faiss | NumPy | Speedup |
|---|---|---|---|---|
| 10,000 | 128 | 0.0012s | 0.0085s | ~7x |
| 100,000 | 128 | 0.012s | 0.086s | ~7x |
| 100,000 | 768 | 0.067s | 0.48s | ~7x |
Performance depends on:
- Hardware (CPU model, cache size, SIMD capabilities)
- Compiler optimization flags
- Vector dimension and dataset size
To build with maximum optimization:
# Handled automatically by scikit-build-core; to build manually:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3 -march=native" .
cmake --build . --config Release
Use Cases
- Dense retrieval: Find similar documents/passages
- Image search: Index embeddings from image models
- Recommendation systems: Compute nearest neighbors for recommendations
- Clustering: Find nearest neighbors for clustering algorithms
- Prototyping: Quick iteration before moving to FAISS/specialized tools
Limitations
- Fixed dimension: All vectors in an index must have the same dimension
- Append-only: No deletion or update of indexed vectors
- Brute force search: Linear scan of all vectors (no approximate methods)
- Single machine: No distributed indexing
- Memory: Limited by RAM (typical: ~1 million 768-dim float32 vectors ≈ 3 GB; see the estimate below)
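The 3 GB figure follows directly from the flat float32 storage, as a quick back-of-envelope helper shows (the function is just an illustration):
def flat_index_bytes(n_vectors: int, d: int) -> int:
    """Approximate RAM for a flat float32 index (vector data only)."""
    return n_vectors * d * 4  # 4 bytes per float32 component

print(flat_index_bytes(1_000_000, 768) / 1e9)  # ~3.07 GB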
Design Decisions
Why Heap-Based Top-K?
For small k (typical case: k << N), a max-heap approach is simple and efficient:
- Time: O(N log k) per query
- Space: O(k) temporary storage
- Alternative: std::partial_sort has similar complexity
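A Python sketch of the same idea using the standard-library heap (the helper name mirrors, but is not, the C++ select_topk_min()):
import heapq

def topk_min_sketch(distances, k):
    """Keep the k smallest (distance, index) pairs in O(N log k) time.

    heapq is a min-heap, so distances are negated: the root is then the
    largest kept distance, evictable in O(log k) when a better one arrives.
    """
    heap = []  # holds (-distance, index); heap[0] is the worst kept entry
    for idx, dist in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-neg, i) for neg, i in heap)  # ascending distance

print(topk_min_sketch([0.9, 0.1, 0.5, 0.3, 0.7], k=2))  # [(0.1, 1), (0.3, 3)]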
Why No Quantization?
V1 goal: simplicity and correctness. Quantization adds complexity:
- Product quantization (PQ)
- Scalar quantization
These can be added in future versions without changing the API.
Why Row-Major Layout?
- Aligns with NumPy's default (C-contiguous)
- Natural for dense vector operations
- Cache-friendly for batch distance computation
Testing
Run the test suite:
pip install pytest
pytest tests/
Tests verify:
- Correctness vs. NumPy reference implementations
- Input validation and error handling
- Output shapes and dtypes
- Edge cases (empty index, k > N, etc.)
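For example, the correctness check against a NumPy reference might look like this sketch (the test body is illustrative, assuming the Quick Start API):
import numpy as np
from mini_faiss import IndexFlatL2

def test_matches_numpy_reference():
    rng = np.random.default_rng(0)
    xb = rng.standard_normal((1000, 32)).astype("float32")
    xq = rng.standard_normal((4, 32)).astype("float32")
    index = IndexFlatL2(32)
    index.add(xb)
    dist, _ = index.search(xq, k=5)
    # Brute-force reference: full (nq, N) distance matrix, k smallest per row
    ref = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    ref_k = np.sort(ref, axis=1)[:, :5]
    assert np.allclose(np.sort(dist, axis=1), ref_k, atol=1e-3)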
Examples
See examples/:
- basic_usage.py: Core functionality demo
- benchmark.py: Performance comparison with NumPy
Run examples:
python examples/basic_usage.py
python examples/benchmark.py
Future Enhancements
Possible extensions (outside v1 scope):
- Index Types:
  - Inverted file (IVF) for coarse-grained partitioning
  - Product quantization (PQ)
  - Hierarchical Navigable Small Worlds (HNSW)
- Operations:
  - Batch addition with progress tracking
  - Vector deletion / index rebuild
  - Index serialization (save/load)
- Features:
  - GPU acceleration (CUDA, Metal)
  - Explicit vector IDs (not just implicit 0..N-1)
  - Custom distance functions
- Optimization:
  - Cache-oblivious algorithms
  - SIMD assembly kernels
  - Multi-threaded search
Debugging
Enable Debug Output
Compile with debug symbols:
cmake -DCMAKE_BUILD_TYPE=Debug .
Common Issues
ImportError: cannot import name 'IndexFlatL2'
- Ensure the library built correctly: pip install -v .
- Check that the Python version matches the build: python --version
Dimension mismatch error
- Verify that query and database vector shapes match: assert xq.shape[1] == index.dimension()
Memory error with large datasets
- Check available RAM
- Reduce dataset size or use smaller dimension
- Consider adding index types with compression in future
Contributing
Contributions welcome! Areas:
- Performance improvements
- Additional index types
- GPU acceleration
- Better documentation
- Bug reports and fixes
License
MIT License. See LICENSE file.
References
- FAISS - Large-scale similarity search
- pybind11 - C++/Python bindings
- NumPy - Numerical computing in Python
Citation
If you use mini-faiss in research, cite as:
@software{mini_faiss_2024,
  title={mini-faiss: A Minimal Vector Similarity Search Library},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/mini-faiss}
}
Last updated: 2024
Download files
File details
Details for the file mini_faiss-1.0.0-cp314-cp314-macosx_15_0_arm64.whl.
File metadata
- Download URL: mini_faiss-1.0.0-cp314-cp314-macosx_15_0_arm64.whl
- Size: 77.8 kB
- Tags: CPython 3.14, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 278dbf69688d7857bc16ff7879b166f14a6b8fc0c285b3d033b4bd65d453b28d |
| MD5 | 0576a057c2a904d652ecfb0eb7bb977b |
| BLAKE2b-256 | 1264e59ca1780641bea3092e10a3423fdeb1618d59a48948403e50c883eb5d36 |