
A minimal, easy-to-understand FAISS-like library for vector similarity search with SIMD acceleration


mini-faiss: A Minimal Vector Similarity Search Library

A lightweight, high-performance library for similarity search in dense float32 embeddings. Built in C++ with SIMD acceleration and Python bindings via pybind11.

Key characteristics:

  • Fast: SIMD-accelerated distance computation (L2 and inner product)
  • Simple: ~1500 lines of clean C++ code, easy to understand and extend
  • Pythonic: NumPy-friendly API, installable via pip
  • Focused: Solves the core problem of similarity search without unnecessary complexity

Suitable for indexing and searching small to medium-scale datasets (up to a few million vectors on a single machine).

Installation

From Source

git clone https://github.com/yourusername/mini-faiss.git
cd mini-faiss

# Install dependencies
pip install pybind11 scikit-build-core numpy

# Build and install
pip install .

Requirements

  • C++17 compiler (GCC, Clang, or MSVC)
  • Python 3.8+
  • NumPy 1.20+

Quick Start

import numpy as np
from mini_faiss import IndexFlatL2

# Create index for 768-dimensional vectors
d = 768
index = IndexFlatL2(d)

# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)

# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)

print(distances.shape)  # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape)    # (5, 10)

API Reference

IndexFlatL2

Flat index using squared Euclidean (L2) distance.

from mini_faiss import IndexFlatL2

# Create index
index = IndexFlatL2(d=768)

# Add vectors
index.add(xb)  # xb: (N, 768) float32 array

# Search
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of squared L2 distances
# indices: (nq, k) int64 array of vector indices

Parameters:

  • d (int): Vector dimension
  • xb (np.ndarray): Vectors to add, shape (N, d), dtype float32
  • xq (np.ndarray): Query vectors, shape (nq, d), dtype float32
  • k (int): Number of nearest neighbors to return

Methods:

  • add(xb): Add vectors to index
  • search(xq, k): Return top-k nearest neighbors
  • ntotal(): Get total number of indexed vectors
  • dimension(): Get vector dimension
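
Putting these methods together, using only the API documented above:

import numpy as np
from mini_faiss import IndexFlatL2

index = IndexFlatL2(4)
print(index.ntotal())     # 0  (empty index)
print(index.dimension())  # 4

xb = np.random.randn(100, 4).astype("float32")
index.add(xb)
print(index.ntotal())     # 100

# Query with the first two database vectors: each vector's nearest
# neighbor is itself, at (near-)zero squared distance
distances, indices = index.search(xb[:2], k=3)
print(indices[:, 0])      # [0 1]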

IndexFlatIP

Flat index using inner product similarity (equivalent to cosine similarity when vectors are L2-normalized).

from mini_faiss import IndexFlatIP

# Create index
index = IndexFlatIP(d=768)

# Normalize database vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)

# Queries must be normalized too, otherwise scores are plain dot products
xq = np.random.randn(5, 768).astype("float32")
xq /= np.linalg.norm(xq, axis=1, keepdims=True)

# Add and search
index.add(xb)
distances, indices = index.search(xq, k=10)
# distances: (nq, k) float32 array of inner products (higher = more similar)
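
Because both sides are unit-normalized, the returned score is the cosine similarity of the pair. Continuing the snippet above, a quick check against a direct NumPy computation:

# Sanity check: top score for query 0 equals the cosine of the matched pair
i = int(indices[0, 0])
assert np.isclose(float(xq[0] @ xb[i]), distances[0, 0], atol=1e-5)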

Architecture

C++ Core

The library implements three main components:

  1. Data Structures (BaseIndex, IndexFlatL2, IndexFlatIP):

    • Flat, row-major storage of vectors in std::vector<float>
    • Metadata: dimension, vector count
    • Abstract interface for index implementations
  2. Distance Kernels:

    • l2_distance_batch(): Compute squared Euclidean distances
    • inner_product_batch(): Compute dot products
    • Tight loops structured so the compiler can auto-vectorize them with SIMD
  3. Top-K Selection:

    • select_topk_min(): Keep k smallest distances (for L2)
    • select_topk_max(): Keep the k largest scores (for inner product)
    • Heap-based selection per query
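
For orientation, here is a compact NumPy analogue of the flat-index structure described above. This is an illustrative sketch (the class name FlatL2Sketch is made up here), not the actual C++ implementation:

import numpy as np

class FlatL2Sketch:
    """Illustrative Python analogue of IndexFlatL2 (not the C++ core)."""

    def __init__(self, d):
        self.d = d
        self.data = np.empty((0, d), dtype=np.float32)  # row-major storage

    def add(self, xb):
        self.data = np.vstack([self.data, xb.astype(np.float32)])

    def search(self, xq, k):
        # Squared L2 distance between every query and every stored vector
        diff = xq[:, None, :] - self.data[None, :, :]
        dist = np.einsum("qnd,qnd->qn", diff, diff)
        idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
        # argpartition does not order the top-k; sort them explicitly
        order = np.argsort(np.take_along_axis(dist, idx, axis=1), axis=1)
        idx = np.take_along_axis(idx, order, axis=1)
        return np.take_along_axis(dist, idx, axis=1), idx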

Memory Layout

Vectors are stored in row-major layout:

data = [v_0[0], v_0[1], ..., v_0[d-1],
        v_1[0], v_1[1], ..., v_1[d-1],
        ...
        v_{N-1}[0], ..., v_{N-1}[d-1]]

Access: data[i*d + j] = j-th dimension of i-th vector.
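
In NumPy terms this is just a C-contiguous array flattened row by row; the access formula can be checked directly:

import numpy as np

X = np.arange(12, dtype=np.float32).reshape(3, 4)  # N=3 vectors, d=4
data = X.ravel()                                   # flat row-major buffer

i, j, d = 2, 1, 4
assert data[i * d + j] == X[i, j]  # j-th dimension of i-th vector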

Distance Computation

L2 (squared Euclidean):

||q - db||^2 = ||q||^2 - 2*q·db + ||db||^2

Precomputes database norms for efficiency.

Inner Product:

score = q · db

A simple dot product; higher scores indicate greater similarity.
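
The L2 expansion matters because the cross term q·db for all pairs reduces to a single matrix multiply. A NumPy sketch verifying the identity:

import numpy as np

xq = np.random.randn(5, 64).astype(np.float32)    # queries
xb = np.random.randn(100, 64).astype(np.float32)  # database

# Direct: ||q - db||^2 for every (query, database) pair
direct = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)

# Expanded: ||q||^2 - 2*q·db + ||db||^2; the xq @ xb.T term is one GEMM
expanded = ((xq ** 2).sum(axis=1, keepdims=True)
            - 2.0 * (xq @ xb.T)
            + (xb ** 2).sum(axis=1))

assert np.allclose(direct, expanded, atol=1e-3)  # float32 tolerance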

Performance

Benchmarks on typical hardware (Intel i7, single core):

Database Size   Dimension   mini-faiss   NumPy      Speedup
10,000          128         0.0012 s     0.0085 s   ~7x
100,000         128         0.012 s      0.086 s    ~7x
100,000         768         0.067 s      0.48 s     ~7x
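
The NumPy column refers to a pure-NumPy brute-force baseline. Numbers like these come from a harness in the spirit of the sketch below (examples/benchmark.py may differ in detail; timings vary with hardware and build flags):

import time
import numpy as np
from mini_faiss import IndexFlatL2

xb = np.random.randn(100_000, 128).astype("float32")
xq = np.random.randn(100, 128).astype("float32")

index = IndexFlatL2(128)
index.add(xb)

t0 = time.perf_counter()
index.search(xq, k=10)
print(f"mini-faiss: {time.perf_counter() - t0:.4f} s")

t0 = time.perf_counter()
# Brute-force NumPy baseline: full distance matrix, then top-k
d2 = (xq ** 2).sum(1, keepdims=True) - 2.0 * (xq @ xb.T) + (xb ** 2).sum(1)
np.argpartition(d2, 9, axis=1)[:, :10]
print(f"NumPy:      {time.perf_counter() - t0:.4f} s")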

Performance depends on:

  • Hardware (CPU model, cache size, SIMD capabilities)
  • Compiler optimization flags
  • Vector dimension and dataset size

To build with maximum optimization:

# scikit-build-core applies Release settings automatically; for a manual CMake build:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O3 -march=native" .
cmake --build . --config Release

Use Cases

  • Dense retrieval: Find similar documents/passages
  • Image search: Index embeddings from image models
  • Recommendation systems: Compute nearest neighbors for recommendations
  • Clustering: Find nearest neighbors for clustering algorithms
  • Prototyping: Quick iteration before moving to FAISS/specialized tools

Limitations

  • Fixed dimension: All vectors in an index must have the same dimension
  • Append-only: No deletion or update of indexed vectors
  • Brute force search: Linear scan of all vectors (no approximate methods)
  • Single machine: No distributed indexing
  • Memory: Limited by RAM (typical: ~1 million 768-dim vectors ≈ 3GB)
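
The 3GB estimate follows directly from the flat float32 layout:

N, d = 1_000_000, 768
print(N * d * 4 / 1e9)  # 4 bytes per float32 -> 3.072 GB of raw vectors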

Design Decisions

Why Heap-Based Top-K?

For small k (typical case: k << N), a max-heap approach is simple and efficient:

  • Time: O(N log k) per query
  • Space: O(k) temporary storage
  • The alternative, std::partial_sort, has similar complexity
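
A Python sketch of the same pattern using heapq (illustrative only; the C++ core implements its own heap):

import heapq

def topk_min(distances, k):
    """Return (distance, index) pairs for the k smallest distances.

    A max-heap of size k: push until full, then replace the current
    worst element whenever a smaller distance arrives -> O(N log k).
    """
    heap = []  # stores (-distance, index) so heap[0] is the worst kept
    for i, dist in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, i))
        elif -heap[0][0] > dist:
            heapq.heapreplace(heap, (-dist, i))
    return sorted((-d, i) for d, i in heap)

print(topk_min([5.0, 1.0, 3.0, 0.5, 4.0], k=2))  # [(0.5, 3), (1.0, 1)]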

Why No Quantization?

V1 goal: simplicity and correctness. Quantization adds complexity:

  • Product quantization (PQ)
  • Scalar quantization

Both can be added in future versions without changing the API.

Why Row-Major Layout?

  • Aligns with NumPy's default (C-contiguous)
  • Natural for dense vector operations
  • Cache-friendly for batch distance computation

Testing

Run the test suite:

pip install pytest
pytest tests/

Tests verify:

  • Correctness vs. NumPy reference implementations (see the sketch below)
  • Input validation and error handling
  • Output shapes and dtypes
  • Edge cases (empty index, k > N, etc.)
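
An illustrative test in this style (the actual suite in tests/ may differ):

import numpy as np
from mini_faiss import IndexFlatL2

def test_matches_numpy_reference():
    rng = np.random.default_rng(0)
    xb = rng.standard_normal((200, 16)).astype(np.float32)
    xq = rng.standard_normal((3, 16)).astype(np.float32)

    index = IndexFlatL2(16)
    index.add(xb)
    dist, idx = index.search(xq, k=5)

    # NumPy reference: full squared-L2 matrix, fully sorted
    ref = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    ref_idx = np.argsort(ref, axis=1)[:, :5]

    assert dist.shape == (3, 5) and idx.shape == (3, 5)
    np.testing.assert_array_equal(idx, ref_idx)
    np.testing.assert_allclose(
        dist, np.take_along_axis(ref, ref_idx, axis=1), rtol=1e-5)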

Examples

See examples/:

  • basic_usage.py: Core functionality demo
  • benchmark.py: Performance comparison with NumPy

Run examples:

python examples/basic_usage.py
python examples/benchmark.py

Future Enhancements

Possible extensions (outside v1 scope):

  1. Index Types:

    • Inverted file (IVF) for coarse-grained partitioning
    • Product quantization (PQ)
    • Hierarchical Navigable Small Worlds (HNSW)
  2. Operations:

    • Batch addition with progress tracking
    • Vector deletion / index rebuild
    • Index serialization (save/load)
  3. Features:

    • GPU acceleration (CUDA, Metal)
    • Explicit vector IDs (not just implicit 0..N-1)
    • Custom distance functions
  4. Optimization:

    • Cache-oblivious algorithms
    • SIMD assembly kernels
    • Multi-threaded search

Debugging

Enable Debug Output

Compile with debug symbols:

cmake -DCMAKE_BUILD_TYPE=Debug .

Common Issues

ImportError: cannot import name 'IndexFlatL2'

  • Ensure the library built correctly: pip install -v .
  • Check Python version matches build: python --version

Dimension mismatch error

  • Verify query/database vector shapes match: assert xq.shape[1] == index.dimension()

Memory error with large datasets

  • Check available RAM
  • Reduce dataset size or use smaller dimension
  • Compressed index types may reduce the footprint in a future version

Contributing

Contributions welcome! Areas:

  • Performance improvements
  • Additional index types
  • GPU acceleration
  • Better documentation
  • Bug reports and fixes

License

MIT License. See LICENSE file.

References

  • FAISS (https://github.com/facebookresearch/faiss) - Large-scale similarity search
  • pybind11 (https://github.com/pybind/pybind11) - C++/Python bindings
  • NumPy (https://numpy.org) - Numerical computing in Python

Citation

If you use mini-faiss in research, cite as:

@software{mini_faiss_2024,
  title={mini-faiss: A Minimal Vector Similarity Search Library},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/mini-faiss}
}

Last updated: 2024
