
The Mighty Tiny Vector Search Engine with Automatic Quantization and Hardware Acceleration


USearch

C++11 Single Header Vector Search
Compact, yet Powerful



  • Single C++11 header implementation, easily extendible.
  • Space-efficient point clouds with 4B+ entries, using uint40_t.
  • Half-precision support with maratyszcza/fp16.
  • View from disk, without loading into RAM.
  • Any metric, includes:
    • Euclidean, Dot-product, Cosine,
    • Jaccard, Hamming, Haversine.
    • Hardware-accelerated ashvardanian/simsimd.
  • Variable dimensionality vectors.
  • Don't copy vectors if not needed.
  • Bring your threads.
  • Multiple vectors per label.
  • Python bindings: pip install usearch.
  • JavaScript bindings: npm install usearch.
  • Rust bindings: cargo add usearch.
  • Java bindings: cloud.unum:usearch on GitHub.
  • GoLang bindings.
  • Wolfram language bindings.
  • For Linux: GCC, Clang.
  • For MacOS: Apple Clang.
  • For Windows.
  • Multi-index lookups in Python.
  • Thread-safe reserve.
  • Distributed construction.
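
The arithmetic behind the uint40_t bullet can be sketched in a few lines of Python. This is only an illustration of why a 40-bit integer is enough for collections beyond 4 billion entries while using 5 bytes instead of 8. The helper names `pack_uint40` and `unpack_uint40` are hypothetical and are not part of USearch, and how the library lays out such integers internally is not shown here.

```python
# Hypothetical helpers illustrating 40-bit identifiers; not part of USearch.
UINT32_MAX = 2**32 - 1  # 4,294,967,295: a 32-bit id cannot address more entries
UINT40_MAX = 2**40 - 1  # 1,099,511,627,775: roughly a trillion entries


def pack_uint40(value: int) -> bytes:
    """Store an integer in exactly 5 little-endian bytes."""
    if not 0 <= value <= UINT40_MAX:
        raise ValueError("value does not fit into 40 bits")
    return value.to_bytes(5, "little")


def unpack_uint40(raw: bytes) -> int:
    """Read back a 5-byte little-endian integer."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    return int.from_bytes(raw, "little")


five_billion = 5_000_000_000  # larger than UINT32_MAX
packed = pack_uint40(five_billion)
print(len(packed), unpack_uint40(packed))  # 5 bytes, value survives the round trip
```

Compared with a 64-bit integer, each identifier stored this way saves 3 bytes, which adds up quickly in a graph with billions of links.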

Usage

There are two usage patterns:

  • Bare-bones with usearch/usearch.hpp, only available in C++.
  • Full-fat version with its own threads, mutexes, type-punning, and quantization, available in C++ and wrapped for the higher-level bindings.

C++

To use USearch in a C++ project, copy the include/usearch/usearch.hpp header into your project. Alternatively, fetch it with CMake:

include(FetchContent)
FetchContent_Declare(usearch GIT_REPOSITORY https://github.com/unum-cloud/usearch.git)
FetchContent_MakeAvailable(usearch)

The simplest usage example requires the unum::usearch namespace and a suitable "distance" function, which can be one of the following templates:

  • cos_gt<float> for "Cosine" or "Angular" distance.
  • ip_gt<float> for "Inner Product" or "Dot Product" distance.
  • l2_squared_gt<float> for the squared "L2" or "Euclidean" distance.
  • jaccard_gt<int> for "Jaccard" distance between two ordered sets of unique elements.
  • bit_hamming_gt<uint> for "Hamming" distance, as the number of differing bits in hashes.
  • pearson_correlation_gt<float> for "Pearson" correlation between probability distributions.
  • haversine_gt<float> for "Haversine" or "Great Circle" distance between coordinates.
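
To make the metrics above concrete, here is a rough pure-Python sketch of the underlying formulas. It is meant to illustrate the math, not the C++ templates: the function names are hypothetical, the `1 - dot` convention for the inner product is an assumption, and the haversine variant assumes inputs in degrees and a default radius of 6371 km. The Pearson variant is omitted for brevity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def inner_product_distance(a, b):
    """Assumed convention: 1 - <a, b>, so a larger dot product means closer."""
    return 1.0 - dot(a, b)


def l2_squared_distance(a, b):
    """Squared Euclidean distance; skipping the square root keeps the ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two collections of unique elements."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def haversine_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance; inputs in degrees, result in units of `radius`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(l2_squared_distance([0, 0], [3, 4]))      # 25
print(hamming_distance(0b1011, 0b0010))         # 2
```

Each function takes two inputs and returns a single number where smaller means closer, which is the only contract a nearest-neighbor index needs from a metric.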

The list of metrics is easily extended and can include similarity measures for vectors with different numbers of elements/dimensions. A minimal example:

#include <usearch/usearch.hpp>

using namespace unum::usearch;

index_gt<cos_gt<float>> index;
float vec[3] = {0.1f, 0.3f, 0.2f};

index.reserve(10); // Allocate room before inserting
index.add(/* label: */ 42, /* vector: */ {&vec[0], 3});
index.search(
  /* query: */ {&vec[0], 3}, /* top */ 5 /* results */,
  /* with callback: */ [](std::size_t label, float distance) { });

index.save("index.usearch"); // Serializing to disk
index.load("index.usearch"); // Reconstructing from disk
index.view("index.usearch"); // Memory-mapping from disk

The add method is thread-safe, which allows concurrent index construction.
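
As a reference point for what a top-k search returns, an exact brute-force search can be sketched in Python, the language of the bindings described next. This is not how USearch searches internally; it is a linear scan that is useful as a ground truth when checking the results of an approximate index. The names `exact_search` and `l2_squared` are hypothetical.

```python
import heapq


def l2_squared(a, b):
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors, query, top_k, distance=l2_squared):
    """Return the `top_k` (label, distance) pairs closest to `query`, nearest first.

    `vectors` maps an integer label to its vector.
    """
    scored = ((distance(vec, query), label) for label, vec in vectors.items())
    return [(label, dist) for dist, label in heapq.nsmallest(top_k, scored)]


vectors = {
    42: [0.1, 0.3, 0.2],
    43: [0.9, 0.1, 0.4],
    44: [0.1, 0.3, 0.3],
}
results = exact_search(vectors, query=[0.1, 0.3, 0.2], top_k=2)
for label, dist in results:
    print(label, dist)
```

Comparing the labels from such a scan with the labels reported by an approximate index gives a simple recall estimate, at the cost of visiting every vector.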

Python

Python bindings are implemented with pybind/pybind11. Because of Python's Global Interpreter Lock, large insertions spawn threads in the C++ layer.

$ pip install usearch

import numpy as np
import usearch

index = usearch.Index(
    dim=256, # Define the number of dimensions in input vectors
    metric='cos', # Choose the "metric" or "distance", default = 'ip', optional
    dtype='f16', # Quantize to 'f16' or 'i8q100' if needed, default = 'f32', optional
    connectivity=16, # How frequent should the connections in the graph be, optional
    expansion_add=128, # Control the recall of indexing, optional
    expansion_search=64, # Control the quality of search, optional
)

n = 100
labels = np.array(range(n), dtype=np.longlong)
vectors = np.random.uniform(0, 0.3, (n, index.ndim)).astype(np.float32)

# You can avoid copying the data by passing copy=False,
# which is handy when building 1B+ indexes from memory-mapped files
index.add(labels, vectors, copy=True)
assert len(index) == n

# You can search a batch at once
matches, distances, counts = index.search(vectors, 10)
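
The trade-off behind dtype='f16' can be illustrated with the standard library alone. The sketch below rounds a few values to IEEE 754 half precision and back, then compares storage size and rounding error. It says nothing about how USearch performs the conversion internally, and `to_f16_and_back` is a hypothetical helper.

```python
import struct


def to_f16_and_back(values):
    """Round each float to IEEE 754 half precision, then widen it again."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


original = [0.1, 0.3, 0.2, 0.123456789]
restored = to_f16_and_back(original)

bytes_f32 = len(struct.pack(f"<{len(original)}f", *original))
bytes_f16 = len(struct.pack(f"<{len(original)}e", *original))
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(bytes_f32, bytes_f16)  # 16 8
print(max_error)             # small but non-zero rounding error
```

Halving the bytes per scalar halves the memory needed for the vectors, while the rounding error stays small for values in this range, which is often acceptable for similarity search.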

Features

Bring your Threads

Performance

TODO

  • JavaScript: Allow calling from "worker threads".
  • Rust: Allow passing a custom thread ID.


This version

0.1.8

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • usearch-0.1.8-cp311-cp311-macosx_10_9_universal2.whl (225.9 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64).
  • usearch-0.1.8-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (122.2 kB): CPython 3.10, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64.
  • usearch-0.1.8-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (116.5 kB): CPython 3.10, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64.
  • usearch-0.1.8-cp310-cp310-macosx_11_0_arm64.whl (110.5 kB): CPython 3.10, macOS 11.0+ ARM64.
  • usearch-0.1.8-cp310-cp310-macosx_10_9_x86_64.whl (118.7 kB): CPython 3.10, macOS 10.9+ x86-64.
  • usearch-0.1.8-cp310-cp310-macosx_10_9_universal2.whl (225.9 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64).

File hashes

Hashes for usearch-0.1.8-cp311-cp311-macosx_10_9_universal2.whl
  • SHA256: 256f00e5e96b153077a9b1dc297819ac7e23bb94ce8eb6151e9c95d07a134c0d
  • MD5: 4bcd662cb91abcd17edcd6b7f95eeaeb
  • BLAKE2b-256: 794408b57b2ca5709bbd28418a00a0dc3eaec473883eaaa1b3427944508cf43a

Hashes for usearch-0.1.8-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  • SHA256: 3254f5084613fead78e06c10fa8b73993556f997915ab63ba8af8b2a6a42d81f
  • MD5: fbf242cfdd6297dbe0fd583fcee0613b
  • BLAKE2b-256: 0c37bc4713c9e1744acc306f9c0c8f1229236d23d34949edb6f49b7121b69379

Hashes for usearch-0.1.8-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
  • SHA256: cfd8c20bfae62bbf1be5f41a65e7faf9e34ff8c4de7b739e18cd5e8982e1584e
  • MD5: 10387422f55e0cd4dffe192c50cb0c4a
  • BLAKE2b-256: 86420351cc742d594296a01bfffc0c9aadc964c711351e685b80d8f820e5d0e2

Hashes for usearch-0.1.8-cp310-cp310-macosx_11_0_arm64.whl
  • SHA256: 7748f3dbd06d0554c0c98442a19d812851ed3d434b95b91bcde5f778167b10e9
  • MD5: 6e76e5d696d525b345b2cac7cc032416
  • BLAKE2b-256: 3652b7ff2cb0064676d909e5c971745b02777b20249c13678e1972ae550ff3c5

Hashes for usearch-0.1.8-cp310-cp310-macosx_10_9_x86_64.whl
  • SHA256: 1a76d816cab661ae1316cd2b0474be262831da73eb74d7e164073e49e7a0249c
  • MD5: 7c77499081f03ca0563e0a7772b71763
  • BLAKE2b-256: f080a296c32e0d8d80eeea63ebfc9b4357a7bcf080c3e15f9bd4d950e98b919f

Hashes for usearch-0.1.8-cp310-cp310-macosx_10_9_universal2.whl
  • SHA256: f82cf81d508549d4aa3f5c14c940b33566f92ca0118ec6f651c844b1f34e5a54
  • MD5: e62cec2d14b37da4d2111b5f0ff4677c
  • BLAKE2b-256: 1627916982132af222a2e4871c9f12e10b2c3ea2c0943439df3a19ffecf84ed9
