The Mighty Tiny Vector Search Engine with Automatic Quantization and Hardware Acceleration
USearch
C++11 Single Header Vector Search
Compact, yet Powerful
- Single C++11 header implementation, easily extendible.
- 4B+ sized space-efficient point-clouds with `uint40_t`.
- Half-precision support with `maratyszcza/fp16`.
- View from disk, without loading into RAM.
- Any metric, including:
  - Euclidean, Dot-product, Cosine,
  - Jaccard, Hamming, Haversine.
  - Hardware-accelerated with `ashvardanian/simsimd`.
- Variable dimensionality vectors.
- Don't copy vectors if not needed.
- Bring your threads.
- Multiple vectors per label.
- Python bindings: `pip install usearch`.
- JavaScript bindings: `npm install usearch`.
- Rust bindings: `cargo add usearch`.
- Java bindings: `cloud.unum:usearch` on GitHub.
- GoLang bindings.
- Wolfram language bindings.
- For Linux: GCC, Clang.
- For MacOS: Apple Clang.
- For Windows.
- Multi-index lookups in Python.
- Thread-safe `reserve`.
- Distributed construction.
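The `uint40_t` point above is about address space: 32-bit IDs top out at 4,294,967,295 (about 4.29B points), while 40-bit IDs reach about 1.1 trillion at 5 bytes each instead of the 8 bytes of a `uint64_t`. Below is a minimal Python sketch of that packing, assuming little-endian storage; `pack_uint40` and `unpack_uint40` are hypothetical helpers, not USearch's implementation.

```python
# Hypothetical sketch: store IDs as 40-bit unsigned integers (5 bytes each).
UINT40_MAX = (1 << 40) - 1  # 1_099_511_627_775


def pack_uint40(values):
    """Pack non-negative ints into little-endian bytes, 5 bytes per value."""
    out = bytearray()
    for v in values:
        if not 0 <= v <= UINT40_MAX:
            raise OverflowError(f"{v} does not fit into 40 bits")
        out += v.to_bytes(5, "little")
    return bytes(out)


def unpack_uint40(buffer):
    """Inverse of pack_uint40."""
    if len(buffer) % 5:
        raise ValueError("buffer length must be a multiple of 5")
    return [int.from_bytes(buffer[i:i + 5], "little") for i in range(0, len(buffer), 5)]


ids = [0, 2**32, UINT40_MAX]  # 2**32 already overflows a 32-bit ID
packed = pack_uint40(ids)
print(len(packed))                   # 15 bytes, versus 24 with 64-bit IDs
print(unpack_uint40(packed) == ids)  # True
```

The trade-off is a few extra shifts per lookup in exchange for 37.5% less memory per stored ID compared with 64-bit integers.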
Usage
There are two usage patterns:
- Bare-bones with `usearch/usearch.hpp`, only available in C++.
- Full-fat version with its own threads, mutexes, type-punning, and quantization, available in C++ and wrapped for higher-level bindings.
C++
To use it in a C++ project, simply copy the `include/usearch/usearch.hpp` header into your project.
Alternatively, fetch it with CMake:

```cmake
include(FetchContent)
FetchContent_Declare(usearch GIT_REPOSITORY https://github.com/unum-cloud/usearch.git)
FetchContent_MakeAvailable(usearch)
```
A simple usage example requires the `unum::usearch` namespace and a choice of "distance" function, which can be one of the following templates:

- `cos_gt<float>` for "Cosine" or "Angular" distance.
- `ip_gt<float>` for "Inner Product" or "Dot Product" distance.
- `l2_squared_gt<float>` for the squared "L2" or "Euclidean" distance.
- `jaccard_gt<int>` for "Jaccard" distance between two ordered sets of unique elements.
- `bit_hamming_gt<uint>` for "Hamming" distance, as the number of differing bits in hashes.
- `pearson_correlation_gt<float>` for "Pearson" correlation between probability distributions.
- `haversine_gt<float>` for "Haversine" or "Great Circle" distance between coordinates.
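For readers who want the formulas behind these names, here is a minimal pure-Python sketch of most of the listed metrics. It is illustrative only: the function names are hypothetical, and the conventions (every function returns a distance where smaller means closer, inner-product distance taken as 1 minus the dot product, haversine inputs in radians on a unit sphere) are assumptions that may differ from USearch's kernels. Pearson correlation is left out.

```python
import math

# Hypothetical reference versions of the metrics listed above.
# Assumed convention: each returns a distance, smaller means more similar.


def cos_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def ip_distance(a, b):
    """1 - dot product; equals cosine distance when inputs are unit-length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    """Squared Euclidean distance: same ranking as L2, without the sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bit_hamming(a, b):
    """Number of differing bits between two non-negative integer hashes."""
    return bin(a ^ b).count("1")


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for sorted lists of unique elements.

    The lists may have different lengths; one merge pass counts the
    shared elements without building Python sets.
    """
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    union = len(a) + len(b) - shared
    return 1.0 - shared / union if union else 0.0


def haversine(a, b, radius=1.0):
    """Great-circle distance between (latitude, longitude) pairs in radians."""
    (lat1, lon1), (lat2, lon2) = a, b
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2.0 * radius * math.asin(min(1.0, math.sqrt(h)))


print(l2_squared([0, 0], [3, 4]))   # 25
print(bit_hamming(0b1011, 0b0110))  # 3
```

The Jaccard variant shows why the templates can work on inputs of different lengths: `jaccard_distance([1, 2, 3], [2, 3, 4, 5])` compares a 3-element and a 4-element set, sharing 2 of 5 distinct elements, for a distance of about 0.6.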
This list of metrics is easy to extend, and can include similarity measures for vectors with different numbers of elements/dimensions. A minimal example:
```cpp
#include <usearch/usearch.hpp>

using namespace unum::usearch;

int main() {
    index_gt<cos_gt<float>> index;
    float vec[3] = {0.1f, 0.3f, 0.2f};

    index.reserve(10);
    index.add(/* label: */ 42, /* vector: */ {vec, 3});
    index.search(
        /* query: */ {vec, 3}, /* top */ 5 /* results */,
        /* with callback: */ [](std::size_t label, float distance) { });

    index.save("index.usearch"); // Serializing to disk
    index.load("index.usearch"); // Reconstructing from disk
    index.view("index.usearch"); // Memory-mapping from disk
    return 0;
}
```
The `add` method is thread-safe, so multiple threads can construct the index concurrently.
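To make the shape of the C++ example and its thread-safety claim concrete, here is a minimal Python sketch that swaps USearch's graph for an exhaustive scan. `BruteForceIndex` is hypothetical: it keeps one vector per label and uses a single lock around its storage, which is just one simple way to get a thread-safe `add`; USearch's own synchronization is not shown here and likely differs. As in the C++ lambda, the search callback receives `(label, distance)`, and the threads come from the caller's own pool.

```python
import heapq
import math
import threading
from concurrent.futures import ThreadPoolExecutor


class BruteForceIndex:
    """Hypothetical exact-search stand-in; not USearch's HNSW graph."""

    def __init__(self, metric):
        self._metric = metric
        self._vectors = {}             # label -> vector (one vector per label here)
        self._lock = threading.Lock()  # guards _vectors

    def add(self, label, vector):
        # Taking the lock makes concurrent add() calls safe.
        with self._lock:
            self._vectors[label] = list(vector)

    def search(self, query, top, callback):
        # Snapshot under the lock, then rank outside it.
        with self._lock:
            items = list(self._vectors.items())
        scored = ((self._metric(query, vec), label) for label, vec in items)
        for distance, label in heapq.nsmallest(top, scored):
            callback(label, distance)

    def __len__(self):
        with self._lock:
            return len(self._vectors)


def cos_distance(a, b):
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


# The caller owns the thread pool; add() is called from several threads.
index = BruteForceIndex(cos_distance)
with ThreadPoolExecutor(max_workers=4) as pool:
    for label in range(100):
        pool.submit(index.add, label, [1.0, label + 1.0, 2.0])

results = []
index.search([1.0, 1.0, 2.0], 5, lambda label, distance: results.append((label, distance)))
print(len(index), results[0][0])  # 100 0
```

An exhaustive scan costs one distance evaluation per stored vector per query, which is exactly the cost a graph index like USearch avoids; the sketch only mirrors the API shape.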
Python
Python bindings are implemented with `pybind/pybind11`.
Because of Python's Global Interpreter Lock, large insertions are parallelized by spawning threads in the C++ layer.
```sh
$ pip install usearch
```

```python
import numpy as np
import usearch

index = usearch.Index(
    dim=256,  # Define the number of dimensions in input vectors
    metric='cos',  # Choose the "metric" or "distance", default = 'ip', optional
    dtype='f16',  # Quantize to 'f16' or 'i8q100' if needed, default = 'f32', optional
    connectivity=16,  # How densely the graph nodes are connected, optional
    expansion_add=128,  # Control the recall of indexing, optional
    expansion_search=64,  # Control the quality of search, optional
)

n = 100
labels = np.arange(n, dtype=np.longlong)
vectors = np.random.uniform(0, 0.3, (n, index.ndim)).astype(np.float32)

# Pass copy=False to avoid copying the data,
# handy when building 1B+ indexes of memory-mapped files
index.add(labels, vectors, copy=True)
assert len(index) == n

# You can search a batch at once
matches, distances, counts = index.search(vectors, 10)
```
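Because the index is approximate, a common sanity check is to compare its results with an exhaustive search on a sample of queries. Below is a minimal pure-Python sketch of recall@k; `exact_top_k` and `recall_at_k` are hypothetical helpers, not part of the `usearch` package.

```python
import heapq


def exact_top_k(query, vectors, k, metric):
    """Row indices of the k vectors closest to query, by exhaustive scan."""
    scored = ((metric(query, vec), row) for row, vec in enumerate(vectors))
    return [row for _, row in heapq.nsmallest(k, scored)]


def recall_at_k(found, exact):
    """Share of the exact neighbours that the approximate search also found."""
    if not exact:
        return 1.0
    return len(set(found) & set(exact)) / len(exact)


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
exact = exact_top_k([0.1, 0.0], vectors, 2, l2_squared)
print(exact)                       # [0, 1]
print(recall_at_k([0, 2], exact))  # 0.5, pretending an index returned rows 0 and 2
```

Wiring this to the Python example above assumes that `matches[i, :counts[i]]` holds the labels found for query `i` and that labels equal row indices, as they do there since `labels` runs from 0 to `n - 1`. Exhaustive search is O(n) per query, so it is best run on a small sample.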
Features
Bring your Threads
Performance
TODO
- JavaScript: Allow calling from "worker threads".
- Rust: Allow passing a custom thread ID.