The Mighty Tiny Vector Search Engine with Automatic Quantization and Hardware Acceleration
USearch
C++11 Single Header Vector Search
Compact, yet Powerful
- Single C++11 header implementation, easily extendible.
- 4B+ sized space efficient point-clouds with uint40_t.
- Half-precision support with maratyszcza/fp16.
- View from disk, without loading into RAM.
- Any metric, includes:
  - Euclidean, Dot-product, Cosine,
  - Jaccard, Hamming, Haversine.
  - Hardware-accelerated ashvardanian/simsimd.
- Variable dimensionality vectors.
- Don't copy vectors if not needed.
- Bring your threads.
- Multiple vectors per label.
- Python bindings: pip install usearch.
- JavaScript bindings: npm install usearch.
- Rust bindings: cargo add usearch.
- Java bindings: cloud.unum:usearch on GitHub.
- GoLang bindings.
- Wolfram language bindings.
- For Linux: GCC, Clang.
- For MacOS: Apple Clang.
- For Windows.
- Multi-index lookups in Python.
- Thread-safe reserve.
- Distributed construction.
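To illustrate why a 40-bit identifier matters for 4B+ point clouds: an unsigned 32-bit integer tops out just under 4.3 billion, while 40 bits address roughly 1.1 trillion entries in 5 bytes instead of the 8 a 64-bit integer would take. The sketch below packs identifiers into 5 bytes in plain Python. The helper names `pack_uint40` and `unpack_uint40` are hypothetical and not part of USearch, whose `uint40_t` is a C++ type; the little-endian layout is also an assumption made for the example.

```python
UINT40_BYTES = 5
UINT40_MAX = (1 << 40) - 1  # 1_099_511_627_775


def pack_uint40(value: int) -> bytes:
    """Encode a non-negative integer below 2**40 into 5 little-endian bytes."""
    if not 0 <= value <= UINT40_MAX:
        raise ValueError("value does not fit into 40 bits")
    return value.to_bytes(UINT40_BYTES, "little")


def unpack_uint40(buffer: bytes) -> int:
    """Decode 5 little-endian bytes back into an integer."""
    if len(buffer) != UINT40_BYTES:
        raise ValueError("expected exactly 5 bytes")
    return int.from_bytes(buffer, "little")


# An identifier beyond the 32-bit range survives the round trip.
big_id = 5_000_000_000
assert unpack_uint40(pack_uint40(big_id)) == big_id
```

Compared with 8-byte identifiers, this saves 3 bytes per stored reference, which adds up quickly in a graph where every node keeps many neighbor links.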
Usage
There are two usage patterns:
- Bare-bones with usearch/usearch.hpp, only available in C++.
- Full-fat version with its own threads, mutexes, type-punning, and quantization, available in C++ and wrapped for higher-level bindings.
C++
To use USearch in a C++ project, copy the include/usearch/usearch.hpp header into your project.
Alternatively, fetch it with CMake:
include(FetchContent)
FetchContent_Declare(usearch GIT_REPOSITORY https://github.com/unum-cloud/usearch.git)
FetchContent_MakeAvailable(usearch)
A simple usage example requires bringing in the unum::usearch namespace and choosing the right "distance" function.
It can be one of the following templates:
- cos_gt<float> for "Cosine" or "Angular" distance.
- ip_gt<float> for "Inner Product" or "Dot Product" distance.
- l2_squared_gt<float> for the squared "L2" or "Euclidean" distance.
- jaccard_gt<int> for "Jaccard" distance between two ordered sets of unique elements.
- bit_hamming_gt<uint> for "Hamming" distance, as the number of differing bits in hashes.
- pearson_correlation_gt<float> for "Pearson" correlation between probability distributions.
- haversine_gt<float> for "Haversine" or "Great Circle" distance between coordinates.
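As a rough reference for what these metrics compute, here is a plain-Python sketch of each one. It is not the library's implementation: the function names are hypothetical, and expressing the similarity-based metrics as `1 - similarity` is an assumed convention chosen so that smaller always means closer.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inner_product_distance(a, b):
    """1 - dot product; equals cosine distance for unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def l2_squared_distance(a, b):
    """Squared Euclidean distance; skips the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two collections of unique elements."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def bit_hamming_distance(a, b):
    """Number of differing bits between two equal-length lists of integer words."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - covariance / math.sqrt(var_a * var_b)


def haversine_distance(a, b, radius=1.0):
    """Great-circle distance between (latitude, longitude) pairs in radians."""
    lat1, lon1 = a
    lat2, lon2 = b
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))
```

For example, `bit_hamming_distance([0b1010], [0b0110])` counts the two bit positions where the words differ, and `l2_squared_distance([0, 0], [3, 4])` gives 25 rather than 5 because the square root is omitted.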
That list of distance templates is easy to extend and can include similarity measures for vectors with differing numbers of elements or dimensions. A minimal example:
#include <usearch/usearch.hpp>

using namespace unum::usearch;

index_gt<cos_gt<float>> index;
float vec[3] = {0.1, 0.3, 0.2};
index.reserve(10);
// Pass a pointer to the first element, not a pointer to the whole array
index.add(/* label: */ 42, /* vector: */ {&vec[0], 3});
index.search(
    /* query: */ {&vec[0], 3}, /* top */ 5 /* results */,
    /* with callback: */ [](std::size_t label, float distance) { });
index.save("index.usearch"); // Serializing to disk
index.load("index.usearch"); // Reconstructing from disk
index.view("index.usearch"); // Memory-mapping from disk
The add method is thread-safe for concurrent index construction.
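The difference between loading and viewing a file can be sketched with Python's standard `mmap` module: a memory-mapped file is addressed like a byte buffer, and the operating system pages in only the regions that are actually touched. The flat float32 layout and the helper names below are made up for illustration; this is not the USearch serialization format.

```python
import mmap
import os
import struct
import tempfile

DIMENSIONS = 3
VECTOR_BYTES = DIMENSIONS * 4  # Each component is a 4-byte float32


def write_vectors(path, vectors):
    """Store vectors back to back as little-endian float32 values."""
    with open(path, "wb") as file:
        for vector in vectors:
            file.write(struct.pack(f"<{DIMENSIONS}f", *vector))


def read_vector(mapping, position):
    """Decode one vector directly from the mapped bytes."""
    return struct.unpack_from(f"<{DIMENSIONS}f", mapping, position * VECTOR_BYTES)


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [(0.5, 0.25, 0.125), (1.0, 2.0, 3.0)])

with open(path, "rb") as file:
    # Length 0 maps the whole file; nothing is copied into a Python object yet
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
        second = read_vector(mapping, 1)

assert second == (1.0, 2.0, 3.0)
```

The same idea is what makes viewing attractive for indexes larger than RAM: the process keeps an address range, and the page cache decides what stays resident.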
Python
Python bindings are implemented with pybind/pybind11.
Because Python's Global Interpreter Lock limits parallelism inside the interpreter, large insertions spawn threads in the C++ layer.
$ pip install usearch

import numpy as np
import usearch

index = usearch.Index(
    dim=256,              # Define the number of dimensions in input vectors
    metric='cos',         # Choose the "metric" or "distance", default = 'ip', optional
    dtype='f16',          # Quantize to 'f16' or 'i8q100' if needed, default = 'f32', optional
    connectivity=16,      # How frequent should the connections in the graph be, optional
    expansion_add=128,    # Control the recall of indexing, optional
    expansion_search=64,  # Control the quality of search, optional
)

n = 100
labels = np.array(range(n), dtype=np.longlong)
vectors = np.random.uniform(0, 0.3, (n, index.ndim)).astype(np.float32)

# copy=True stores a copy of the vectors inside the index.
# Pass copy=False to avoid copying, handy when building 1B+ indexes from memory-mapped files.
index.add(labels, vectors, copy=True)
assert len(index) == n

# You can search a batch at once
matches, distances, counts = index.search(vectors, 10)
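The `dtype` option trades precision for memory: at 256 dimensions a vector takes 1024 bytes in 'f32', 512 bytes in 'f16', and 256 bytes with one byte per component. The sketch below shows the kind of rounding error each choice introduces. The half-precision round trip uses the standard `struct` format `e`; the 8-bit scheme, which scales values from [-1, 1] by 100, is only an assumed reading of the name 'i8q100' and is not confirmed by the text above. All helper names are hypothetical.

```python
import struct


def to_f16_and_back(value: float) -> float:
    """Round-trip a number through IEEE 754 half precision (2 bytes)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_i8_scaled(value: float, scale: int = 100) -> int:
    """Map a value from [-1, 1] to a signed 8-bit integer (assumed scheme)."""
    clipped = max(-1.0, min(1.0, value))
    return int(round(clipped * scale))


def from_i8_scaled(quantized: int, scale: int = 100) -> float:
    """Undo the scaling; the rounding error stays."""
    return quantized / scale


original = 0.1234567
half = to_f16_and_back(original)
small = from_i8_scaled(to_i8_scaled(original))

# Both round trips lose precision, the 1-byte variant more so.
assert abs(half - original) < 1e-3
assert abs(small - original) < 1e-2
```

Whether the loss matters depends on the data: for normalized embeddings compared with cosine distance, small per-component errors often change the ranking of neighbors very little, but that is worth measuring on your own vectors.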
Features
Bring your Threads
Performance
TODO
- JavaScript: Allow calling from "worker threads".
- Rust: Allow passing a custom thread ID.