Pure-Python in-memory vector similarity search
embedsearch
embedsearch provides efficient nearest-neighbor search backed entirely by NumPy.
Features
- Multiple Distance Metrics — Cosine, Euclidean, Manhattan, Hamming, Dot Product
- Batch Operations — Efficient multi-query search
- Production-Ready — Type hints, comprehensive error handling, extensive testing
- Cross-Platform — Windows, macOS, Linux
- Minimal Dependencies — Only numpy
Installation
pip install embedsearch
Quick Start
Compute Distance
from embedsearch import compute_distance, DistanceMetric
v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]
euclidean_dist = compute_distance(v1, v2, DistanceMetric.EUCLIDEAN)
cosine_dist = compute_distance(v1, v2, DistanceMetric.COSINE)
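To make the two results above concrete, the arithmetic behind Euclidean and cosine distance can be worked through by hand for the same vectors. This is a standard-library sketch of the textbook formulas, not embedsearch's own implementation; the variable names are illustrative only.

```python
import math

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

# Euclidean (L2) distance: square root of the summed squared differences.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Cosine distance: 1 minus the cosine of the angle between the vectors.
dot = sum(a * b for a, b in zip(v1, v2))
norm1 = math.sqrt(sum(a * a for a in v1))
norm2 = math.sqrt(sum(b * b for b in v2))
cosine = 1.0 - dot / (norm1 * norm2)

print(f"euclidean={euclidean:.4f}, cosine={cosine:.4f}")
# prints: euclidean=5.1962, cosine=0.0254
```

The two vectors point in almost the same direction, so the cosine distance is close to zero even though the Euclidean distance is not.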
Search Vectors
from embedsearch import VectorIndex, DistanceMetric
import numpy as np
# Create index for 128-dimensional vectors
index = VectorIndex(dimension=128, metric=DistanceMetric.COSINE)
# Add vectors
vectors = [np.random.randn(128).astype(np.float32) for _ in range(1000)]
indices = index.add_batch(vectors)
# Search (the query must have the same dimension as the index)
query = np.random.randn(128).astype(np.float32)
results = index.search(query, k=11)
for result in results:
    print(f"Index: {result.index}, Distance: {result.distance:.4f}, Similarity: {result.similarity:.4f}")
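For readers who want to see what a cosine top-k search involves, the following NumPy-only sketch performs the same kind of brute-force lookup without the library. It is an illustration under assumptions: the array names and the use of `np.argsort` are choices made here, and embedsearch's internals may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
stored = rng.standard_normal((1000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

# Scale every row and the query to unit length, so that a plain dot
# product equals the cosine similarity.
stored_unit = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

similarity = stored_unit @ query_unit   # shape (1000,)
distance = 1.0 - similarity             # cosine distance

k = 11
top = np.argsort(distance)[:k]          # indices of the k smallest distances
for i in top:
    print(f"Index: {i}, Distance: {distance[i]:.4f}, Similarity: {similarity[i]:.4f}")
```

Sorting all distances costs O(n log n) per query; for large collections `np.argpartition` would avoid the full sort.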
API Reference
VectorIndex
VectorIndex(dimension: int, metric: DistanceMetric = DistanceMetric.COSINE)
| Method | Description |
|---|---|
| `add_vector(vector, metadata=None)` | Add single vector; returns its integer index |
| `add_batch(vectors, metadata=None)` | Add multiple vectors; returns list of indices |
| `search(query_vector, k=10, threshold=None)` | Return top-k SearchResult objects |
| `batch_search(queries, k=10)` | Search multiple queries; returns list of lists |
| `get_vector(index)` | Retrieve stored vector by index |
| `get_metadata(index)` | Retrieve metadata dict by index |
| `size()` | Number of vectors in the index |
Note: The COSINE metric normalises vectors on insertion, so get_vector() returns the normalised form rather than the original vector.
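The effect of that normalisation can be seen with a small NumPy sketch. `unit_length` is a hypothetical stand-in written for this example, not the library's function, and its handling of zero vectors is an assumption.

```python
import numpy as np

def unit_length(vector):
    """Hypothetical stand-in for the normalisation step (not library code)."""
    arr = np.asarray(vector, dtype=np.float32)
    norm = np.linalg.norm(arr)
    if norm == 0:
        # Assumption: leave a zero vector unchanged to avoid dividing by zero.
        return arr
    return arr / norm

original = [3.0, 4.0]
stored = unit_length(original)

print(stored)                  # same direction as the original, length 1
print(np.linalg.norm(stored))  # approximately 1.0
```

If the original magnitudes matter to your application, keep your own copy of the input vectors, for example in the metadata or in a separate array.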
DistanceMetric
class DistanceMetric(Enum):
    EUCLIDEAN = "euclidean"      # L2 distance
    COSINE = "cosine"            # Cosine distance (1 - similarity), clamped to [0, 1]
    MANHATTAN = "manhattan"      # L1 distance
    DOT_PRODUCT = "dot_product"  # Negative dot product (higher dot = lower distance)
    HAMMING = "hamming"          # Hamming distance (binary vectors)
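The Manhattan, dot-product, and Hamming entries can be sketched in a few lines of NumPy. The function names below are illustrative, not part of embedsearch, and whether the library reports Hamming distance as a count or as a fraction of positions is not stated in this document; the sketch assumes a count.

```python
import numpy as np

def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences.
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))

def negative_dot(a, b):
    # Negating the dot product turns "larger is more similar"
    # into "smaller is closer", like the other distances.
    return float(-np.dot(a, b))

def hamming(a, b):
    # Assumption: number of positions at which the vectors differ.
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

print(manhattan([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))     # 9.0
print(negative_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # -32.0
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))             # 2
```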
SearchResult
SearchResult(index: int, distance: float, similarity: float)
Module Functions
normalize_vector(vector) # → np.ndarray, unit length
compute_distance(v1, v2, metric=EUCLIDEAN) # → float
batch_search(index, queries, k=10) # → List[List[SearchResult]]
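One common way to answer several queries at once is to compute the whole query-by-vector distance matrix with a single matrix product. The sketch below does this for Euclidean distance using the identity ‖q − x‖² = ‖q‖² + ‖x‖² − 2 q·x. `batch_euclidean_search` is a hypothetical helper for illustration; it returns plain `(index, distance)` tuples rather than SearchResult objects and is not a description of how embedsearch implements batch_search.

```python
import numpy as np

def batch_euclidean_search(stored, queries, k=10):
    """Brute-force top-k Euclidean search for many queries at once."""
    stored = np.asarray(stored, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)
    # Squared distances, shape (num_queries, num_stored).
    sq = (
        (queries ** 2).sum(axis=1)[:, None]
        + (stored ** 2).sum(axis=1)[None, :]
        - 2.0 * queries @ stored.T
    )
    # Rounding can push tiny values below zero; clip before the square root.
    dist = np.sqrt(np.maximum(sq, 0.0))
    order = np.argsort(dist, axis=1)[:, :k]
    return [
        [(int(i), float(dist[row, i])) for i in idxs]
        for row, idxs in enumerate(order)
    ]

stored = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
queries = np.array([[0.0, 0.0], [3.0, 3.0]])
results = batch_euclidean_search(stored, queries, k=2)
print(results)
```

Each inner list holds the k nearest stored vectors for one query, ordered from closest to farthest.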
Configuration
Runtime behaviour is controlled via EMBEDSEARCH_* environment variables:
| Variable | Default | Description |
|---|---|---|
| `EMBEDSEARCH_CACHE_SIZE` | `1024` | Cache size in MB |
| `EMBEDSEARCH_MAX_THREADS` | `0` | Max threads (0 = auto) |
| `EMBEDSEARCH_LOG_LEVEL` | `INFO` | Log level |
| `EMBEDSEARCH_ENABLE_PROFILING` | `false` | Enable profiling |
| `EMBEDSEARCH_ENABLE_METRICS` | `true` | Enable metrics |
| `EMBEDSEARCH_TEMP_DIR` | (system temp) | Override temp directory |
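As an illustration of how such variables are typically resolved, the sketch below reads each one from an environment mapping and falls back to the documented default. `read_settings` and `as_bool` are hypothetical helpers written for this example; the boolean spellings they accept, and the point at which embedsearch itself reads the environment, are assumptions not stated in this document.

```python
import os
import tempfile

# Defaults taken from the configuration table; values are kept as strings,
# the way environment variables arrive.
DEFAULTS = {
    "EMBEDSEARCH_CACHE_SIZE": "1024",
    "EMBEDSEARCH_MAX_THREADS": "0",
    "EMBEDSEARCH_LOG_LEVEL": "INFO",
    "EMBEDSEARCH_ENABLE_PROFILING": "false",
    "EMBEDSEARCH_ENABLE_METRICS": "true",
    "EMBEDSEARCH_TEMP_DIR": tempfile.gettempdir(),
}

def read_settings(environ=os.environ):
    """Return each setting from `environ`, or its default if unset."""
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}

def as_bool(text):
    # Assumption: these spellings count as "true"; the library may differ.
    return text.strip().lower() in {"1", "true", "yes", "on"}

# Example with an explicit mapping instead of the real process environment.
env = {"EMBEDSEARCH_MAX_THREADS": "4", "EMBEDSEARCH_ENABLE_PROFILING": "true"}
settings = read_settings(env)
print(settings["EMBEDSEARCH_MAX_THREADS"])                 # 4
print(settings["EMBEDSEARCH_CACHE_SIZE"])                  # 1024
print(as_bool(settings["EMBEDSEARCH_ENABLE_PROFILING"]))   # True
```

In practice the variables would be exported in the shell or set in `os.environ` before the program starts.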
Command Line Interface
# Create index
embedsearch index-create -d 128 -m cosine -o myindex.idx
# Show version
embedsearch version
Requirements
- Python 3.8+
- numpy >= 1.21.0
License
MIT License — Copyright (c) 2026 Cloud Native Excellence US
Download files
File details
Details for the file embedsearch-2.0.0.tar.gz.
File metadata
- Download URL: embedsearch-2.0.0.tar.gz
- Upload date:
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3668dd44b1eaf1386c2a042a7de676a6066a80a79dda043856808de8f5fa999e |
| MD5 | 1eb545914f53ea5c22a715937546575e |
| BLAKE2b-256 | ce0838574c19a56496b909a43bff4fbf076b769e5df8115c1f3fb73faa47c447 |
File details
Details for the file embedsearch-2.0.0-py3-none-any.whl.
File metadata
- Download URL: embedsearch-2.0.0-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f6685375e5ed89b452c1273b05ba017a1604271b3bc326997e65f0a8b05c18e |
| MD5 | d1a45feca617df259ba4c22a5a84a0b1 |
| BLAKE2b-256 | 5815855c326e94e0a6bd6693fcd5bb694a81fc8c39ebcecc228286979ab01690 |
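A downloaded file can be checked against the SHA256 digests listed above with the standard-library `hashlib` module. The helper names and the local file path in the comment are illustrative assumptions; only the digest value is taken from this page.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()

# Digest published for embedsearch-2.0.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "3668dd44b1eaf1386c2a042a7de676a6066a80a79dda043856808de8f5fa999e"
)

# Usage, assuming the archive was saved to the current directory:
# print(matches("embedsearch-2.0.0.tar.gz", EXPECTED_SDIST_SHA256))
```

Reading in chunks keeps memory use constant regardless of file size.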