embedsearch

Pure-Python in-memory vector similarity search.

embedsearch provides efficient nearest-neighbor search backed entirely by NumPy.

Features

  • Multiple Distance Metrics — Cosine, Euclidean, Manhattan, Hamming, Dot Product
  • Batch Operations — Efficient multi-query search
  • Production-Ready — Type hints, comprehensive error handling, extensive testing
  • Cross-Platform — Windows, macOS, Linux
  • Minimal Dependencies — Only numpy

Installation

pip install embedsearch

Quick Start

Compute Distance

from embedsearch import compute_distance, DistanceMetric

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

euclidean_dist = compute_distance(v1, v2, DistanceMetric.EUCLIDEAN)
cosine_dist = compute_distance(v1, v2, DistanceMetric.COSINE)

Search Vectors

from embedsearch import VectorIndex, DistanceMetric
import numpy as np

# Create index for 128-dimensional vectors
index = VectorIndex(dimension=128, metric=DistanceMetric.COSINE)

# Add vectors
vectors = [np.random.randn(128).astype(np.float32) for _ in range(1000)]
indices = index.add_batch(vectors)

# Search
query = np.random.randn(128).astype(np.float32)  # same dimension as the index
results = index.search(query, k=11)

for result in results:
    print(f"Index: {result.index}, Distance: {result.distance:.4f}, Similarity: {result.similarity:.4f}")
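
To make the shape of the results concrete, here is a minimal brute-force sketch of a cosine top-k search written with the standard library only. It is not embedsearch's implementation: the names Hit, normalize and brute_force_search are hypothetical, and the relation similarity = 1 - distance is an assumption taken from the COSINE description. The query is presumably required to have the same dimension as the index, which the sketch enforces explicitly.

```python
import heapq
import math
import random
from typing import List, NamedTuple, Sequence


class Hit(NamedTuple):
    """Hypothetical stand-in for SearchResult."""
    index: int
    distance: float
    similarity: float


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def brute_force_search(stored, query, k=10):
    """Return the k stored vectors closest to query by cosine distance."""
    if any(len(v) != len(query) for v in stored):
        raise ValueError("query dimension does not match the stored vectors")
    q = normalize(query)
    hits = []
    for i, v in enumerate(stored):
        sim = sum(a * b for a, b in zip(normalize(v), q))
        hits.append(Hit(index=i, distance=1.0 - sim, similarity=sim))
    # nsmallest returns the k hits in ascending order of distance.
    return heapq.nsmallest(k, hits, key=lambda h: h.distance)


rng = random.Random(0)
stored = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
results = brute_force_search(stored, stored[42], k=3)
for hit in results:
    print(f"Index: {hit.index}, Distance: {hit.distance:.4f}, Similarity: {hit.similarity:.4f}")
```

Searching with a vector that is already stored returns that vector first, with a distance of approximately zero.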

API Reference

VectorIndex

VectorIndex(dimension: int, metric: DistanceMetric = DistanceMetric.COSINE)
Method                                       Description
add_vector(vector, metadata=None)            Add single vector; returns its integer index
add_batch(vectors, metadata=None)            Add multiple vectors; returns list of indices
search(query_vector, k=10, threshold=None)   Return top-k SearchResult objects
batch_search(queries, k=10)                  Search multiple queries; returns list of lists
get_vector(index)                            Retrieve stored vector by index
get_metadata(index)                          Retrieve metadata dict by index
size()                                       Number of vectors in the index

Note: With the COSINE metric, vectors are normalised on insertion, so get_vector() returns the normalised form rather than the original vector.
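
The effect of normalising on insertion can be illustrated without the library. The helper below is a hypothetical stand-in for what the index stores under COSINE, assuming the usual division by the L2 norm.

```python
import math


def normalize_to_unit(v):
    """Divide v by its L2 norm so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


original = [3.0, 4.0]                 # L2 norm is 5.0
stored = normalize_to_unit(original)  # what a COSINE index would keep
print(stored)                         # [0.6, 0.8]
```

The original magnitude (5.0 here) cannot be recovered from the stored form, so keep a separate copy of the raw vector if it is needed later.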

DistanceMetric

class DistanceMetric(Enum):
    EUCLIDEAN = "euclidean"     # L2 distance
    COSINE = "cosine"           # Cosine distance (1 - similarity), clamped to [0, 1]
    MANHATTAN = "manhattan"     # L1 distance
    DOT_PRODUCT = "dot_product" # Negative dot product (higher dot = lower distance)
    HAMMING = "hamming"         # Hamming distance (binary vectors)
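
The comments above can be read as the following reference definitions. This is a sketch under stated assumptions, not the library's code: Hamming is assumed to count differing positions (rather than return a fraction), and the cosine clamp is applied exactly as the COSINE comment describes.

```python
import math


def cosine_distance_clamped(a, b):
    """1 - cosine similarity, clamped to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return min(1.0, max(0.0, 1.0 - dot / norm))


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dot_product_distance(a, b):
    """Negative dot product, so a larger dot product ranks as closer."""
    return -sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions where two binary vectors differ (assumed definition)."""
    return sum(1 for x, y in zip(a, b) if x != y)


print(manhattan([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))             # 9.0
print(dot_product_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # -32.0
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))                     # 2
print(cosine_distance_clamped([1.0, 0.0], [0.0, 1.0]))         # 1.0 (orthogonal)
print(cosine_distance_clamped([1.0, 0.0], [-1.0, 0.0]))        # 1.0 (opposite; 2.0 before the clamp)
```

If the clamp works as documented, orthogonal and opposite vectors receive the same cosine distance, which is worth keeping in mind when interpreting results near 1.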

SearchResult

SearchResult(index: int, distance: float, similarity: float)

Module Functions

normalize_vector(vector)                          # → np.ndarray, unit length
compute_distance(v1, v2, metric=EUCLIDEAN)        # → float
batch_search(index, queries, k=10)               # → List[List[SearchResult]]
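
The List[List[SearchResult]] return shape of batch_search means one result list per query, in query order. The sketch below shows that shape with a plain Euclidean search; search_one and batch_search_sketch are hypothetical names, and plain (index, distance) tuples stand in for SearchResult.

```python
import math


def search_one(stored, query, k):
    """Return the k nearest stored vectors as (index, distance) pairs."""
    ranked = sorted(range(len(stored)), key=lambda i: math.dist(stored[i], query))
    return [(i, math.dist(stored[i], query)) for i in ranked[:k]]


def batch_search_sketch(stored, queries, k=10):
    """One result list per query, in the same order as the queries."""
    return [search_one(stored, q, k) for q in queries]


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.1, 0.0], [3.0, 2.0]]

batches = batch_search_sketch(stored, queries, k=2)
for query, hits in zip(queries, batches):
    print(query, [index for index, _ in hits])
```

The first query is nearest to stored vectors 0 and 1, the second to vectors 3 and 1.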

Configuration

Runtime behaviour is controlled via EMBEDSEARCH_* environment variables:

Variable                       Default         Description
EMBEDSEARCH_CACHE_SIZE         1024            Cache size in MB
EMBEDSEARCH_MAX_THREADS        0               Max threads (0 = auto)
EMBEDSEARCH_LOG_LEVEL          INFO            Log level
EMBEDSEARCH_ENABLE_PROFILING   false           Enable profiling
EMBEDSEARCH_ENABLE_METRICS     true            Enable metrics
EMBEDSEARCH_TEMP_DIR           (system temp)   Override temp directory
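
How the package parses these variables is not documented here. The helper below is a hypothetical sketch for inspecting the values that would apply in the current environment, using the defaults from the table; treating only the string "true" (case-insensitive) as enabled is an assumption.

```python
import os
import tempfile

# Defaults copied from the table above.
DEFAULTS = {
    "EMBEDSEARCH_CACHE_SIZE": "1024",
    "EMBEDSEARCH_MAX_THREADS": "0",
    "EMBEDSEARCH_LOG_LEVEL": "INFO",
    "EMBEDSEARCH_ENABLE_PROFILING": "false",
    "EMBEDSEARCH_ENABLE_METRICS": "true",
}


def effective_settings(environ=None):
    """Hypothetical helper: resolve EMBEDSEARCH_* values against the defaults."""
    env = os.environ if environ is None else environ
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "cache_size_mb": int(raw["EMBEDSEARCH_CACHE_SIZE"]),
        "max_threads": int(raw["EMBEDSEARCH_MAX_THREADS"]),  # 0 means auto
        "log_level": raw["EMBEDSEARCH_LOG_LEVEL"],
        "enable_profiling": raw["EMBEDSEARCH_ENABLE_PROFILING"].lower() == "true",
        "enable_metrics": raw["EMBEDSEARCH_ENABLE_METRICS"].lower() == "true",
        # An unset or empty value falls back to the system temp directory.
        "temp_dir": env.get("EMBEDSEARCH_TEMP_DIR") or tempfile.gettempdir(),
    }


print(effective_settings({"EMBEDSEARCH_MAX_THREADS": "4"}))
```

Passing a dict instead of reading os.environ directly keeps the helper easy to test.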

Command Line Interface

# Create index
embedsearch index-create -d 128 -m cosine -o myindex.idx

# Show version
embedsearch version

Requirements

  • Python 3.8+
  • numpy >= 1.21.0

License

MIT License — Copyright (c) 2026 Cloud Native Excellence US
