simmetry

Blazing-fast similarity scores for strings, vectors, points, and sets, with a simple API.

Install

pip install simmetry
pip install "simmetry[fast]"

The first command installs the core package. The simmetry[fast] extra additionally enables optional Numba acceleration for pairwise(..., metric="euclidean_sim") and pairwise(..., metric="manhattan_sim").

Quickstart

One function

from simmetry import similarity

similarity("kitten", "sitting", metric="levenshtein")          # string metric
similarity([1,2,3], [1,2,4], metric="cosine")                  # vector metric
similarity((41.1, 29.0), (41.2, 29.1), metric="haversine_km")  # point metric
similarity({1,2,3}, {2,3,4}, metric="jaccard")                 # set metric
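For intuition about what a normalized string score such as levenshtein can represent, here is a minimal pure-Python sketch. It assumes the common normalization 1 - distance / max(len(a), len(b)); simmetry's exact formula is not stated here, and both helper names are hypothetical rather than part of the package.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


score = levenshtein_similarity("kitten", "sitting")  # 3 edits over 7 characters
```

Under this assumed normalization the score is 1 - 3/7; the value returned by similarity(..., metric="levenshtein") may differ if the package normalizes differently.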

Pairwise matrices (fast for vectors)

import numpy as np
from simmetry import pairwise

X = np.random.randn(1000, 128)
S = pairwise(X, metric="cosine")  # pairwise similarities between the rows of X
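As a rough picture of what a pairwise cosine matrix contains, the following NumPy sketch normalizes the rows and takes one matrix product. pairwise_cosine is a hypothetical stand-in written for illustration; it is not simmetry's implementation, and the eps guard for zero rows is an assumption.

```python
import numpy as np


def pairwise_cosine(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """All-pairs cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)  # unit-length rows; eps avoids division by zero
    return Xn @ Xn.T                 # entry (i, j) is cos(X[i], X[j])


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
S = pairwise_cosine(X)  # symmetric 5 x 5 matrix with ones on the diagonal
```

The single matrix product is what makes the vector case fast compared with looping over pairs in Python.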

Top-k search

import numpy as np
from simmetry import topk

X = np.random.randn(5000, 64)
q = np.random.randn(64)
idx, scores = topk(q, X, k=10, metric="cosine")
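Conceptually, exact top-k search scores the query against every row and keeps the k best. The sketch below shows one way to do that with NumPy; topk_cosine is a hypothetical helper (not the package's topk), and it assumes neither q nor any row of X is the zero vector.

```python
import numpy as np


def topk_cosine(q, X, k):
    """Return (indices, scores) of the k rows of X most cosine-similar to q."""
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    scores = (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))
    k = max(1, min(k, len(scores)))
    # argpartition selects the k largest scores without sorting all n of them
    top = np.argpartition(-scores, k - 1)[:k]
    # sort only those k candidates, best first
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 64))
q = X[42]  # querying with a stored row, so that row should rank first
idx, scores = topk_cosine(q, X, k=10)
```

Every query still touches all rows, which is why exact search becomes the bottleneck on very large corpora.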

Available metrics

from simmetry import available
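available()           # all metrics
available("vector")   # vector metrics only
available("string")   # string metrics only
available("point")    # point / geo metrics only
available("set")      # set metrics only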

Vectors

  • cosine, dot, euclidean_sim, manhattan_sim, pearson
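Of the vector metrics, pearson is perhaps the least obvious: it can be read as cosine similarity applied to mean-centered vectors. A minimal pure-Python sketch of that relationship follows; the function names are illustrative, and how simmetry handles constant or mismatched inputs is not specified here.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def pearson(x, y):
    """Pearson correlation computed as cosine of the mean-centered vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


up = pearson([1, 2, 3], [2, 4, 6])    # perfectly correlated
down = pearson([1, 2, 3], [3, 2, 1])  # perfectly anti-correlated
```

Unlike cosine, the centered version ignores a constant offset added to either vector.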

Strings

  • levenshtein (normalized similarity)
  • jaro_winkler
  • ngram_jaccard (character n-gram set Jaccard)
  • token_jaccard (whitespace token set Jaccard)
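The two Jaccard-based string metrics differ only in how a string is turned into a set. The sketch below illustrates that idea; the n-gram size of 2 and the treatment of empty inputs are assumptions, and the helpers are hypothetical rather than the package's own functions.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of overlapping character n-grams; short strings yield themselves."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def set_jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    return set_jaccard(char_ngrams(a, n), char_ngrams(b, n))


def token_jaccard(a: str, b: str) -> float:
    # str.split() with no arguments splits on any run of whitespace
    return set_jaccard(set(a.split()), set(b.split()))


by_chars = ngram_jaccard("night", "nacht")    # bigrams share only "ht"
by_tokens = token_jaccard("a b c", "b c d")   # tokens share "b" and "c"
```

Character n-grams tolerate typos inside a word, while token sets are better suited to reordered or missing words.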

Points / Geo

  • euclidean_2d
  • haversine_km
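For reference, the great-circle distance behind a haversine metric can be sketched as follows. This assumes points are (latitude, longitude) pairs in degrees and uses a mean Earth radius of 6371.0088 km; how simmetry turns the distance into a similarity score is not specified here, so the sketch stops at the distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_distance_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # clamp guards against rounding pushing sqrt(h) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


d = haversine_distance_km((41.1, 29.0), (41.2, 29.1))
one_degree = haversine_distance_km((0.0, 0.0), (1.0, 0.0))  # one degree of latitude
```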

Sets

  • jaccard, dice, overlap
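The three set metrics share the same numerator, the size of the intersection, and differ only in the denominator. A minimal sketch using their textbook definitions is shown below; the empty-set conventions are assumptions and may not match the package.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|"""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|)"""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|)"""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0  # assumption: nothing can overlap with an empty set
    return len(a & b) / min(len(a), len(b))


A, B = {1, 2, 3}, {2, 3, 4}
scores = {"jaccard": jaccard(A, B), "dice": dice(A, B), "overlap": overlap(A, B)}
```

For the same pair of sets, jaccard is never larger than dice, and dice is never larger than overlap.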

License

MIT

Batch string APIs

If you need many string-to-string similarities (e.g., deduping names), use:

from simmetry.strings import pairwise_strings, topk_strings

S = pairwise_strings(["item_one", "item_two"], ["item_one", "item_alt"], metric="jaro_winkler")
idx, scores = topk_strings("samplecorp", ["samplecorp", "examplefinance", "testgroup"], k=2, metric="levenshtein")

ANN top-k (optional, does NOT bloat core)

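For very large vector corpora (100k+ vectors), exact topk() can be slow because every query is scored against every vector. Approximate nearest neighbor (ANN) search returns fast results that may occasionally miss a true neighbor.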
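One common way to quantify how approximate an ANN result is: compare its ids against the exact top-k for the same query and compute recall. The helper below is a hypothetical illustration and is not part of simmetry.

```python
def recall_at_k(exact_ids, approx_ids) -> float:
    """Fraction of the exact top-k ids that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Example: the approximate search missed one of four true neighbors
recall = recall_at_k([1, 2, 3, 4], [1, 2, 9, 4])
```

Averaging this over a sample of queries gives a simple check on whether an ANN index is accurate enough for a given workload.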

hnswlib (recommended)

pip install "simmetry[ann-hnsw]"

import numpy as np
from simmetry.ann import build_hnsw

X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows to unit length

index = build_hnsw(X, space="cosine")
labels, distances = index.query(X[0], k=10)

faiss

pip install "simmetry[ann-faiss]"

import numpy as np
from simmetry.ann import build_faiss

X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows, so inner product equals cosine

index = build_faiss(X, metric="ip")
labels, scores = index.query(X[0], k=10)
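Both ANN examples normalize the rows of X before building the index. The reason is that, for unit-length vectors, the inner product equals cosine similarity, so an inner-product index (metric="ip") ranks neighbors the same way cosine would. The following NumPy check illustrates this on small random data and does not involve simmetry itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))

# Cosine similarity of every row against row 0, computed on the raw vectors
cosine = (X @ X[0]) / (np.linalg.norm(X, axis=1) * np.linalg.norm(X[0]))

# Plain inner product after scaling every row to unit length
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
inner = Xn @ Xn[0]

same_scores = np.allclose(cosine, inner)
same_ranking = np.array_equal(np.argsort(-cosine)[:10], np.argsort(-inner)[:10])
```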

SimIndex (exact or ANN)

Exact search (no extras):

import numpy as np
from simmetry import SimIndex

X = np.random.randn(50_000, 128).astype("float32")
index = SimIndex(metric="cosine", backend="exact").add(X)

idx, scores = index.query(X[0], k=10)

ANN (optional):

pip install "simmetry[ann-hnsw]"

import numpy as np
from simmetry import SimIndex

X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows to unit length

index = SimIndex(metric="cosine", backend="hnsw").add(X)
labels, distances = index.query(X[0], k=10)

Auto similarity and composite records

Auto metric selection:

from simmetry import similarity

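similarity("samplecorp", "sample corp")   # two strings: a string metric is selected
similarity((41.0, 29.0), (41.1, 29.1))    # two points: a point metric is selected
similarity({1,2,3}, {2,3,4})              # two sets: a set metric is selected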

Composite similarity over dict fields:

a = {"name": "Entity One", "city": "CityAlpha", "loc": (41.0, 29.0)}
b = {"name": "Entity One Extended", "city": "CityAlpha", "loc": (41.01, 28.99)}

score = similarity(
    a, b,
    metric={"name": "jaro_winkler", "loc": "haversine_km"},
    weights={"name": 0.7, "loc": 0.3},
)
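One plausible reading of a composite score is a weighted average of per-field similarities. The sketch below illustrates that idea with stand-in scorers; it is an assumption about the general technique, not a description of how simmetry combines fields, normalizes weights, or converts distances such as haversine_km into similarities. All names in it are hypothetical.

```python
def composite_similarity(a, b, scorers, weights):
    """Weighted average of per-field scores, each expected to lie in [0, 1]."""
    total = sum(weights[field] for field in scorers)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[field] * scorers[field](a[field], b[field]) for field in scorers
    )
    return weighted / total  # dividing by the total keeps the result in [0, 1]


def exact_match(x, y) -> float:
    return 1.0 if x == y else 0.0


def token_overlap(x: str, y: str) -> float:
    """Stand-in string scorer: shared whitespace tokens over all tokens."""
    tx, ty = set(x.lower().split()), set(y.lower().split())
    if not tx and not ty:
        return 1.0
    return len(tx & ty) / len(tx | ty)


a = {"name": "Entity One", "city": "CityAlpha"}
b = {"name": "Entity One Extended", "city": "CityAlpha"}

score = composite_similarity(
    a, b,
    scorers={"name": token_overlap, "city": exact_match},
    weights={"name": 0.7, "city": 0.3},
)
```

Only fields that appear in scorers contribute, so extra keys in the records are ignored in this sketch.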
