simmetry
Blazing-fast similarity scores for strings, vectors, points, and sets — with a simple API.
Install
pip install simmetry
pip install "simmetry[fast]"
simmetry[fast] enables optional Numba acceleration for pairwise(..., metric="euclidean_sim") and pairwise(..., metric="manhattan_sim").
Quickstart
One function
from simmetry import similarity
similarity("kitten", "sitting", metric="levenshtein")
similarity([1,2,3], [1,2,4], metric="cosine")
similarity((41.1, 29.0), (41.2, 29.1), metric="haversine_km")
similarity({1,2,3}, {2,3,4}, metric="jaccard")
Pairwise matrices (fast for vectors)
import numpy as np
from simmetry import pairwise
X = np.random.randn(1000, 128)
S = pairwise(X, metric="cosine")
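To show what a cosine similarity matrix contains, here is a pure-Python sketch that builds the n-by-n matrix for a handful of rows. It is an illustrative reimplementation, not simmetry's vectorized code, and the helper name `cosine_pairwise` is hypothetical. The idea is to L2-normalize each row once so that every entry becomes a plain dot product.

```python
import math
import random


def cosine_pairwise(rows):
    """Return the n-by-n matrix of cosine similarities between rows (pure Python)."""
    # L2-normalize each row once; zero-length rows stay all zeros (cosine is undefined there).
    normed = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r))
        normed.append([v / norm for v in r] if norm else [0.0] * len(r))
    # Every entry is now just the dot product of two unit vectors.
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
S = cosine_pairwise(X)
print(round(S[0][0], 6))  # 1.0 (each row is identical to itself)
```

The matrix is symmetric with ones on the diagonal, so a vectorized implementation can exploit that; with NumPy the same computation is normalizing the rows of `X` and taking `Xn @ Xn.T`.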
Top-k search
import numpy as np
from simmetry import topk
X = np.random.randn(5000, 64)
q = np.random.randn(64)
idx, scores = topk(q, X, k=10, metric="cosine")
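Exact top-k search amounts to scoring the query against every row and keeping the k best. The sketch below returns the same `(idx, scores)` shape as the `topk` call above, but it is a hypothetical brute-force helper (`exact_topk`), not simmetry's implementation. Its cost grows linearly with the number of rows, which is why the ANN backends matter for large corpora.

```python
import heapq
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def exact_topk(q, rows, k=10):
    """Brute-force top-k: score every row, keep the k highest as (indices, scores)."""
    scored = [(cosine(q, r), i) for i, r in enumerate(rows)]
    best = heapq.nlargest(k, scored)
    return [i for _, i in best], [s for s, _ in best]


rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
idx, scores = exact_topk([1.0, 0.1], rows, k=2)
print(idx)  # [0, 2]
```

`heapq.nlargest` avoids sorting the full score list, which helps when k is much smaller than the number of rows.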
Available metrics
from simmetry import available
available()
available("vector")
available("string")
available("point")
available("set")
Vectors
- cosine
- dot
- euclidean_sim
- manhattan_sim
- pearson
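The vector metrics fit in a few lines each. The pure-Python sketch below is illustrative only. In particular, the `1 / (1 + d)` mapping used for `euclidean_sim` and `manhattan_sim` is an assumption, because this page does not say how simmetry turns a distance into a similarity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Dot product divided by the product of the vector lengths.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def pearson(u, v):
    # Pearson correlation equals the cosine similarity of the mean-centered vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def euclidean_sim(u, v):
    # ASSUMPTION: L2 distance mapped into (0, 1] via 1 / (1 + d); simmetry may differ.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + d)


def manhattan_sim(u, v):
    # ASSUMPTION: the same 1 / (1 + d) mapping, applied to the L1 distance.
    d = sum(abs(a - b) for a, b in zip(u, v))
    return 1.0 / (1.0 + d)


u, v = [1, 2, 3], [1, 2, 4]
print(dot(u, v))            # 17
print(euclidean_sim(u, v))  # 0.5 (distance is 1)
print(manhattan_sim(u, v))  # 0.5 (distance is 1)
```

Unlike `cosine` and `pearson`, `dot` is not bounded, so it is mainly useful when vectors are already normalized.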
Strings
- levenshtein (normalized similarity)
- jaro_winkler
- ngram_jaccard (character n-gram set Jaccard)
- token_jaccard (whitespace token set Jaccard)
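A pure-Python sketch of three of the string metrics, for intuition. It assumes `levenshtein` is normalized as `1 - distance / max(len(a), len(b))` and that `ngram_jaccard` uses unpadded character bigrams; both are assumptions, not confirmed behavior of simmetry. `jaro_winkler` is omitted for brevity.

```python
def levenshtein_distance(a, b):
    # Classic dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    # ASSUMPTION: normalized as 1 - distance / max(len(a), len(b)).
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def ngram_jaccard(a, b, n=2):
    # ASSUMPTION: character n-grams with n=2 and no padding.
    def grams(s):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    return jaccard(grams(a), grams(b))


def token_jaccard(a, b):
    # Whitespace-separated tokens compared as sets.
    return jaccard(set(a.split()), set(b.split()))


print(levenshtein_distance("kitten", "sitting"))       # 3
print(round(levenshtein_sim("kitten", "sitting"), 4))  # 0.5714
print(token_jaccard("sample corp", "sample group"))    # 0.3333333333333333
```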
Points / Geo
- euclidean_2d
- haversine_km
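`haversine_km` refers to the great-circle (haversine) distance in kilometers. The sketch below computes that distance, assuming points are `(latitude, longitude)` in degrees and a mean Earth radius of 6371 km. How simmetry converts kilometers into a similarity score is not specified on this page, so the sketch stops at the distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; ASSUMPTION: the library's constant may differ


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def euclidean_2d(p, q):
    # Straight-line distance in the plane.
    return math.hypot(p[0] - q[0], p[1] - q[1])


d = haversine_km((41.1, 29.0), (41.2, 29.1))
print(round(d, 1))  # roughly 14 km for the quickstart points
```

`euclidean_2d` treats coordinates as planar, which is only reasonable for projected coordinates or very small areas; `haversine_km` accounts for the curvature of the Earth.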
Sets
- jaccard
- dice
- overlap
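The three set metrics use the standard formulas shown in the comments below. This is a minimal sketch, not simmetry's code; the empty-set conventions (1.0 when both sets are empty, 0.0 for `overlap` when only one is) are assumptions.

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def overlap(a, b):
    # |A ∩ B| / min(|A|, |B|)
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


A, B = {1, 2, 3}, {2, 3, 4}
print(jaccard(A, B))  # 0.5
print(dice(A, B))     # 0.6666666666666666
print(overlap(A, B))  # 0.6666666666666666
```

`overlap` returns 1.0 whenever one set is contained in the other, which makes it useful for subset-style matching where `jaccard` would penalize the size difference.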
License
MIT
Batch string APIs
If you need many string-to-string similarities (e.g., deduping names), use:
from simmetry.strings import pairwise_strings, topk_strings
S = pairwise_strings(["item_one", "item_two"], ["item_one", "item_alt"], metric="jaro_winkler")
idx, scores = topk_strings("samplecorp", ["samplecorp", "examplefinance", "testgroup"], k=2, metric="levenshtein")
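The batch APIs return a matrix with one row per left string and one column per right string, or the indices and scores of the best matches. The sketch below mimics those shapes with hypothetical helpers (`pairwise_strings_sketch`, `topk_strings_sketch`). As a plainly named stand-in scorer it uses `difflib.SequenceMatcher.ratio()` from the Python standard library, not simmetry's `jaro_winkler` or `levenshtein`.

```python
import heapq
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in scorer from the standard library (not simmetry's metrics).
    return SequenceMatcher(None, a, b).ratio()


def pairwise_strings_sketch(left, right, score=ratio):
    # One row per left string, one column per right string.
    return [[score(a, b) for b in right] for a in left]


def topk_strings_sketch(query, candidates, k=2, score=ratio):
    # Score every candidate and return indices and scores of the k best.
    best = heapq.nlargest(k, ((score(query, c), i) for i, c in enumerate(candidates)))
    return [i for _, i in best], [s for s, _ in best]


S = pairwise_strings_sketch(["item_one", "item_two"], ["item_one", "item_alt"])
idx, scores = topk_strings_sketch("samplecorp", ["samplecorp", "examplefinance", "testgroup"], k=2)
print(idx[0], scores[0])  # 0 1.0
```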
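ANN top-k (optional extras; the core install stays lightweight)
For very large vector corpora (100k+ vectors), exact topk() can be slow because an exact search has to score the query against every stored vector. Approximate nearest neighbor (ANN) indexes return approximate results much faster.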
hnswlib (recommended)
pip install "simmetry[ann-hnsw]"
import numpy as np
from simmetry.ann import build_hnsw
X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)
index = build_hnsw(X, space="cosine")
labels, distances = index.query(X[0], k=10)
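Because ANN results are approximate, it is worth measuring how many true neighbors the index misses. A common check is recall@k: the fraction of the exact top-k indices (for example from `topk()`) that the ANN query also returned. The helpers below are hypothetical, and the result lists are made-up placeholders. The distance conversion assumes `build_hnsw` passes through hnswlib's raw `"cosine"` distances, which hnswlib defines as 1 minus the cosine similarity.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact.intersection(approx_ids)) / len(exact)


def hnsw_cosine_distance_to_similarity(distances):
    # hnswlib's "cosine" space reports 1 - cosine similarity.
    # ASSUMPTION: build_hnsw returns these distances unchanged.
    return [1.0 - d for d in distances]


# Placeholder results for one query: ANN labels vs. exact topk() indices.
ann_labels = [0, 17, 42, 5, 99]
exact_idx = [0, 17, 5, 42, 63]
print(recall_at_k(ann_labels, exact_idx))  # 0.8
```

Averaging recall@k over a sample of queries gives a practical basis for tuning the index.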
faiss
pip install "simmetry[ann-faiss]"
import numpy as np
from simmetry.ann import build_faiss
X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)
index = build_faiss(X, metric="ip")
labels, scores = index.query(X[0], k=10)
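The faiss example L2-normalizes the rows before building an inner-product (`"ip"`) index. On unit-length vectors the inner product is exactly the cosine similarity, so the index ranks neighbors by cosine. The pure-Python check below demonstrates that identity on random vectors. It does not call faiss or simmetry.

```python
import math
import random


def normalize(v):
    # Scale a vector to unit L2 length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


random.seed(1)
u = [random.gauss(0, 1) for _ in range(16)]
v = [random.gauss(0, 1) for _ in range(16)]
# Inner product of the normalized vectors vs. cosine of the raw vectors.
ip = sum(a * b for a, b in zip(normalize(u), normalize(v)))
print(abs(ip - cosine(u, v)) < 1e-12)  # True
```

Normalizing once at build time means each query needs only dot products, which is the operation inner-product indexes are optimized for.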
SimIndex (exact or ANN)
Exact search (no extras):
import numpy as np
from simmetry import SimIndex
X = np.random.randn(50_000, 128).astype("float32")
index = SimIndex(metric="cosine", backend="exact").add(X)
idx, scores = index.query(X[0], k=10)
ANN (optional):
pip install "simmetry[ann-hnsw]"
import numpy as np
from simmetry import SimIndex
X = np.random.randn(200_000, 128).astype("float32")
X /= np.linalg.norm(X, axis=1, keepdims=True)
index = SimIndex(metric="cosine", backend="hnsw").add(X)
labels, distances = index.query(X[0], k=10)
Auto similarity and composite records
Auto metric selection:
from simmetry import similarity
similarity("samplecorp", "sample corp")
similarity((41.0, 29.0), (41.1, 29.1))
similarity({1,2,3}, {2,3,4})
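When no metric is given, a choice has to be made from the input types. The sketch below shows one way such type-based dispatch can work. The function `pick_metric` is hypothetical, and the rules in it (strings map to `jaro_winkler`, sets to `jaccard`, 2-tuples to `haversine_km`, everything else to `cosine`) are assumptions; simmetry's actual auto-selection rules are not documented on this page.

```python
def pick_metric(a, b):
    """Hypothetical type-based metric choice; the real library may use different rules."""
    if isinstance(a, str) and isinstance(b, str):
        return "jaro_winkler"  # ASSUMPTION: default string metric
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return "jaccard"
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) == 2:
        return "haversine_km"  # ASSUMPTION: 2-tuples treated as (lat, lon) points
    return "cosine"  # fallback for generic numeric sequences


print(pick_metric("samplecorp", "sample corp"))  # jaro_winkler
print(pick_metric((41.0, 29.0), (41.1, 29.1)))   # haversine_km
print(pick_metric({1, 2, 3}, {2, 3, 4}))         # jaccard
```

Order matters in this kind of dispatch: the 2-tuple check must come before the generic fallback, or point pairs would be scored as vectors.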
Composite similarity over dict fields:
a = {"name": "Entity One", "city": "CityAlpha", "loc": (41.0, 29.0)}
b = {"name": "Entity One Extended", "city": "CityAlpha", "loc": (41.01, 28.99)}
score = similarity(
a, b,
metric={"name": "jaro_winkler", "loc": "haversine_km"},
weights={"name": 0.7, "loc": 0.3},
)
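A composite score over dict fields can be read as a weighted mean of per-field similarities. The sketch below shows that combination step with a hypothetical helper, `composite_similarity`. It assumes that fields without a weight (such as `"city"` in the example above) are ignored and that each per-field score has already been mapped to [0, 1]. The scores used here are made-up placeholders, not output from simmetry.

```python
def composite_similarity(field_scores, weights):
    """Weighted mean of per-field similarities, using only fields that have a weight."""
    # ASSUMPTION: fields without a weight are ignored.
    total = sum(weights[f] for f in field_scores if f in weights)
    if total == 0:
        raise ValueError("no weighted fields to combine")
    return sum(weights[f] * s for f, s in field_scores.items() if f in weights) / total


# Placeholder per-field scores, already mapped to [0, 1].
scores = {"name": 0.9, "loc": 0.8, "city": 1.0}
print(composite_similarity(scores, {"name": 0.7, "loc": 0.3}))
```

Dividing by the sum of the weights keeps the result in [0, 1] even when the weights do not add up to 1.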
Download files
File details
Details for the file simmetry-1.0.1.tar.gz.
File metadata
- Download URL: simmetry-1.0.1.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1d9faffeb308665c2b9a0e2f1fc95693431f1a874a3c3976cc35fefa9ac42bf |
| MD5 | f5cc9a86884e6efda8ebab9982554a88 |
| BLAKE2b-256 | 9e7f067ae1449243dd209a87c5ac2c177a1f349134224fcc5527c7871363b46e |
File details
Details for the file simmetry-1.0.1-py3-none-any.whl.
File metadata
- Download URL: simmetry-1.0.1-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14b3894fe8bb52f73cfa0e1777a937298731058c50e896c68966478b1297c195 |
| MD5 | 8f7dfaa4b13a5d858eea45450821c380 |
| BLAKE2b-256 | a43ecaeae415f21ce3dfefb8cdc53c65413897faa4b44d76233c941364ced723 |