Simple Similarity Service
simsity
Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!
This repository contains simple tools for similarity retrieval, built as a convenient wrapper around hnswlib. Typical use cases include early-stage bulk labelling and duplicate discovery.
Install
You can install simsity via pip.
python -m pip install simsity
# Simsity provides two functions to create/load an index
from simsity import create_index, load_index
# It also ships with a dataset for demos
from simsity.datasets import fetch_recipes
# Let's use embetter for embeddings
from embetter.text import SentenceEncoder
# Here's a list of data we'll encode/index
df_recipes = fetch_recipes()
recipes = df_recipes["text"]
# Create the (scikit-learn compatible) encoder
encoder = SentenceEncoder()
# Make an index without a path
index = create_index(recipes, encoder)
texts, dists = index.query("pork")
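To make the shape of the result concrete, here is a minimal brute-force sketch of what a query returns: the nearest texts together with their distances, closest first. This is not how simsity works internally (simsity delegates to hnswlib for approximate search and uses a real encoder). The names `toy_encode`, `cosine_distance` and `brute_force_query` are hypothetical, and the bag-of-words encoding only stands in for real embeddings.

```python
import math


def toy_encode(text, vocab):
    # Hypothetical stand-in for a real encoder: plain word counts per vocab entry.
    words = text.lower().split()
    return [float(words.count(word)) for word in vocab]


def cosine_distance(a, b):
    # 0.0 means same direction, 1.0 means nothing in common.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(query, texts, vocab, n=2):
    # Compare the query against every text, then keep the n closest.
    q = toy_encode(query, vocab)
    scored = [(cosine_distance(q, toy_encode(t, vocab)), t) for t in texts]
    scored.sort(key=lambda pair: pair[0])
    top = scored[:n]
    return [t for _, t in top], [d for d, _ in top]


demo_texts = ["pork chops with apple", "roast pork belly", "vanilla ice cream"]
demo_vocab = sorted({word for text in demo_texts for word in text.split()})
found, found_dists = brute_force_query("pork", demo_texts, demo_vocab)
```

A brute-force scan compares the query against every item, which gets slow on large collections. An approximate nearest-neighbor index such as hnswlib avoids that full scan.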
You can also pass a path to create_index. The index is then stored on disk, and you can load it again later.
# Make an index with a path
index = create_index(recipes, encoder, path="demo")
# Load an index from a path
loaded_index = load_index(path="demo", encoder=encoder)
# Query the loaded index just like the original one
texts, dists = loaded_index.query("pork")
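For the duplicate discovery use case, one possible approach is to query the index with every item and keep the neighbors that fall under a distance threshold. The sketch below is an assumption-laden illustration, not part of simsity: `find_near_duplicates` is a hypothetical helper, and `StubIndex` is a fake index that only mimics the `query` call shown above (it returns texts and distances, closest first) so the example runs without any dependencies. With a real index the distance scale depends on the encoder and metric, so the threshold would need tuning.

```python
def find_near_duplicates(index, items, threshold):
    # Hypothetical helper: collect pairs of distinct items whose distance
    # is at or below the threshold. Assumes index.query(text) returns
    # (texts, dists) as parallel sequences, smaller distance = more similar.
    pairs = set()
    for item in items:
        texts, dists = index.query(item)
        for other, dist in zip(texts, dists):
            # Identical strings are skipped, so exact copies are not reported.
            if other != item and dist <= threshold:
                pairs.add(tuple(sorted((item, other))))
    return sorted(pairs)


class StubIndex:
    """Fake index for illustration only; uses Jaccard distance on word sets."""

    def __init__(self, items):
        self.items = list(items)

    def query(self, text):
        query_words = set(text.lower().split())
        scored = []
        for item in self.items:
            item_words = set(item.lower().split())
            union = query_words | item_words
            overlap = len(query_words & item_words) / len(union) if union else 0.0
            scored.append((1.0 - overlap, item))
        scored.sort(key=lambda pair: pair[0])
        return [t for _, t in scored], [d for d, _ in scored]


demo_items = ["roast pork belly", "roast pork belly recipe", "vanilla ice cream"]
duplicates = find_near_duplicates(StubIndex(demo_items), demo_items, threshold=0.3)
```

Because the helper only relies on the `query` call, the stub could in principle be swapped for an index built with create_index, provided the returned texts are plain strings.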