Simple Similarity Service
simsity
Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!
This repository contains simple tools for similarity retrieval. It provides a convenient wrapper around encoding strategies and nearest neighbor approaches. Typical use cases include early-stage bulk labelling and duplicate discovery.
Install
You can install simsity via pip.
python -m pip install simsity
The quickstart below takes its encoder from embetter, so it is recommended that you install that as well.
Quickstart
This is the basic setup for this package.
import pandas as pd
from embetter.text import SentenceEncoder
from simsity.datasets import fetch_recipes
from simsity.service import Service
from simsity.indexer import AnnoyIndexer
# Fetch data
df_recipes = fetch_recipes()
recipes = df_recipes['text']
# Create an indexer and encoder
indexer = AnnoyIndexer()
encoder = SentenceEncoder()
# The service combines the two into a single object.
service = Service(indexer=indexer, encoder=encoder)
# We can now build the service using this data.
service.index(recipes)
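# And use it: query returns the positions of the nearest items and their distances
idx, dists = service.query("meat", n_neighbors=10)
# Note: DataFrame.to_markdown needs the optional `tabulate` package
res = (
    pd.DataFrame({"recipe": recipes})
    .iloc[idx]
    .assign(dists=dists)
    .to_markdown(index=False)
)
# Show results
print(res)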
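To make the split between encoder and indexer more concrete, here is a dependency-free sketch of the same idea. It is not how simsity is implemented: `ToyEncoder`, `BruteForceIndexer`, `ToyService` and `cosine_distance` are hypothetical names made up for this illustration, the encoder is a crude character-trigram counter rather than a sentence embedding, and the search is an exact scan over every vector rather than an approximate index like Annoy.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (1.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToyEncoder:
    """Hypothetical encoder: turns each text into character-trigram counts."""

    def transform(self, texts):
        return [self._encode(text) for text in texts]

    @staticmethod
    def _encode(text):
        padded = f"  {text.lower()} "
        return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


class BruteForceIndexer:
    """Hypothetical indexer: exact nearest neighbors by scanning all vectors."""

    def __init__(self):
        self.vectors = []

    def index(self, vectors):
        self.vectors = list(vectors)

    def query(self, vector, n_neighbors=1):
        # Sort (distance, position) pairs and keep the closest ones.
        scored = sorted(
            (cosine_distance(vector, candidate), i)
            for i, candidate in enumerate(self.vectors)
        )[:n_neighbors]
        idx = [i for _, i in scored]
        dists = [d for d, _ in scored]
        return idx, dists


class ToyService:
    """Hypothetical wrapper: encode first, then hand the vectors to the indexer."""

    def __init__(self, encoder, indexer):
        self.encoder = encoder
        self.indexer = indexer

    def index(self, texts):
        self.indexer.index(self.encoder.transform(texts))

    def query(self, text, n_neighbors=1):
        vector = self.encoder.transform([text])[0]
        return self.indexer.query(vector, n_neighbors=n_neighbors)


texts = [
    "beef stew with carrots",
    "grilled beef steak",
    "lemon cake",
    "vanilla ice cream",
]
toy = ToyService(encoder=ToyEncoder(), indexer=BruteForceIndexer())
toy.index(texts)
idx, dists = toy.query("beef", n_neighbors=2)
for i, d in zip(idx, dists):
    print(f"{d:.3f}  {texts[i]}")
```

The two texts that mention beef come back as the nearest neighbors because they share trigrams with the query. Swapping in a better encoder or a faster indexer changes the quality and speed of the results, while the wrapper itself stays the same.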