Simple Similarity Service

These details have not been verified by PyPI


Project description

simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!

This repository contains simple tools for similarity retrieval, built as a convenient wrapper around hnswlib. Typical use cases include early-stage bulk labelling and duplicate discovery.

Install

You can install simsity via pip.

python -m pip install simsity

# Simsity provides two functions to create/load an index
from simsity import create_index, load_index
# It also ships a dataset for demos
from simsity.datasets import fetch_recipes
# Let's use embetter for the embeddings
from embetter.text import SentenceEncoder

# Here's a list of data we'll encode/index
df_recipes = fetch_recipes()
recipes = df_recipes["text"]

# Create the (scikit-learn compatible) encoder
encoder = SentenceEncoder()

# Make an index without a path
index = create_index(recipes, encoder)
texts, dists = index.query("pork")
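
To make the shape of the `query` result concrete, here is a minimal pure-Python sketch of exhaustive nearest-neighbour search that returns items and distances in the same `(texts, dists)` order. It is not how simsity or hnswlib work internally (hnswlib builds an approximate graph index rather than comparing against every vector). The toy vectors, the `brute_force_query` helper and the choice of cosine distance are assumptions made for illustration only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(items, vectors, query_vector, n=2):
    """Hypothetical helper: compare the query against every stored vector."""
    scored = sorted(
        zip(items, (cosine_distance(v, query_vector) for v in vectors)),
        key=lambda pair: pair[1],  # smallest distance first
    )
    top = scored[:n]
    return [item for item, _ in top], [dist for _, dist in top]


# Toy 2-d "embeddings"; a real encoder would produce these from text.
items = ["pork chops", "pork stew", "apple pie"]
vectors = [[1.0, 0.1], [0.9, 0.3], [0.0, 1.0]]

texts, dists = brute_force_query(items, vectors, query_vector=[1.0, 0.0], n=2)
```

Exhaustive search like this scales linearly with the number of stored items, which is the cost an approximate index is meant to avoid.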

You can also provide a path and then you'll be able to store/load everything.

# Make an index with a path
index = create_index(recipes, encoder, path="demo")

# Load an index from a path
loaded_index = load_index(path="demo", encoder=encoder)
texts, dists = loaded_index.query("pork")
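
A common pattern with a stored index is to build it once and reload it on later runs. The following is a minimal sketch, assuming that `create_index(..., path=...)` leaves something on disk at `path` that can be detected with `os.path.exists` (this page does not confirm the on-disk layout). `get_or_create_index` is a hypothetical helper, and the two simsity functions are passed in as arguments so the sketch itself has no third-party dependencies.

```python
import os


def get_or_create_index(texts, encoder, path, create_fn, load_fn):
    """Hypothetical helper: reuse a stored index when one exists at `path`.

    `create_fn` and `load_fn` are expected to be simsity's `create_index`
    and `load_index`, called with the same arguments as in the examples above.
    """
    if os.path.exists(path):
        # Assumption: anything found at `path` is a previously stored index.
        return load_fn(path=path, encoder=encoder)
    return create_fn(texts, encoder, path=path)
```

With simsity installed, this would be called as `get_or_create_index(recipes, encoder, "demo", create_index, load_index)`.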

Project details


Release history

Version	Release date
0.5.5	Jun 24, 2023
0.5.4	Apr 15, 2023
0.5.3	Apr 11, 2023
0.5.2	Apr 4, 2023
0.5.1	Apr 1, 2023
0.5.0	Mar 3, 2023 (this version)
0.4.1	Mar 28, 2023
0.4.0	Feb 25, 2023
0.3.0	Feb 25, 2023
0.2.0	Jan 1, 2022
0.1.1	Nov 4, 2021
0.1.0	Oct 20, 2021
0.0.1	Oct 15, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simsity-0.5.0.tar.gz (4.7 kB)

Uploaded Mar 3, 2023 (Source)

Built Distribution

simsity-0.5.0-py2.py3-none-any.whl (11.7 kB)

Uploaded Mar 3, 2023 (Python 2, Python 3)

Hashes for simsity-0.5.0.tar.gz

Algorithm	Hash digest
SHA256	`3744ee1269682c63d108bffc1ea725d512338fc2184ab1baca62f3a08ab37b04`
MD5	`98388e1e8f8b0f9cac927ad9807b468e`
BLAKE2b-256	`f8604a8db966ee794459865b36959365e90c808d79e15ee38a492c63ecd41bc3`

Hashes for simsity-0.5.0-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`f946b92a887839cb5e6820ee38ae11184f131928dbdabaca4530aa5ce0427120`
MD5	`0118a491e0a586f51a356e9c01b4cfbe`
BLAKE2b-256	`9f83870f181e36dd002694c0a54047c7fead20ac0c7a363b6b7ca8518c867f48`
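
The digests above can be used to check that a downloaded file is intact. The following is a minimal sketch using only the standard library. The `sha256_of_file` and `matches_expected` helpers are hypothetical names, and the file is assumed to have been downloaded manually to a local path.

```python
import hashlib

# SHA256 digest listed above for simsity-0.5.0.tar.gz
EXPECTED_SHA256 = "3744ee1269682c63d108bffc1ea725d512338fc2184ab1baca62f3a08ab37b04"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True when the file's digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("simsity-0.5.0.tar.gz")` would be expected to return `True` for an unmodified copy of the source distribution saved in the current directory.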