Simple Similarity Service

simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!

This repository contains simple tools for similarity retrieval. It provides a convenient wrapper around encoding strategies and nearest-neighbor approaches. Typical use cases include early-stage bulk labelling and duplicate discovery.

Install

You can install simsity via pip.

python -m pip install simsity

It's usually recommended that you also install embetter, which provides the SentenceEncoder used in the quickstart.

Quickstart

This is the basic setup for this package.

import pandas as pd
from embetter.text import SentenceEncoder

from simsity.datasets import fetch_recipes
from simsity.service import Service
from simsity.indexer import AnnoyIndexer


# Fetch data
df_recipes = fetch_recipes()
recipes = df_recipes['text']

# Create an indexer and encoder
indexer = AnnoyIndexer()
encoder = SentenceEncoder()

# The service combines the two into a single object.
service = Service(indexer=indexer, encoder=encoder)

# We can now build the service using this data.
service.index(recipes)

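# Query the service: idx holds row positions, dists the matching distances
idx, dists = service.query("meat", n_neighbors=10)

# Collect the matches in a table (to_markdown requires the tabulate package)
res = (
    pd.DataFrame({"recipe": recipes})
    .iloc[idx]
    .assign(dists=dists)
    .to_markdown(index=False)
)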

# Show results
print(res)
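
To make the idea behind the quickstart concrete, here is a minimal standard-library sketch of what "an encoder plus a nearest-neighbor index" means. It is not simsity's implementation: `ToyService`, `encode` and `cosine_distance` are hypothetical names, the encoder is a plain character-trigram counter rather than a sentence embedding, and the search is brute force rather than an approximate index such as Annoy. Only the `index` / `query` shape and the `(idx, dists)` return value mirror the quickstart.

```python
import math
from collections import Counter


def encode(text, n=3):
    """Hypothetical stand-in encoder: counts of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


class ToyService:
    """Illustrative only: brute-force search, not simsity's Service."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.vectors = []

    def index(self, texts):
        # Encode every text once and keep the vectors in insertion order.
        self.vectors = [self.encoder(t) for t in texts]
        return self

    def query(self, text, n_neighbors=2):
        # Score the query against every stored vector, closest first.
        q = self.encoder(text)
        scored = sorted(
            (cosine_distance(q, v), i) for i, v in enumerate(self.vectors)
        )[:n_neighbors]
        idx = [i for _, i in scored]
        dists = [d for d, _ in scored]
        return idx, dists


recipes = ["grilled beef steak", "beef stew with carrots", "lemon sorbet"]
service = ToyService(encoder=encode).index(recipes)
idx, dists = service.query("beef steak", n_neighbors=2)
print([recipes[i] for i in idx], dists)
```

A real setup would swap the trigram counter for a learned encoder and the linear scan for an approximate index, which is the role the quickstart gives to SentenceEncoder and AnnoyIndexer.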

Release history

Version	Release date
0.5.5	Jun 24, 2023
0.5.4	Apr 15, 2023
0.5.3	Apr 11, 2023
0.5.2	Apr 4, 2023
0.5.1	Apr 1, 2023
0.5.0	Mar 3, 2023
0.4.1	Mar 28, 2023
0.4.0	Feb 25, 2023
0.3.0	Feb 25, 2023 (this version)
0.2.0	Jan 1, 2022
0.1.1	Nov 4, 2021
0.1.0	Oct 20, 2021
0.0.1	Oct 15, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

simsity-0.3.0-py3.10.egg (21.1 kB)

Uploaded Feb 25, 2023 Source

Hashes for simsity-0.3.0-py3.10.egg
Algorithm	Hash digest
SHA256	`5dd25aaba3845c28ef60e338ba3f3a77f979bc2eb863ab0c8a4647093e5456ae`
MD5	`7470c04e6bf88c37db929e44b187671b`
BLAKE2b-256	`d2073654198ebc92bbae8d4e239a9201543d1f6f77a322706567add9d817cbac`
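
The digests above can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is an assumption about where the egg was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for simsity-0.3.0-py3.10.egg
EXPECTED_SHA256 = "5dd25aaba3845c28ef60e338ba3f3a77f979bc2eb863ab0c8a4647093e5456ae"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's SHA256 hex digest with the expected one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the egg sits in the current working directory.
    egg = Path("simsity-0.3.0-py3.10.egg")
    if egg.exists():
        print("SHA256 matches:", matches_expected(egg))
    else:
        print(f"{egg} not found; download it first")
```

A mismatch means the file differs from the one that was uploaded, so it should not be installed.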