Simple Similarity Service

simsity

simsity: it's all about the neighborhood

Simsity is a Super Simple Similarities Service. This repository contains simple tools to help in similarity retrieval scenarios. Typical use cases include early-stage bulk labelling and duplicate discovery.

Warning

Alpha software. Expect things to break. Do not use in production.

Example

This is the basic setup for this package.

from simsity.service import Service
from simsity.indexer import PyNNDescentIndexer
from sklearn.feature_extraction.text import CountVectorizer


# The Indexer handles the nearest neighbor search
# The Encoder handles the encoding of the datapoints
service = Service(
    indexer=PyNNDescentIndexer(metric="euclidean"),
    encoder=CountVectorizer()
)

# Index the datapoints
service.train_from_csv("clinc-data.csv", text_col="text")

# Query the datapoints
service.query("give me directions", n_neighbors=100)

# Save the entire system
service.save("/tmp/simple-model")

# Load the saved system again later
service = Service.load("/tmp/simple-model")
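
To make the encoder/indexer split above more concrete, here is a minimal sketch of the same idea written directly against scikit-learn. It is not simsity's implementation: it swaps in scikit-learn's exact `NearestNeighbors` search for the approximate PyNNDescent index, and the toy `texts` list and the `query` helper are hypothetical names introduced only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy corpus standing in for the "text" column of a CSV file.
texts = [
    "give me directions to the airport",
    "give me directions to the station",
    "what is the weather today",
    "will it rain tomorrow",
    "set an alarm for seven",
]

# Encoder step: turn every text into a bag-of-words count vector.
encoder = CountVectorizer()
vectors = encoder.fit_transform(texts)

# Indexer step: exact nearest-neighbour search with the euclidean metric.
# (Assumption: this stands in for an approximate index such as PyNNDescent.)
index = NearestNeighbors(metric="euclidean").fit(vectors)


def query(text, n_neighbors=2):
    """Return (text, distance) pairs for the closest indexed texts."""
    vec = encoder.transform([text])
    distances, indices = index.kneighbors(vec, n_neighbors=n_neighbors)
    return [(texts[i], float(d)) for i, d in zip(indices[0], distances[0])]


for neighbour, dist in query("give me directions"):
    print(f"{dist:.2f}  {neighbour}")
```

The query is encoded with the same fitted vectorizer as the indexed texts, so both live in the same vector space. An exact search like this one is fine for small datasets; an approximate index becomes attractive once the number of datapoints grows.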

Download files

Source Distribution

simsity-0.0.1.tar.gz (4.3 kB)

Built Distribution

simsity-0.0.1-py2.py3-none-any.whl (5.4 kB, Python 2 and Python 3)
