Simple Similarity Service
simsity
simsity: it's all about the neighborhood
Simsity is a Super Simple Similarities Service. This repository contains simple tools to help in similarity retrieval scenarios. Typical use cases include early-stage bulk labelling and duplicate discovery.
Warning
Alpha software. Expect things to break. Do not use in production.
Example
This is the basic setup for this package.
from simsity.service import Service
from simsity.indexer import PyNNDescentIndexer
from sklearn.feature_extraction.text import CountVectorizer
# The Indexer handles the nearest neighbor search
# The Encoder handles the encoding of the datapoints
service = Service(
    indexer=PyNNDescentIndexer(metric="euclidean"),
    encoder=CountVectorizer()
)
# Index the datapoints
service.train_from_csv("clinc-data.csv", text_col="text")
# Query the datapoints
service.query("give me directions", n_neighbors=100)
# Save the entire system
service.save("/tmp/simple-model")
# You can also load the model now.
Service.load("/tmp/simple-model")
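To make the encoder/indexer split above more concrete, here is a minimal standard-library sketch of the same idea: encode each text as a bag-of-words count vector, then rank the stored texts by Euclidean distance to the query. This is not how simsity is implemented. The helper names `encode`, `euclidean` and `query` are hypothetical, the search is an exact brute-force scan rather than the approximate search that PyNNDescent performs, and the sample texts are made up for illustration.

```python
import math
from collections import Counter


def encode(text):
    """Bag-of-words counts: a rough stand-in for a count-based encoder."""
    return Counter(text.lower().split())


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors."""
    # Counter returns 0 for tokens it has not seen, so missing tokens are safe.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in set(a) | set(b)))


def query(texts, text, n_neighbors=2):
    """Return (distance, text) pairs for the n_neighbors closest texts."""
    target = encode(text)
    scored = [(euclidean(encode(t), target), t) for t in texts]
    return sorted(scored)[:n_neighbors]


# Made-up sample data, used only for this illustration.
texts = [
    "give me directions to the airport",
    "what is the weather today",
    "give me directions home",
]

results = query(texts, "give me directions", n_neighbors=2)
for distance, neighbor in results:
    print(f"{distance:.3f}  {neighbor}")
```

A brute-force scan like this compares the query against every stored text, which is fine for a handful of rows. An approximate nearest-neighbor index trades a little accuracy for much faster lookups on larger datasets, which is the role the indexer plays in the setup above.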