PLATO semantic similarity — embedding-based tile comparison and deduplication

🔬 Plato Semantic Sim

Embedding-based tile comparison and deduplication for PLATO rooms

Computes similarity between knowledge tiles using cosine similarity, Jaccard index, and edit distance. Finds duplicates via content hashing (exact) and vector comparison (semantic).
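The three metrics named above can be sketched in plain Python. This is a minimal illustration of the standard definitions, not the package's own code: the function names `cosine_similarity`, `jaccard_index`, and `edit_distance` are hypothetical, and whitespace tokenization for the Jaccard index is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.0.
    return dot / norm if norm else 0.0


def jaccard_index(text_a, text_b):
    """Shared tokens divided by all distinct tokens (assumes whitespace tokens)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are considered identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(s, t):
    """Levenshtein distance, keeping only one previous row of the DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

Cosine similarity works on embeddings, while the Jaccard index and edit distance work on the tile text itself, so the three can disagree on the same pair of tiles.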

Install

pip install plato-semantic-sim

Quick Start

from plato_semantic_sim import SimilarityEngine, DedupEngine

# Compare a query vector against a dict of named vectors.
engine = SimilarityEngine()
scores = engine.find_similar([0.1, 0.2, 0.3], {"a": [0.1, 0.2, 0.3], "b": [0.9, 0.8, 0.7]})
print(scores)  # [("a", 1.0)]

# Add a tile (id, text content, embedding) and check it against earlier tiles.
dedup = DedupEngine(similarity_threshold=0.85)
result = dedup.add("tile-1", "knowledge content", [0.1, 0.2, 0.3])
print(result)  # {"status": "unique"}
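To show what a similarity search of this kind might compute, here is a rough standalone sketch that ranks candidates by cosine similarity. It does not reproduce `SimilarityEngine.find_similar`: the function name `rank_similar`, the `threshold` of 0.9, and the `top_k` parameter are all assumptions made for illustration.

```python
import math


def rank_similar(query, candidates, threshold=0.9, top_k=5):
    """Return (key, score) pairs scoring at or above threshold, best first.

    Hypothetical helper; threshold and top_k are illustrative defaults.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(key, cosine(query, vec)) for key, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


matches = rank_similar(
    [0.1, 0.2, 0.3],
    {"a": [0.1, 0.2, 0.3], "b": [0.9, 0.8, 0.7]},
)
print([key for key, _ in matches])  # ['a']
```

With these inputs, "b" scores roughly 0.88 against the query, so it falls below the assumed 0.9 cutoff and only "a" is returned.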

API

Class              Purpose
SimilarityEngine   Compare vectors, find similar, pairwise matrix
CosineSimilarity   Dot product divided by the product of the vector magnitudes
JaccardSimilarity  Token overlap ratio
DedupEngine        Exact + semantic dedup in one pass
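The "exact + semantic dedup in one pass" idea can be sketched as follows: hash the content to catch exact duplicates first, then fall back to a vector comparison. This is an illustrative stand-in, not the `DedupEngine` implementation. The class name `SketchDedup`, the use of SHA-256, the `"exact_duplicate"` and `"semantic_duplicate"` status values, and the `"of"` and `"score"` keys are all assumptions; only `{"status": "unique"}` appears in the Quick Start.

```python
import hashlib
import math


class SketchDedup:
    """Hypothetical stand-in that checks hash first, then embedding similarity."""

    def __init__(self, similarity_threshold=0.85):
        self.similarity_threshold = similarity_threshold
        self._hashes = {}   # content hash -> tile id
        self._vectors = {}  # tile id -> embedding

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def add(self, tile_id, content, embedding):
        # Exact check: identical content (after trimming) gives the same hash.
        digest = hashlib.sha256(content.strip().encode("utf-8")).hexdigest()
        if digest in self._hashes:
            return {"status": "exact_duplicate", "of": self._hashes[digest]}

        # Semantic check: compare against every stored embedding.
        for other_id, other_vec in self._vectors.items():
            score = self._cosine(embedding, other_vec)
            if score >= self.similarity_threshold:
                return {"status": "semantic_duplicate", "of": other_id, "score": score}

        # Neither check matched, so store the tile as new.
        self._hashes[digest] = tile_id
        self._vectors[tile_id] = list(embedding)
        return {"status": "unique"}


dedup = SketchDedup(similarity_threshold=0.85)
print(dedup.add("tile-1", "knowledge content", [1.0, 0.0]))  # {'status': 'unique'}
print(dedup.add("tile-2", "knowledge content", [0.0, 1.0])["status"])  # exact_duplicate
```

The hash lookup is constant time, while the linear scan over stored embeddings grows with the number of tiles; a larger collection would likely want an approximate nearest-neighbor index instead.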

Part of Cocapn · Agent Infrastructure
