4-stage tile similarity — exact, keyword Jaccard, embedding cosine, structure

plato-tile-dedup

Part of the PLATO framework — deterministic AI knowledge management through tile-based architecture.

Installation

pip install plato-tile-dedup

Usage

from plato_tile_dedup import TileDeduplicator

dedup = TileDeduplicator(threshold=0.85)

tiles = [
    {"id": "1", "content": "Pythagorean triples snap to exact coordinates"},
    {"id": "2", "content": "Pythagorean triples snap to exact coordinates"},  # duplicate
    {"id": "3", "content": "Constraint theory maps continuous vectors to discrete space"},
]

unique = dedup.dedup_batch(tiles)
print(len(unique))  # 2 (tile 2 removed as duplicate of tile 1)
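
The package's internals are not shown here, so the following is only a minimal sketch of how threshold-based batch deduplication could work, not the library's actual implementation. The names `dedup_batch_sketch` and `token_jaccard` are hypothetical, and a plain token-level Jaccard score stands in for the combined similarity. It also assumes that a tile counts as a duplicate when its score against any already kept tile is at or above the threshold.

```python
import re
from typing import Callable, Dict, List

Tile = Dict[str, str]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens (stand-in similarity)."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedup_batch_sketch(
    tiles: List[Tile],
    similarity: Callable[[str, str], float] = token_jaccard,
    threshold: float = 0.85,
) -> List[Tile]:
    """Keep a tile only if it scores below `threshold` against every kept tile."""
    unique: List[Tile] = []
    for tile in tiles:
        is_duplicate = any(
            similarity(tile["content"], kept["content"]) >= threshold
            for kept in unique
        )
        if not is_duplicate:
            unique.append(tile)  # the first occurrence always wins
    return unique


tiles = [
    {"id": "1", "content": "Pythagorean triples snap to exact coordinates"},
    {"id": "2", "content": "Pythagorean triples snap to exact coordinates"},
    {"id": "3", "content": "Constraint theory maps continuous vectors to discrete space"},
]

unique = dedup_batch_sketch(tiles)
print([t["id"] for t in unique])  # ['1', '3']
```

This greedy pass compares each tile against every tile kept so far, so it is quadratic in the worst case. That is acceptable for small batches but would need an index or blocking step for large ones.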

4 Similarity Stages

Stage      Weight  Method
Exact      0.1     Content hash comparison
Keyword    0.3     Token-level Jaccard overlap
Embedding  0.5     Cosine similarity (requires embedding model)
Structure  0.1     Domain and question type matching
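
The four weights sum to 1.0, which suggests a weighted sum of per-stage scores. The sketch below illustrates that reading and is not the package's confirmed algorithm. The function names and the tile fields `embedding`, `domain`, and `question_type` are hypothetical. Scoring a missing embedding as 0 is also an assumption, since this description does not say how the package handles that case.

```python
import hashlib
import math
import re
from typing import Any, Dict, Optional, Sequence

Tile = Dict[str, Any]

# Stage weights as listed in the table above.
WEIGHTS = {"exact": 0.1, "keyword": 0.3, "embedding": 0.5, "structure": 0.1}


def exact_score(a: str, b: str) -> float:
    """1.0 if the SHA-256 hashes of both contents match, else 0.0."""
    def digest(s: str) -> str:
        return hashlib.sha256(s.encode("utf-8")).hexdigest()
    return 1.0 if digest(a) == digest(b) else 0.0


def keyword_score(a: str, b: str) -> float:
    """Token-level Jaccard overlap of lowercase word tokens."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def embedding_score(
    u: Optional[Sequence[float]], v: Optional[Sequence[float]]
) -> float:
    """Cosine similarity, clamped at 0; missing or mismatched vectors score 0."""
    if not u or not v or len(u) != len(v):
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return max(0.0, dot / norm) if norm else 0.0


def structure_score(a: Tile, b: Tile) -> float:
    """Fraction of the (hypothetical) structural fields that match."""
    fields = ("domain", "question_type")
    hits = sum(1 for f in fields if a.get(f) is not None and a.get(f) == b.get(f))
    return hits / len(fields)


def combined_score(a: Tile, b: Tile) -> float:
    """Weighted sum of the four stage scores, in the range [0, 1]."""
    scores = {
        "exact": exact_score(a["content"], b["content"]),
        "keyword": keyword_score(a["content"], b["content"]),
        "embedding": embedding_score(a.get("embedding"), b.get("embedding")),
        "structure": structure_score(a, b),
    }
    return sum(WEIGHTS[stage] * scores[stage] for stage in WEIGHTS)
```

Under this plain weighted sum, two tiles with identical content but no embeddings would score at most 0.5, so a real implementation would need some other rule for exact matches, such as short-circuiting or renormalizing the weights. The sketch deliberately leaves that choice open.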

No external package dependencies; the embedding stage still requires an embedding model. Compatible with Python 3.8+.

Download files

Source Distribution

plato_tile_dedup-0.1.0.tar.gz (3.2 kB)

Built Distribution

plato_tile_dedup-0.1.0-py3-none-any.whl (3.7 kB)
