CSP-1 fingerprint encoder — compress any conversation into a portable 200KB .fp file

convoseed-agent

Compress any conversation into a portable 200KB .fp fingerprint file.

ConvoSeed implements the CSP-1 protocol — a method for encoding the style of a conversation (not the content) using SBERT embeddings, PCA compression, and Hyperdimensional Computing. The result is a fixed-size file you own and can load into any AI session.

pip install convoseed-agent

5-minute demo

Step 1 — install

pip install convoseed-agent
pip install sentence-transformers scikit-learn numpy

Step 2 — encode a conversation

Your conversation must be a JSON file in this format:

[
    {"role": "user", "content": "I've been thinking about memory..."},
    {"role": "assistant", "content": "Memory is deeply selective..."},
    ...
]

Then run:

convoseed-encode --input my_conversation.json --output identity.fp

Or in Python:

import json
from convoseed_agent import encode_conversation

with open("my_conversation.json") as f:
    messages = json.load(f)

encode_conversation(messages, "identity.fp")
# → identity.fp  (~200KB, fixed size regardless of conversation length)
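
Before encoding, it can help to check that the input matches the format shown above. The following is a minimal sketch, not part of convoseed-agent: `validate_messages` and `ALLOWED_ROLES` are hypothetical names, and restricting roles to "user" and "assistant" is an assumption based only on the example in this section.

```python
import json

# Assumption: only the two roles shown in the example format are expected.
ALLOWED_ROLES = {"user", "assistant"}

def validate_messages(messages):
    """Return messages unchanged if they match the expected shape, else raise ValueError."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("expected a non-empty JSON array of messages")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"message {i} is not an object")
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {i} has empty or non-string content")
    return messages

sample = [
    {"role": "user", "content": "I've been thinking about memory..."},
    {"role": "assistant", "content": "Memory is deeply selective..."},
]
validate_messages(sample)

# Text that could be written to my_conversation.json
serialized = json.dumps(sample, indent=4)
```

Running a check like this first turns a malformed file into a clear error message instead of a failure partway through encoding.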

Step 3 — identify a speaker from a new message

from convoseed_agent import identify, load_fp
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

winner, scores = identify(
    query_text="I wonder if that cognitive style could be captured somehow",
    fp_paths=["identity.fp", "other_person.fp"],
    model=model
)

print(f"Best match: {winner}")
for path, score in sorted(scores.items(), key=lambda x: -x[1]):
    print(f"  {score:.4f}  {path}")

What it does

Messages → SBERT embed → PCA compress → HDC bind → .fp file
  1. Embed — Sentence-BERT encodes each message into a 384-dim vector
  2. Compress — PCA extracts the style centroid (4 components = full accuracy)
  3. Bind — Hyperdimensional Computing (10,000-dim) weaves temporal sequence into one vector
  4. Save — Written to a portable JSON-based .fp file (~200KB)
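
To make the bind step more concrete, here is a minimal pure-Python sketch of one common Hyperdimensional Computing sequence encoding: each item is rotated by its position and the results are bundled by majority vote. This is an illustration under stated assumptions, not the CSP-1 implementation. The helper names are hypothetical, and random bipolar hypervectors stand in for the per-message vectors that the real pipeline would derive from the PCA-compressed embeddings.

```python
import random

DIM = 10_000  # hypervector dimensionality quoted in the steps above

def random_hv(rng, dim=DIM):
    """Random bipolar (+1/-1) hypervector; a stand-in for a message vector."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def permute(hv, shift):
    """Cyclically rotate a hypervector; the shift encodes sequence position."""
    shift %= len(hv)
    return hv[-shift:] + hv[:-shift] if shift else list(hv)

def bundle(hvs):
    """Elementwise majority vote over hypervectors (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*hvs)]

def encode_sequence(items):
    """Bind position into each item by rotation, then bundle into one vector."""
    return bundle([permute(hv, pos) for pos, hv in enumerate(items)])

def cosine(a, b):
    """Cosine similarity; for bipolar vectors this is the dot product over the dimension."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
messages = [random_hv(rng) for _ in range(3)]
sequence_hv = encode_sequence(messages)

sim_member = cosine(sequence_hv, permute(messages[1], 1))  # item at its own position
sim_random = cosine(sequence_hv, random_hv(rng))            # unrelated vector
sim_reversed = cosine(sequence_hv, encode_sequence(messages[::-1]))
print(f"member: {sim_member:.2f}  random: {sim_random:.2f}  reversed order: {sim_reversed:.2f}")
```

The bundled vector stays clearly similar to each item at its own position and near-orthogonal to unrelated vectors, while reordering the same items produces a noticeably different vector. That is the sense in which a single fixed-size vector can carry temporal order.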

Key result from the research paper: 4 PCA components are enough to reach full speaker identification accuracy, which suggests that conversational style is genuinely low-dimensional. In this setup, a speaker's style can be summarized with about 4 numbers.


Research results

Validated on a real 524-message researcher-AI conversation:

Model          Avg Similarity   Peak    Msgs > 0.7
GPT-2 (124M)   0.464            1.000   1
Gemma3:1b      0.466            0.707   1
Gemma3:12b     0.523            0.757   4

Speaker identification: 52% accuracy on 10 candidates (vs 10% random baseline), p < 10⁻¹⁰⁰.
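
As a rough sanity check on the significance figure, the sketch below computes a one-sided binomial tail probability in log space. It assumes the 52% accuracy was measured over the 1,000 trials mentioned under Status, with each trial an independent 1-in-10 guess under the null hypothesis. The paper's actual test may differ, and `log10_binomial_tail` is a hypothetical helper.

```python
import math

def log10_binomial_tail(n, k, p):
    """log10 of P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    largest = max(log_terms)
    log_sum = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return log_sum / math.log(10)

n_trials = 1000   # assumption: trial count taken from the Status list
n_correct = 520   # 52% of 1,000
chance = 0.1      # random baseline with 10 candidates

log10_p = log10_binomial_tail(n_trials, n_correct, chance)
print(f"log10(p) = {log10_p:.1f}")
```

Under these assumptions the result falls far below -100, which is consistent with the p < 10⁻¹⁰⁰ reported above.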


Optional: generation (requires torch)

pip install "convoseed-agent[decode]"

from convoseed_agent import generate_with_prefix, load_fp

fp = load_fp("identity.fp")
output = generate_with_prefix("Tell me about your weekend", fp, model_name="gpt2")
print(output)

Status

Early research. Proof-of-concept validated on real data. Open for collaboration.

  • CSP-1 protocol specification
  • Encoder / decoder / identifier
  • Speaker identification experiment (1,000 trials)
  • Multi-model validation
  • Cross-model mapping (open research problem)
  • Public fingerprint registry

MIT License.
