CSP-1 fingerprint encoder — compress any conversation into a portable 200KB .fp file
convoseed-agent
Compress any conversation into a portable 200KB .fp fingerprint file.
ConvoSeed implements the CSP-1 protocol — a method for encoding the style of a conversation (not the content) using SBERT embeddings, PCA compression, and Hyperdimensional Computing. The result is a fixed-size file you own and can load into any AI session.
pip install convoseed-agent
5-minute demo
Step 1 — install
pip install convoseed-agent
pip install sentence-transformers scikit-learn numpy
Step 2 — encode a conversation
Your conversation must be a JSON file in this format:
[
  {"role": "user", "content": "I've been thinking about memory..."},
  {"role": "assistant", "content": "Memory is deeply selective..."},
  ...
]
Then run:
convoseed-encode --input my_conversation.json --output identity.fp
Or in Python:
import json
from convoseed_agent import encode_conversation

# Load the list of {"role", "content"} messages
with open("my_conversation.json") as f:
    messages = json.load(f)

encode_conversation(messages, "identity.fp")
# → identity.fp (~200KB, fixed size regardless of conversation length)
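Before encoding, it can help to check that a file really matches the format shown in Step 2. The sketch below is not part of convoseed-agent: `validate_messages` and `load_conversation` are hypothetical helper names, and the assumption that only the `user` and `assistant` roles are allowed comes from the example above, not from the protocol spec.

```python
import json

# Assumption: only the two roles shown in the example format are valid.
ALLOWED_ROLES = ("user", "assistant")


def validate_messages(messages):
    """Return a list of problems; an empty list means the data looks valid."""
    if not isinstance(messages, list):
        return ["top-level JSON value must be a list of messages"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: expected an object, got {type(msg).__name__}")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: 'content' must be a non-empty string")
    return problems


def load_conversation(path):
    """Load a conversation file and raise ValueError if it does not match the format."""
    with open(path, encoding="utf-8") as f:
        messages = json.load(f)
    problems = validate_messages(messages)
    if problems:
        raise ValueError("; ".join(problems))
    return messages
```

The list returned by `load_conversation("my_conversation.json")` can then be passed to `encode_conversation` as in the Python example above.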
Step 3 — identify a speaker from a new message
from convoseed_agent import identify, load_fp
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

winner, scores = identify(
    query_text="I wonder if that cognitive style could be captured somehow",
    fp_paths=["identity.fp", "other_person.fp"],
    model=model
)

print(f"Best match: {winner}")
# List every candidate fingerprint, highest score first
for path, score in sorted(scores.items(), key=lambda x: -x[1]):
    print(f" {score:.4f} {path}")
What it does
Messages → SBERT embed → PCA compress → HDC bind → .fp file
- Embed: Sentence-BERT encodes each message into a 384-dim vector
- Compress: PCA extracts the style centroid (4 components are enough for full identification accuracy)
- Bind: Hyperdimensional Computing (10,000-dim) weaves the temporal sequence into one vector
- Save: the result is written to a portable JSON-based .fp file (~200KB)
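To make the HDC step more concrete, here is a minimal sketch of one common way to fold an ordered sequence into a single hypervector: each item is tagged with its position by a cyclic rotation, and the tagged vectors are then superimposed by majority vote. This is an illustration only. The CSP-1 encoder may bind differently, the random vectors stand in for whatever the real pipeline derives from the PCA output, and every function name here is hypothetical.

```python
import random

DIM = 10_000  # hypervector dimensionality quoted in the list above


def random_hypervector(rng):
    """Draw a random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def permute(vec, shift):
    """Cyclically rotate a hypervector; used here to tag an item with its position."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def bundle(vectors):
    """Superimpose hypervectors by element-wise majority vote (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def encode_sequence(items):
    """Encode an ordered list of hypervectors into a single hypervector."""
    return bundle([permute(vec, position) for position, vec in enumerate(items)])


def similarity(a, b):
    """Normalized dot product; for bipolar vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
a, b, c = (random_hypervector(rng) for _ in range(3))

same_order = similarity(encode_sequence([a, b, c]), encode_sequence([a, b, c]))
shuffled = similarity(encode_sequence([a, b, c]), encode_sequence([b, c, a]))
print(f"same order: {same_order:.3f}, different order: {shuffled:.3f}")
```

The same three items in a different order produce a nearly orthogonal vector, which is the property that lets a single fixed-size hypervector carry sequence information and not just a bag of items.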
Key result from the research paper: 4 PCA components are enough to reach the full speaker identification accuracy, which suggests that conversational style is low-dimensional. In that sense, a speaker's style can be summarized with as few as 4 numbers.
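The idea of low-dimensional structure can be illustrated with a small NumPy sketch. It uses synthetic data, not real SBERT embeddings or the paper's data: 200 vectors in 384 dimensions are generated from only 4 hidden factors plus a little noise, and PCA (computed here via SVD) recovers almost all of the variance with 4 components. The sizes and noise level are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for message embeddings: 200 "messages" in 384 dimensions
# whose variation is driven by only 4 hidden factors plus a little noise.
n_messages, embed_dim, n_factors = 200, 384, 4
latent = rng.normal(size=(n_messages, n_factors))
mixing = rng.normal(size=(n_factors, embed_dim))
embeddings = latent @ mixing + 0.01 * rng.normal(size=(n_messages, embed_dim))

# PCA via SVD of the mean-centered data matrix.
centered = embeddings - embeddings.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Fraction of total variance captured by the first 4 principal components.
variance = singular_values ** 2
explained = variance[:4].sum() / variance.sum()

# Project every message onto those 4 components: 384 numbers become 4.
reduced = centered @ components[:4].T

print(reduced.shape)
print(f"variance explained by 4 components: {explained:.4f}")
```

On real conversation embeddings the explained variance would have to be measured, since nothing guarantees it is this high; the sketch only shows how such a check could be run.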
Research results
Validated on a real 524-message researcher-AI conversation:
| Model | Avg Similarity | Peak | Msgs > 0.7 |
|---|---|---|---|
| GPT-2 (124M) | 0.464 | 1.000 | 1 |
| Gemma3:1b | 0.466 | 0.707 | 1 |
| Gemma3:12b | 0.523 | 0.757 | 4 |
Speaker identification: 52% accuracy on 10 candidates (vs 10% random baseline), p < 10⁻¹⁰⁰.
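As a rough sanity check on that p-value, the sketch below computes an exact one-sided binomial tail: the probability of getting at least 52% correct by pure guessing among 10 candidates. It assumes the 52% figure comes from the 1,000 trials listed under Status, and it is not necessarily the test used in the paper.

```python
import math
from fractions import Fraction

n_trials = 1000      # assumption: the 1,000 trials listed under Status
n_candidates = 10    # chance level is 1 in 10
n_correct = 520      # 52% of 1,000 trials


def binomial_tail(successes, trials, p):
    """Exact P(X >= successes) for X ~ Binomial(trials, p), with p a Fraction."""
    q = 1 - p
    return sum(
        math.comb(trials, k) * p ** k * q ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_value = binomial_tail(n_correct, n_trials, Fraction(1, n_candidates))
print(p_value < Fraction(1, 10 ** 100))  # True
```

Exact rational arithmetic is used because the tail probability is far too small to trust to naive floating-point summation.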
Optional: generation (requires torch)
pip install "convoseed-agent[decode]"
from convoseed_agent import generate_with_prefix, load_fp
fp = load_fp("identity.fp")
output = generate_with_prefix("Tell me about your weekend", fp, model_name="gpt2")
print(output)
Status
Early research. Proof-of-concept validated on real data. Open for collaboration.
- CSP-1 protocol specification
- Encoder / decoder / identifier
- Speaker identification experiment (1,000 trials)
- Multi-model validation
- Cross-model mapping (open research problem)
- Public fingerprint registry
Links
- GitHub: https://github.com/0xAshraFF/ConvoSeed
- Protocol spec: /spec/CSP-1.md in the repo
- Research paper: /docs/ in the repo
MIT License.
File details
Details for the file convoseed_agent-2.0.0.tar.gz.
File metadata
- Download URL: convoseed_agent-2.0.0.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b76f2696fa5490b15961d2ecd3f930ffa6c7ef4ccd7ac34a787ef3d2ca134070 |
| MD5 | 515291b99fbccd09c87a83003149d33a |
| BLAKE2b-256 | 6fb7c23ff58dfcd4777c8d23e1100a09e06eb537bf98a83fa5cee702c0b6e27a |
File details
Details for the file convoseed_agent-2.0.0-py3-none-any.whl.
File metadata
- Download URL: convoseed_agent-2.0.0-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41b0987e774ddb89efa0c6422d0a05844981ee3fb71db9062eb036fcdbc8fc95 |
| MD5 | 2bc62b47edabcea07a5072381779031e |
| BLAKE2b-256 | 8bd7523eba987985e7b778d74a9edd8b05a3711bba5ddbe604b1715d799ab56e |