SozoGraph v1: transcript/db object -> portable cognitive passport JSON

SozoGraph (v1) — The Cognitive Passport

SozoGraph turns interaction history (transcripts + DB objects) into a portable cognitive snapshot you can pass into any AI agent context on the fly.

It answers one question cleanly:

"Given everything that has happened so far, what should an agent currently believe about this user?"

Not:

  • what was said
  • what is similar
  • what might be relevant

But:

  • what is true now
  • what is stable
  • what is unresolved
  • what is contradictory (resolved by time)

Why this exists (the problem)

Most "memory" systems are either:

  • prompt stuffing (expensive, degrades reasoning, no forgetting)
  • vector RAG (good recall, weak truth/temporal consistency)
  • app-specific notes (non-portable, brittle schemas)

So agents keep acting like "goldfish" even when the data exists.

SozoGraph v1 is a truth-layer memory object:

  • typed (facts vs preferences vs entities vs open loops)
  • temporal (new updates override old; contradictions are explicit)
  • portable (a lightweight JSON passport + a compact context string)

Install

pip install sozograph

Configure

Create a .env file (see .env.example):

GEMINI_API_KEY=your_key_here
SOZOGRAPH_EXTRACTOR_MODEL=gemini-3-flash
SOZOGRAPH_ENABLE_FALLBACK_SUMMARIZER=true
SOZOGRAPH_MAX_INTERACTION_CHARS=4000
SOZOGRAPH_DEFAULT_CONTEXT_BUDGET=3000
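To sanity-check these values before constructing SozoGraph, a minimal sketch along the following lines may help. It assumes the variables have already been exported into the process environment (a .env file is not loaded into os.environ on its own). The read_settings helper and the returned keys are hypothetical and not part of the sozograph API; the defaults simply mirror the example values above, and how the library parses these variables internally is not documented here.

```python
import os

# Fallbacks mirror the example .env above. They are NOT confirmed library defaults.
EXAMPLE_VALUES = {
    "SOZOGRAPH_EXTRACTOR_MODEL": "gemini-3-flash",
    "SOZOGRAPH_ENABLE_FALLBACK_SUMMARIZER": "true",
    "SOZOGRAPH_MAX_INTERACTION_CHARS": "4000",
    "SOZOGRAPH_DEFAULT_CONTEXT_BUDGET": "3000",
}


def read_settings(env=None):
    """Hypothetical helper: read the SozoGraph variables into typed values."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, fallback) for name, fallback in EXAMPLE_VALUES.items()}
    flag = raw["SOZOGRAPH_ENABLE_FALLBACK_SUMMARIZER"].strip().lower()
    return {
        # Only report whether a key is present; never print the key itself.
        "has_api_key": bool(env.get("GEMINI_API_KEY")),
        "extractor_model": raw["SOZOGRAPH_EXTRACTOR_MODEL"],
        "enable_fallback_summarizer": flag in {"1", "true", "yes"},
        "max_interaction_chars": int(raw["SOZOGRAPH_MAX_INTERACTION_CHARS"]),
        "default_context_budget": int(raw["SOZOGRAPH_DEFAULT_CONTEXT_BUDGET"]),
    }


if __name__ == "__main__":
    print(read_settings())
```

A non-numeric value for either character limit raises ValueError here, which surfaces a typo in the .env file early.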

Quickstart

1) Single transcript → Passport

from sozograph import SozoGraph

sg = SozoGraph()

passport, stats = sg.ingest(
    "I'm Quantilytix. I build software and want direct answers. I'm working on SozoGraph v1.",
    meta={"user_key": "u_123", "source": "transcript:demo-1"}
)

print(passport.to_compact_dict())
print(stats)  # per-interaction merge stats

2) List of transcripts / message history (supported ✅)

history = [
    {"createdAt": "2026-02-01T10:00:00Z", "project_title": "SozoFix", "transcript": "I'm renovating my kitchen."},
    {"createdAt": "2026-02-02T09:30:00Z", "project_title": "SozoFix", "transcript": "I prefer rustic style and hate glossy paint."},
    {"createdAt": "2026-02-03T12:10:00Z", "project_title": "SozoGraph", "transcript": "We need portable memory JSON. No infra. Truth-layer."},
]

# You can ingest a list directly. SozoGraph will coerce items internally.
passport, _ = sg.ingest(history, hint="firestore")  # hint optional; see below

Tip: List items do not have to be database "docs". You can pass plain dicts and let fallback summarization help when needed. If a dict contains a transcript field, extraction will still succeed (the dict is stringified deterministically).


3) Firestore object ingestion (objects-only)

You fetch your Firestore data in your app, then pass the dict here:

firestore_doc = {
  "id": "abc123",
  "createdAt": "2026-02-03T10:00:00Z",
  "title": "User Profile Update",
  "notes": "User says they prefer direct answers.",
  "companyCode": "QX",
}

passport, _ = sg.ingest(
    firestore_doc,
    hint="firestore",
    meta={"source": "firestore:/users/abc123", "user_key": "u_abc123"}
)

4) Firebase Realtime DB ingestion (path + value)

RTDB is tree-based, so pass an envelope:

rtdb_snapshot = {
  "path": "/users/u1/profile",
  "value": {
    "updatedAt": 1738560000000,
    "displayName": "Quantilytix",
    "preferences": {"tone": "direct"}
  }
}

passport, _ = sg.ingest(rtdb_snapshot, hint="rtdb", meta={"user_key": "u1"})

5) Supabase ingestion (table + row)

supabase_row = {
  "table": "events",
  "row": {
    "id": 77,
    "created_at": "2026-02-03T11:22:00Z",
    "event": "user_preference_update",
    "notes": "User wants strategy alignment before code."
  }
}

passport, _ = sg.ingest(supabase_row, hint="supabase", meta={"user_key": "u1"})
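The examples above carry timestamps in different shapes: ISO-8601 strings in createdAt and created_at, and epoch milliseconds in the RTDB updatedAt field. Because v1 resolves contradictions by time, it can be worth checking in your own code what each timestamp means before ingesting. The sketch below uses hypothetical helpers (to_utc, rtdb_envelope, supabase_envelope) that are not part of the sozograph API; the envelope keys are the ones shown above, and how SozoGraph parses timestamps internally is an assumption left open here.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Hypothetical helper: epoch milliseconds or ISO-8601 string -> aware UTC datetime."""
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def rtdb_envelope(path, value):
    """Wrap an RTDB snapshot you fetched yourself in the path + value envelope."""
    return {"path": path, "value": value}


def supabase_envelope(table, row):
    """Wrap a Supabase row you fetched yourself in the table + row envelope."""
    return {"table": table, "row": row}


if __name__ == "__main__":
    # The RTDB example value from above, read as epoch milliseconds.
    print(to_utc(1738560000000).isoformat())  # 2025-02-03T05:20:00+00:00
    print(to_utc("2026-02-03T11:22:00Z").isoformat())  # 2026-02-03T11:22:00+00:00
```

The helpers only build plain dicts, which keeps them in line with the objects-only design: your app does the fetching, and SozoGraph receives the result.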

SozoGraph Test Fixtures

These fixtures are intentionally small and human-readable.

They are designed to test:

  • transcript ingestion
  • Firestore document ingestion
  • Firebase Realtime Database snapshots
  • Supabase row ingestion

They are NOT meant to simulate production-scale data. If a fixture grows beyond what a human would comfortably read, it is probably violating SozoGraph v1 philosophy.

Export a compact agent “briefing” (context injection)

You can inject this into any agent prompt:

briefing = sg.export_context(passport, budget_chars=2500)
print(briefing)

Example output format:

SOZOGRAPH PASSPORT v1
User: u1
Updated: 2026-02-03T12:34:56+00:00

Facts (current beliefs):
- role: software development
- current_project: sozograph v1
...

Preferences:
- tone: direct
...

Open loops:
- finalize v1 repo + publish pip package
...
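To make the budget idea concrete, here is a rough sketch of how a briefing in this format could be rendered under a character budget. It is not the implementation behind export_context: the render_briefing function and the passport dict keys (user_key, updated_at, facts, preferences, open_loops) are hypothetical, chosen only to match the example output above.

```python
def render_briefing(passport, budget_chars=3000):
    """Hypothetical renderer: add sections in priority order until the budget is hit."""
    out = [
        "SOZOGRAPH PASSPORT v1",
        f"User: {passport.get('user_key', 'unknown')}",
        f"Updated: {passport.get('updated_at', 'unknown')}",
    ]
    used = len("\n".join(out))  # the header is always emitted, even on a tiny budget

    sections = [
        ("Facts (current beliefs):", [f"- {k}: {v}" for k, v in passport.get("facts", {}).items()]),
        ("Preferences:", [f"- {k}: {v}" for k, v in passport.get("preferences", {}).items()]),
        ("Open loops:", [f"- {text}" for text in passport.get("open_loops", [])]),
    ]
    for heading, items in sections:
        heading_added = False
        for item in items:
            # A section heading (plus a blank separator line) is only paid for
            # together with its first item, so no heading is left empty.
            extra = [item] if heading_added else ["", heading, item]
            cost = sum(len(line) + 1 for line in extra)  # +1 for each newline
            if used + cost > budget_chars:
                return "\n".join(out)  # strict priority: stop at the first line that does not fit
            out.extend(extra)
            used += cost
            heading_added = True
    return "\n".join(out)


if __name__ == "__main__":
    demo = {
        "user_key": "u1",
        "updated_at": "2026-02-03T12:34:56+00:00",
        "facts": {"role": "software development", "current_project": "sozograph v1"},
        "preferences": {"tone": "direct"},
        "open_loops": ["finalize v1 repo + publish pip package"],
    }
    print(render_briefing(demo, budget_chars=2500))
```

Stopping at the first line that does not fit keeps the output predictable: facts are never displaced by lower-priority sections.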

How SozoGraph v1 works

Ingestion pipeline (v1)

  1. Coerce input into canonical Interaction objects (deterministic)
  2. If the derived text is weak or noisy, call the Gemini fallback summarizer (optional)
  3. Use the Gemini extractor (strict JSON) to propose memory updates
  4. Use the deterministic resolver to merge:
    • temporal priority (latest wins)
    • contradictions are recorded explicitly as changes
    • de-dupe entities + aliases
    • keep open loops short and recent
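The resolver rules in step 4 can be illustrated with a small, self-contained sketch. This is not SozoGraph's resolver: the Memory class, merge_fact, add_open_loop, and the cap of five open loops are all hypothetical. Timestamps are assumed to be UTC strings in one uniform format (YYYY-MM-DDTHH:MM:SSZ), so plain string comparison orders them correctly.

```python
from dataclasses import dataclass, field

MAX_OPEN_LOOPS = 5  # assumed cap, for illustration only


@dataclass
class Memory:
    facts: dict = field(default_factory=dict)           # key -> (value, timestamp)
    contradictions: list = field(default_factory=list)  # explicit change records
    open_loops: list = field(default_factory=list)      # (text, timestamp), newest first


def merge_fact(mem, key, value, ts):
    """Latest wins; a changed value is recorded as an explicit contradiction."""
    current = mem.facts.get(key)
    if current is None:
        mem.facts[key] = (value, ts)
        return "added"
    old_value, old_ts = current
    if ts < old_ts:
        return "ignored_stale"  # an older update never overrides a newer belief
    mem.facts[key] = (value, ts)
    if value == old_value:
        return "refreshed"
    mem.contradictions.append(
        {"key": key, "old": old_value, "new": value, "old_ts": old_ts, "new_ts": ts}
    )
    return "overridden"


def add_open_loop(mem, text, ts):
    """De-duplicate by text, then keep only the most recent loops."""
    loops = [loop for loop in mem.open_loops if loop[0] != text]
    loops.append((text, ts))
    loops.sort(key=lambda loop: loop[1], reverse=True)
    mem.open_loops = loops[:MAX_OPEN_LOOPS]


if __name__ == "__main__":
    mem = Memory()
    merge_fact(mem, "style", "modern", "2026-02-01T10:00:00Z")
    merge_fact(mem, "style", "rustic", "2026-02-02T09:30:00Z")
    print(mem.facts["style"][0])    # rustic
    print(len(mem.contradictions))  # 1
```

Because every branch depends only on the stored state and the incoming (key, value, timestamp) triple, replaying the same updates in the same order always yields the same memory, which is the property the word "deterministic" points at.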

What SozoGraph v1 is NOT

  • Not a graph database
  • Not RAG
  • Not embeddings
  • Not a long transcript store
  • Not a tool that fetches from DB (objects-only by design)

Roadmap (upcoming features)

v1.x (near-term)

  • Better input detection for common “transcript list” shapes (e.g. {transcript, createdAt})
  • CLI:
    • sozograph ingest transcript.txt --out passport.json
    • sozograph render passport.json --budget 3000
  • Stronger JSON recovery if a model response is slightly malformed
  • More deterministic evidence linking (source-id mapping improvements)

v1.5 (planned, optional)

  • Graph engine support (Neo4j Aura / Memgraph) via Bolt
  • Cypher-style relational queries over memory
  • Temporal deprecation on edges
  • Export “active truth subgraph” to context

v2 (optional)

  • Foundation model adapters (non-Gemini backends)
  • MCP tool server integration
  • Hybrid patterns (graph + vector) only where needed

Contributing

We welcome contributions, but we want to keep v1 disciplined.

Good contributions

  • Adapters for additional object shapes (still objects-only)
  • Resolver improvements (deterministic)
  • Tests for merge/contradiction edge-cases
  • Prompt tuning for more stable key extraction

What won’t be accepted in v1

  • Adding DB client dependencies (firebase-admin, supabase clients, etc.)
  • Building RAG/embeddings into core
  • Turning v1 into a graph project

How to contribute

  1. Fork the repo
  2. Create a branch: feat/<short-name>
  3. Add tests where relevant
  4. Open a PR with a short explanation and sample input/output

License

MIT — Sozo Analytics Lab
