S3-native cassette substrate for typed knowledge graphs: Dempster-Shafer reasoner, sheaf GNN, Kan-based schema migration. No GPU, 17MB SPLADE bundled.
cassetteql
S3-native knowledge graphs with calibrated reasoning. Immutable cassette files, split Parquet indexes, a Dempster–Shafer reasoner that tells you when it doesn't know, a trained sheaf GNN prior, and Kan-based schema migration. The bundled SPLADE-tiny model (17 MB) means no GPU, no model download, no API keys.
Layout of an .inf cassette (one cassette = one ingest batch):
- header: schema_ref · created_at
- records: gzip frames (range-addressable)
- footer: JSON offsets + stats

Cassettes are immutable, content-addressed, and S3-native.
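The actual .inf byte format isn't specified here. As a minimal sketch under assumed details (a JSON header line, one independent gzip frame per record, a JSON footer of offsets located by a trailing 8-byte length), the layout above could be written and read like this. `write_cassette`, `read_footer` and `read_record` are hypothetical names, not part of `cognition.cassette`.

```python
import gzip
import hashlib
import json

def write_cassette(schema_ref: str, created_at: str, records: list) -> tuple:
    """Hypothetical layout: header line, one gzip frame per record, JSON footer.

    Returns (cassette bytes, SHA-256 hex digest used as the content address).
    """
    header = {"schema_ref": schema_ref, "created_at": created_at}
    body = bytearray(json.dumps(header).encode() + b"\n")
    offsets = []
    for rec in records:
        # mtime=0 keeps gzip output deterministic, so equal input gives an equal address.
        frame = gzip.compress(json.dumps(rec).encode(), mtime=0)
        offsets.append([len(body), len(frame)])  # [start, length] of this frame
        body += frame
    footer = json.dumps({"offsets": offsets, "stats": {"n_records": len(records)}}).encode()
    body += footer
    body += len(footer).to_bytes(8, "big")  # fixed-size tail locates the footer
    data = bytes(body)
    return data, hashlib.sha256(data).hexdigest()

def read_footer(data: bytes) -> dict:
    footer_len = int.from_bytes(data[-8:], "big")
    return json.loads(data[-8 - footer_len:-8])

def read_record(data: bytes, i: int) -> dict:
    start, length = read_footer(data)["offsets"][i]
    # Only this byte slice is needed, i.e. what a single S3 range-get would fetch.
    return json.loads(gzip.decompress(data[start:start + length]))

# Hypothetical example records.
records = [
    {"subject": "toyota", "predicate": "invest", "object": "solid_state"},
    {"subject": "catl", "predicate": "supplies", "object": "toyota"},
]
blob, address = write_cassette("schema.json", "2024-01-01T00:00:00Z", records)
print(address[:12], read_record(blob, 1))
```

Keeping each record in its own gzip frame is what makes a record addressable by byte range: a reader fetches the footer, then exactly one frame, without decompressing anything else.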
pip install cassetteql
The package installs as cassetteql but imports as cognition (the same pattern as pillow/PIL):
from cognition.cassette import InfonStore, Query, Analyst
One-minute start
from cognition.cassette import InfonStore, Query
store = InfonStore("./data/chips", schema_path="schema.json")
# `documents` is your own input batch (its expected format isn't shown here).
# Delta ingest is idempotent and reports coverage diagnostics for free.
result = store.ingest(documents)
print(result["report"].summary())
# Calibrated single-claim verdict.
v = store.ask(Query().where(subject="toyota",
                            predicate="invest",
                            object="solid_state"))
print(v.label, v.mass.supports, v.mass.theta) # SUPPORTS 0.53 0.29
# Multi-hop MCTS with retraction-aware chain mass.
v = store.connect("toyota", "catl")
# One tree walk resolves connectivity to many targets.
vs = store.any_of("toyota", {"catl", "lg", "samsung", "sk_hynix"})
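The (supports, refutes, theta) triple reads as a Dempster–Shafer mass over the frame {SUPPORTS, REFUTES}, with theta as the uncommitted "don't know" mass. How cassetteql fuses evidence internally isn't described here; the sketch below is plain Dempster's rule of combination on that two-element frame, with hypothetical `Mass` and `combine` names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mass:
    supports: float  # m({SUPPORTS})
    refutes: float   # m({REFUTES})
    theta: float     # m(Θ): mass left uncommitted ("don't know")

VACUOUS = Mass(0.0, 0.0, 1.0)  # no evidence at all

def combine(a: Mass, b: Mass) -> Mass:
    """Dempster's rule on the two-element frame Θ = {SUPPORTS, REFUTES}."""
    # Conflict: one source backs SUPPORTS while the other backs REFUTES.
    conflict = a.supports * b.refutes + a.refutes * b.supports
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; Dempster's rule is undefined")
    norm = 1.0 - conflict
    s = (a.supports * b.supports + a.supports * b.theta + a.theta * b.supports) / norm
    r = (a.refutes * b.refutes + a.refutes * b.theta + a.theta * b.refutes) / norm
    t = (a.theta * b.theta) / norm
    return Mass(s, r, t)

m = combine(Mass(0.6, 0.1, 0.3), Mass(0.5, 0.0, 0.5))
print(round(m.supports, 3), round(m.refutes, 3), round(m.theta, 3))  # 0.789 0.053 0.158
print(combine(VACUOUS, VACUOUS))  # Mass(supports=0.0, refutes=0.0, theta=1.0)
```

The vacuous mass is the identity for this rule, so a claim with no matching evidence stays at θ = 1.0 rather than drifting toward a guess.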
Swap the root URI for s3://bucket/prefix and the same code runs against S3:
pip install 'cassetteql[s3]'
store = InfonStore("s3://acme/chips", schema_path="schema.json")
What makes it different
| Feature | Details |
|---|---|
| Cassette substrate | Immutable, content-addressed .inf files; split Parquet indexes per cassette; append-only manifest chain. Delta ingest never rewrites existing files, and a time-travel snapshot costs one JSON read. |
| Calibrated verdicts | Every answer carries a (supports, refutes, theta) mass. On claims the corpus can't answer, θ → 1.0 and no range-gets are issued: the pruner short-circuits. |
| Sheaf GNN prior | 140k-parameter encoder with per-relation-kind restriction maps, trained once on synthetic hypergraphs (no human labels). 99% accuracy on held-out synthetic data, and +94 percentage points over symbolic-only on reportive-edge anomalies. |
| Schema migration | SchemaFunctor (rename, merge, delete) rewrites cassettes under a new ontology via Kan pushforward. Roughly 60× faster than reingestion (62× in the measured probe); old cassettes stay in place. |
| Strands Analyst | Nine tools exposed to any Strands agent, including schema / ingest / report / ask / connect / any_of / findings. The system prompt enforces source citation and honest NEI (not-enough-info) answers. |
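SchemaFunctor's real interface is not shown here. As a simplified illustration only, assuming a schema change touches nothing but predicate names, a rename/merge/delete migration reduces to relabelling predicates on records and writing new output while the old records stay untouched. This is a plain relabelling, not a general Kan extension, and `build_predicate_map` and `migrate` are hypothetical names.

```python
from __future__ import annotations

Triple = tuple[str, str, str]  # (subject, predicate, object)

def build_predicate_map(rename=None, merge=None, delete=None) -> dict[str, str | None]:
    """Hypothetical: collapse rename/merge/delete into one predicate -> predicate map."""
    pmap: dict[str, str | None] = {}
    for old, new in (rename or {}).items():
        pmap[old] = new
    for target, sources in (merge or {}).items():
        for src in sources:
            pmap[src] = target
    for p in (delete or set()):
        pmap[p] = None  # None marks a dropped predicate
    return pmap

def migrate(triples: list[Triple], pmap: dict[str, str | None]) -> list[Triple]:
    """Return new triples under the new schema; the input list is not modified."""
    seen: set[Triple] = set()
    out: list[Triple] = []
    for s, p, o in triples:
        q = pmap.get(p, p)  # unmapped predicates pass through unchanged
        if q is None:
            continue
        t = (s, q, o)
        if t not in seen:  # merging predicates can create duplicates
            seen.add(t)
            out.append(t)
    return out

old = [
    ("toyota", "invest", "solid_state"),
    ("toyota", "invests_in", "solid_state"),
    ("catl", "supplies", "toyota"),
    ("lg", "rumor", "samsung"),
]
pmap = build_predicate_map(rename={"supplies": "supplier_of"},
                           merge={"invest": ["invest", "invests_in"]},
                           delete={"rumor"})
new = migrate(old, pmap)
print(new)  # [('toyota', 'invest', 'solid_state'), ('catl', 'supplier_of', 'toyota')]
```

Because the rewrite is a single pass over existing records, it avoids re-running extraction, which is the intuition behind migration beating reingestion.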
Optional extras
pip install 'cassetteql[s3]' # S3 / GCS / Azure via fsspec
pip install 'cassetteql[agent]' # Strands Analyst
pip install 'cassetteql[aws]' # Lambda container deploy + S3
pip install 'cassetteql[all]' # everything optional
Measured
Each row below is a reproducible probe — a standalone Python script that writes a temp store, runs the scenario, and asserts the result. Probes ship inside the source distribution.
| Probe | Symbolic only | With sheaf GNN |
|---|---|---|
| 10-claim actor-to-actor eval | 40% | 100% |
| 2000-sample synthgen held-out | 88.5% | 99.2% |
| Reportive-edge anomaly accuracy | 6% | 100% |
| Range-gets per MCTS query at 300 cassettes | 20 | 1.4 |
| Schema change on a 10-infon store (reingest vs. migration) | 1245 ms (reingest) | 20 ms (migration, 62×) |
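The headline ratios quoted in the feature table can be re-derived from these rows. Note that the accuracy gains are percentage-point differences, not relative increases:

```python
# Figures copied from the Measured table: (baseline, improved).
accuracy = {
    "10-claim actor-to-actor eval": (40.0, 100.0),
    "2000-sample synthgen held-out": (88.5, 99.2),
    "reportive-edge anomaly accuracy": (6.0, 100.0),
}
# Percentage-point gain = improved accuracy minus baseline accuracy.
gains = {name: round(new - base, 1) for name, (base, new) in accuracy.items()}
print(gains)
# {'10-claim actor-to-actor eval': 60.0, '2000-sample synthgen held-out': 10.7, 'reportive-edge anomaly accuracy': 94.0}

range_gets_symbolic, range_gets_gnn = 20, 1.4
reingest_ms, migrate_ms = 1245, 20
print(f"{range_gets_symbolic / range_gets_gnn:.1f}x fewer range-gets")  # 14.3x fewer range-gets
print(f"{reingest_ms / migrate_ms:.2f}x faster migration")  # 62.25x faster migration
```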
Dependencies
| Package | Purpose | Required |
|---|---|---|
| torch ≥ 2.0 | Reasoner + GNN + SSL losses | yes |
| transformers ≥ 4.40 | SPLADE tokenizer/model | yes |
| numpy ≥ 1.24 | Linear algebra | yes |
| pyarrow ≥ 15 | Cassette indexes | yes |
| fsspec ≥ 2024.1 | Local + cloud paths | yes |
| s3fs ≥ 2024.1 | S3 backend | via [s3] |
| strands-agents ≥ 1.0 | Conversational Analyst | via [agent] |
| boto3 ≥ 1.28 | Lambda deploy + ECR | via [aws] |
The 17 MB SPLADE-tiny model ships inside the wheel: one pip install, no follow-up download, no GPU.
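To see which optional extras are present in an environment, a small stdlib check can probe import names with `importlib.util.find_spec` without actually importing anything. The extra-to-module mapping below is an assumption read off the Dependencies table (in particular, that strands-agents imports as `strands`), and `available_extras` is a hypothetical helper, not part of cassetteql.

```python
import importlib.util

# Assumed import names for each optional extra, per the Dependencies table.
EXTRAS = {
    "s3": ["s3fs"],
    "agent": ["strands"],  # assumption: strands-agents imports as `strands`
    "aws": ["boto3"],
}

def available_extras() -> dict:
    """Report which extras look importable; find_spec returns None for missing top-level modules."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRAS.items()
    }

print(available_extras())
```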
License
Apache-2.0. The bundled SPLADE-tiny-msmarco model is also Apache-2.0.
Download files
File details
Details for the file cassetteql-0.1.1.tar.gz.
File metadata
- Download URL: cassetteql-0.1.1.tar.gz
- Upload date:
- Size: 17.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd189e42e100c89e6759bc5aeb7d3c9f60259121e35d1794ef0d6e0187c065ec |
| MD5 | 1cdd545e804bb53b980e6f5e21f1287d |
| BLAKE2b-256 | 479fd3c35bef333565ff9ed60b0e7d01862b0a867c8ec91a36bb81281bf11edc |
File details
Details for the file cassetteql-0.1.1-py3-none-any.whl.
File metadata
- Download URL: cassetteql-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7167313dc789d99f16cd5e8afb2767b98d3c4fbca1adde0e75aa203a8ac43f2e |
| MD5 | 34dd5b722221870b1a9d94d87aa7dbdd |
| BLAKE2b-256 | 9e1d5b12afb2e5bbd6e2564fd72f2ab051b2e13d6fba9ad5f837f51bb0601f4d |
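To check a downloaded artifact against the SHA256 digests listed for the 0.1.1 files, a streaming `hashlib` check is enough. `sha256_of` and `verify` are illustrative helper names, not part of cassetteql.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the File hashes tables for release 0.1.1.
EXPECTED_SHA256 = {
    "cassetteql-0.1.1.tar.gz": "bd189e42e100c89e6759bc5aeb7d3c9f60259121e35d1794ef0d6e0187c065ec",
    "cassetteql-0.1.1-py3-none-any.whl": "7167313dc789d99f16cd5e8afb2767b98d3c4fbca1adde0e75aa203a8ac43f2e",
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the 17.3 MB artifact is never fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path) -> bool:
    """True only if the file name is a known artifact and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("cassetteql-0.1.1-py3-none-any.whl"))` should return True for an intact download. pip's hash-checking mode (`--hash=sha256:<digest>` entries in a requirements file) performs the same comparison at install time.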