S3-native cassette substrate for typed knowledge graphs: Dempster-Shafer reasoner, sheaf GNN, Kan-based schema migration. No GPU, 17MB SPLADE bundled.

cassetteql

S3-native knowledge graphs with calibrated reasoning. Immutable cassette files, split Parquet indexes, a Dempster–Shafer reasoner that tells you when it doesn't know, a trained sheaf GNN prior, and Kan-based schema migration. The bundled SPLADE-tiny model (17 MB) means no GPU, no model download, no API keys.

  ┌──────────────────────────────────┐
  │  ▓▓      .inf  cassette      ▓▓  │  one cassette =
  │  ●                            ●  │  one ingest batch
  │  ╲╲─────────────────────────╱╱   │
  │   header  ·  records  ·  footer  │
  │     │         │           │      │
  │     │         │           └─ JSON: offsets + stats
  │     │         └─ gzip frames (range-addressable)
  │     └─ schema_ref · created_at
  └──────────────────────────────────┘
      immutable · content-addressed · S3-native
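
The layout in the diagram (header, independently compressed record frames, JSON footer with offsets) can be sketched with the standard library alone. This is a toy format, not the actual .inf byte layout, which is not specified here: the newline-terminated JSON header, the trailing 8-byte footer length, and the function names `write_cassette`, `read_footer`, and `read_record` are all assumptions made for illustration. The point it shows is why per-frame gzip makes records range-addressable: one record can be decoded from its byte range without touching the rest of the file.

```python
import gzip
import io
import json
import struct


def write_cassette(records, header):
    """Toy layout (assumed): JSON header line, gzip frames, JSON footer, footer length."""
    buf = io.BytesIO()
    buf.write(json.dumps(header).encode() + b"\n")
    offsets = []
    for rec in records:
        # Each record is its own gzip member, so it can be decompressed alone.
        frame = gzip.compress(json.dumps(rec).encode())
        offsets.append((buf.tell(), len(frame)))
        buf.write(frame)
    footer = json.dumps(
        {"offsets": offsets, "stats": {"records": len(records)}}
    ).encode()
    buf.write(footer)
    # Fixed-size trailer so a reader can locate the footer from the end of the file.
    buf.write(struct.pack("<Q", len(footer)))
    return buf.getvalue()


def read_footer(blob):
    """Read the footer using only the tail of the blob."""
    (footer_len,) = struct.unpack("<Q", blob[-8:])
    return json.loads(blob[-8 - footer_len:-8])


def read_record(blob, footer, i):
    """Decode record i from its byte range; on S3 this slice would be one range-get."""
    start, length = footer["offsets"][i]
    return json.loads(gzip.decompress(blob[start:start + length]))


# Illustrative data only.
blob = write_cassette(
    [
        {"s": "toyota", "p": "invest", "o": "solid_state"},
        {"s": "catl", "p": "supply", "o": "toyota"},
    ],
    {"schema_ref": "schema.json", "created_at": "2024-01-01T00:00:00Z"},
)
footer = read_footer(blob)
second = read_record(blob, footer, 1)
```

Because the footer holds every frame's offset and length, a reader needs two small reads (footer, then frame) to fetch a single record, regardless of file size.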

pip install cassetteql

Imports as cognition (same pattern as pillow/PIL):

from cognition.cassette import InfonStore, Query, Analyst

One-minute start

from cognition.cassette import InfonStore, Query

store = InfonStore("./data/chips", schema_path="schema.json")

# Delta ingest — idempotent, reports coverage diagnostics for free.
result = store.ingest(documents)
print(result["report"].summary())

# Calibrated single-claim verdict.
v = store.ask(Query().where(subject="toyota",
                             predicate="invest",
                             object="solid_state"))
print(v.label, v.mass.supports, v.mass.theta)   # SUPPORTS 0.53 0.29

# Multi-hop MCTS with retraction-aware chain mass.
v = store.connect("toyota", "catl")

# One tree walk resolves connectivity to many targets.
vs = store.any_of("toyota", {"catl", "lg", "samsung", "sk_hynix"})

Swap the root URI for s3://bucket/prefix and the same code runs against S3:

pip install 'cassetteql[s3]'
store = InfonStore("s3://acme/chips", schema_path="schema.json")
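
The `supports` and `theta` masses printed in the one-minute start are Dempster-Shafer quantities over a two-outcome frame: mass committed to "supports", mass committed to "refutes", and `theta`, the mass left on "don't know". The sketch below is textbook Dempster's rule of combination for that frame, not cassetteql's reasoner; the `Mass` dataclass and `combine` function are hypothetical names, and how the library weights sources or handles retractions is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Hypothetical mass triple over the frame {supports, refutes}."""
    supports: float
    refutes: float
    theta: float  # mass on the whole frame, i.e. ignorance


def combine(a: Mass, b: Mass) -> Mass:
    """Dempster's rule of combination for two independent pieces of evidence."""
    # Conflict: one source supports where the other refutes.
    conflict = a.supports * b.refutes + a.refutes * b.supports
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    return Mass(
        supports=(a.supports * b.supports
                  + a.supports * b.theta
                  + a.theta * b.supports) / norm,
        refutes=(a.refutes * b.refutes
                 + a.refutes * b.theta
                 + a.theta * b.refutes) / norm,
        theta=(a.theta * b.theta) / norm,
    )


# A source that knows nothing puts all its mass on theta.
VACUOUS = Mass(supports=0.0, refutes=0.0, theta=1.0)

# Illustrative numbers only.
m1 = Mass(supports=0.4, refutes=0.1, theta=0.5)
m2 = Mass(supports=0.3, refutes=0.0, theta=0.7)
m12 = combine(m1, m2)
```

Two properties are worth noticing: combining with `VACUOUS` leaves a mass unchanged, and `theta` can only shrink as informative evidence accumulates, which is why a claim the corpus cannot answer stays at `theta` near 1.0.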

What makes it different

Cassette substrate: Immutable, content-addressed .inf files; split Parquet indexes per cassette; append-only manifest chain. Delta ingest never rewrites existing cassettes; a time-travel snapshot costs one JSON read.
Calibrated verdicts: Every answer carries a mass triple (supports, refutes, theta). On claims the corpus can't answer, θ → 1.0 and the pruner short-circuits, so no range-gets are issued.
Sheaf GNN prior: 140k-parameter encoder with per-relation-kind restriction maps, trained once on synthetic hypergraphs (no human labels). 99% on held-out data, and 94 percentage points above symbolic-only on reportive-edge anomalies.
Schema migration: SchemaFunctor(rename, merge, delete) rewrites cassettes under a new ontology via Kan pushforward. About 60× faster than reingestion; old cassettes stay.
Strands Analyst: Nine tools exposed to any Strands agent, including schema / ingest / report / ask / connect / any_of / findings. The system prompt enforces source citation and honest NEI (not enough information).
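
To make the rename / merge / delete vocabulary concrete, here is a minimal sketch of pushing (subject, predicate, object) records through a predicate label map. It is not cassetteql's `SchemaFunctor` and does not implement a Kan pushforward in any categorical sense; `LabelMap`, `migrate`, and the predicate names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LabelMap:
    """Hypothetical predicate mapping from an old ontology to a new one."""
    rename: dict = field(default_factory=dict)  # old label -> new label
    merge: dict = field(default_factory=dict)   # new label -> set of old labels
    delete: frozenset = frozenset()             # old labels dropped entirely

    def apply(self, label):
        """Return the new label, or None if the label is deleted."""
        if label in self.delete:
            return None
        for target, sources in self.merge.items():
            if label in sources:
                return target
        return self.rename.get(label, label)


def migrate(records, predicates):
    """Build a new record list under the new ontology; the input is not mutated."""
    out = []
    for subject, predicate, obj in records:
        new_predicate = predicates.apply(predicate)
        if new_predicate is not None:
            out.append((subject, new_predicate, obj))
    return out


# Illustrative data only.
old = [
    ("toyota", "invest", "solid_state"),
    ("toyota", "fund", "solid_state"),
    ("catl", "rumor", "lg"),
]
mapping = LabelMap(
    rename={"invest": "invests_in"},
    merge={"invests_in": {"fund"}},
    delete=frozenset({"rumor"}),
)
new = migrate(old, mapping)
```

The migration produces a new list and leaves `old` untouched, mirroring the "old cassettes stay" behavior. A real migration would also need to decide what happens when merged labels yield duplicate records; this sketch keeps both.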

Optional extras

pip install 'cassetteql[s3]'       # S3 / GCS / Azure via fsspec
pip install 'cassetteql[agent]'    # Strands Analyst
pip install 'cassetteql[aws]'      # Lambda container deploy + S3
pip install 'cassetteql[all]'      # everything optional

Measured

Each row below is a reproducible probe — a standalone Python script that writes a temp store, runs the scenario, and asserts the result. Probes ship inside the source distribution.

Probe                                        Symbolic only   With sheaf GNN
10-claim actor-to-actor eval                 40%             100%
2000-sample synthgen held-out                88.5%           99.2%
Reportive-edge anomaly accuracy              6%              100%
Range-gets per MCTS query at 300 cassettes   20              1.4
Migration vs. reingest (10-infon store)      1245 ms         20 ms (62×)

Dependencies

Package                 Purpose                       Required
torch ≥ 2.0             Reasoner + GNN + SSL losses   yes
transformers ≥ 4.40     SPLADE tokenizer/model        yes
numpy ≥ 1.24            Linear algebra                yes
pyarrow ≥ 15            Cassette indexes              yes
fsspec ≥ 2024.1         Local + cloud paths           yes
s3fs ≥ 2024.1           S3 backend                    via [s3]
strands-agents ≥ 1.0    Conversational Analyst        via [agent]
boto3 ≥ 1.28            Lambda deploy + ECR           via [aws]

17 MB SPLADE-tiny ships inside the wheel — one pip install, no follow-up download, no GPU.

License

Apache-2.0. The bundled SPLADE-tiny-msmarco model is also Apache-2.0.

Download files

Source Distribution

cassetteql-0.1.1.tar.gz (17.3 MB view details)

Uploaded Source

Built Distribution

cassetteql-0.1.1-py3-none-any.whl (17.3 MB view details)

Uploaded Python 3

File details

Details for the file cassetteql-0.1.1.tar.gz.

File metadata

  • Download URL: cassetteql-0.1.1.tar.gz
  • Upload date:
  • Size: 17.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for cassetteql-0.1.1.tar.gz
Algorithm Hash digest
SHA256 bd189e42e100c89e6759bc5aeb7d3c9f60259121e35d1794ef0d6e0187c065ec
MD5 1cdd545e804bb53b980e6f5e21f1287d
BLAKE2b-256 479fd3c35bef333565ff9ed60b0e7d01862b0a867c8ec91a36bb81281bf11edc

File details

Details for the file cassetteql-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cassetteql-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for cassetteql-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7167313dc789d99f16cd5e8afb2767b98d3c4fbca1adde0e75aa203a8ac43f2e
MD5 34dd5b722221870b1a9d94d87aa7dbdd
BLAKE2b-256 9e1d5b12afb2e5bbd6e2564fd72f2ab051b2e13d6fba9ad5f837f51bb0601f4d
