
Python bindings for Cabinet - Hierarchical Semantic Hashing memory retrieval


cabinet


Python bindings for Cabinet — a discrete semantic memory retrieval system for AI agents.

Replace 768-dim dense vectors with 20-bit structured integer codes and retrieve on pure CPU with O(log n) B-tree prefix matching.
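For a rough sense of the size difference, the arithmetic below compares one 768-dim vector with one 20-bit code. It is a back-of-the-envelope sketch under two assumptions that this README does not state: vector components are float32, and each code is stored in a 4-byte integer. The comparison is also not like-for-like, since a dense vector usually covers a whole text chunk while an HSH code covers a single word.

```python
# Assumptions (not from the Cabinet docs): float32 components, codes stored as 4-byte ints.
DIMS = 768
BYTES_PER_FLOAT32 = 4
CODE_BITS = 20

dense_bytes = DIMS * BYTES_PER_FLOAT32      # bytes for one dense vector
code_bytes_packed = CODE_BITS / 8           # bytes if codes were bit-packed
code_bytes_u32 = 4                          # bytes if each code occupies a 32-bit integer

print(f"dense vector: {dense_bytes} bytes")
print(f"one code:     {code_bytes_packed} bytes packed, {code_bytes_u32} bytes as a 32-bit int")
print(f"ratio (32-bit storage): {dense_bytes // code_bytes_u32}x")
```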


What is Cabinet?

Cabinet is a memory retrieval engine designed for Agent scenarios where you need to:

  • Remember large amounts of text on a laptop or edge device
  • Recall relevant snippets fast, without GPU
  • Explain why a snippet was retrieved (four-level matching: related → same category → same cluster → exact word)
  • Update incrementally without rebuilding the whole index

The core idea is Hierarchical Semantic Hashing (HSH): each word is encoded as a 20-bit structured integer:

┌──────┬─────────┬─────────┐
│ feat │   sim   │   abs   │
│ 4-bit│  8-bit  │  8-bit  │
└──────┴─────────┴─────────┘
   ↓        ↓         ↓
 POS tag  cluster   bucket
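
The layout above can be sketched in a few lines of Python. This is an illustration, not Cabinet's actual encoder: it assumes that feat occupies the highest 4 bits, sim the middle 8, and abs the lowest 8 (the left-to-right order of the diagram), and the helper names pack_hsh and unpack_hsh are hypothetical.

```python
from __future__ import annotations

FEAT_BITS, SIM_BITS, ABS_BITS = 4, 8, 8  # field widths from the diagram (20 bits total)


def pack_hsh(feat: int, sim: int, abs_: int) -> int:
    """Pack three fields into one 20-bit integer (assumed order: feat | sim | abs)."""
    if not 0 <= feat < (1 << FEAT_BITS):
        raise ValueError("feat must fit in 4 bits")
    if not 0 <= sim < (1 << SIM_BITS):
        raise ValueError("sim must fit in 8 bits")
    if not 0 <= abs_ < (1 << ABS_BITS):
        raise ValueError("abs must fit in 8 bits")
    return (feat << (SIM_BITS + ABS_BITS)) | (sim << ABS_BITS) | abs_


def unpack_hsh(code: int) -> tuple[int, int, int]:
    """Split a 20-bit code back into (feat, sim, abs)."""
    if not 0 <= code < (1 << (FEAT_BITS + SIM_BITS + ABS_BITS)):
        raise ValueError("code must fit in 20 bits")
    feat = code >> (SIM_BITS + ABS_BITS)
    sim = (code >> ABS_BITS) & ((1 << SIM_BITS) - 1)
    abs_ = code & ((1 << ABS_BITS) - 1)
    return feat, sim, abs_


code = pack_hsh(feat=3, sim=0x2A, abs_=0x07)
print(hex(code), unpack_hsh(code))
```

With the feature field in the high bits, all words that share a feat value (and, within that, a sim value) fall into one contiguous integer range, which is what makes prefix matching possible.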

Retrieval becomes integer prefix matching on B-trees, which is tiny, fast, and fully auditable.
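
To make "integer prefix matching" concrete, the sketch below treats a prefix as a contiguous integer range and scans that range in sorted order. It rests on several assumptions: a sorted Python list with bisect stands in for the B-tree, the field order (feat in the high bits, abs in the low bits) is inferred from the diagram, and prefix_range and scan_prefix are hypothetical names rather than Cabinet APIs.

```python
from __future__ import annotations

import bisect

TOTAL_BITS = 20  # width of one HSH code


def prefix_range(prefix: int, prefix_bits: int) -> tuple[int, int]:
    """Return the half-open range [lo, hi) of codes whose top `prefix_bits` bits equal `prefix`."""
    if not 0 <= prefix_bits <= TOTAL_BITS:
        raise ValueError("prefix_bits must be between 0 and 20")
    if not 0 <= prefix < (1 << prefix_bits):
        raise ValueError("prefix does not fit in prefix_bits")
    shift = TOTAL_BITS - prefix_bits
    lo = prefix << shift
    return lo, lo + (1 << shift)


def scan_prefix(sorted_codes: list[int], prefix: int, prefix_bits: int) -> list[int]:
    """Find all codes in the prefix range with two binary searches (O(log n) to locate)."""
    lo, hi = prefix_range(prefix, prefix_bits)
    start = bisect.bisect_left(sorted_codes, lo)
    end = bisect.bisect_left(sorted_codes, hi)
    return sorted_codes[start:end]


codes = sorted([0x32A07, 0x32A10, 0x32B01, 0x40000, 0x12A07])
feat_hits = scan_prefix(codes, 0x3, 4)                     # same 4-bit feat
cluster_hits = scan_prefix(codes, (0x3 << 8) | 0x2A, 12)   # same feat and same 8-bit sim
print([hex(c) for c in feat_hits])
print([hex(c) for c in cluster_hits])
```

A longer prefix narrows the range, so the prefix length that produced a hit also says how closely the hit matches the query.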


Installation

# Core package (pre-compiled wheels, no Rust needed)
pip install cabinet-hsh

# With optional GUI visualization (quotes keep shells such as zsh from expanding the brackets)
pip install "cabinet-hsh[gui]"

# With document parsing (PDF, DOCX, XLSX)
pip install "cabinet-hsh[docs]"

# With plotting utilities
pip install "cabinet-hsh[plot]"

# Development install from source (requires Rust 1.72+ and maturin)
git clone https://github.com/Sauomore/Cabinet.git
cd Cabinet/cabinet
pip install maturin
maturin develop

Quick Start

import cabinet

# Open a memory cabinet (~4MB RAM + single SQLite file)
mem = cabinet.Memory(
    path="./agent_memory.db",
    precision="light",    # light | hybrid | precise
    pos_threshold=50,     # common-word promotion threshold
    max_context=4096,     # working-memory window
)

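# Insert snippets (Chinese sample text; English glosses in the comments)
# "The user has a meeting at 3 pm tomorrow; prepare the slides."
mem.insert("用户明天下午3点开会,准备PPT。")
# "The user likes listening to orchestral music."
mem.insert("用户喜欢听管弦乐。")
# "The neighbor in Building 5 has a ladder, usually kept in the garage."
mem.insert("5号楼邻居有梯子,平时放在车库。")

# Query ("会议准备" means "meeting preparation")
results = mem.query("会议准备", top_k=5)
for r in results:
    # match_level runs from 1 (related) to 4 (exact)
    level = ["related", "same category", "same cluster", "exact"][r.match_level - 1]
    print(f"[{level}] score={r.score:.3f} doc_id={r.doc_id}")
    if r.match_level >= 3:
        print(f"  → {mem.decode(r)}")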

# Snapshot and close
mem.snapshot("./backup/agent_memory_2026-07-03.db")
mem.close()

API Overview

cabinet.Memory

Memory(
    path: str,               # SQLite database path
    precision: str,          # "light" | "hybrid" | "precise"
    pos_threshold: int,      # frequent-word promotion threshold
    max_context: int,        # working-memory capacity in tokens
)

Methods:

  • insert(text: str) -> int — tokenize, encode, and store a document; returns doc_id
  • query(text: str, top_k: int = 10) -> list[QueryResult] — retrieve top-k matches
  • decode(result: QueryResult) -> str | None — decode the original text of a result
  • snapshot(dst: str) -> None — copy the database to dst
  • close() -> None — close the database
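
Since the API lists an explicit close(), one way to make sure it runs even when an insert or query raises is contextlib.closing from the standard library. The README does not say whether Memory itself supports the with statement, so this sketch does not rely on that. StubMemory is a hypothetical stand-in, defined only so the snippet runs without Cabinet installed; in real use you would pass a cabinet.Memory instance instead.

```python
from contextlib import closing


class StubMemory:
    """Hypothetical stand-in that mimics only insert() and close() from the list above."""

    def __init__(self) -> None:
        self.closed = False
        self._next_id = 1

    def insert(self, text: str) -> int:
        doc_id = self._next_id
        self._next_id += 1
        return doc_id

    def close(self) -> None:
        self.closed = True


stub = StubMemory()
with closing(stub) as mem:       # closing() calls mem.close() when the block exits
    doc_id = mem.insert("example snippet")

print(doc_id, stub.closed)
```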

cabinet.QueryResult

A result object with the following fields:

Field        Type   Meaning
doc_id       int    document ID
position     int    word position inside the document
score        float  relevance score
match_level  int    1=related, 2=same category, 3=same cluster, 4=exact
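
When formatting results, it can help to give the match_level integers names. The sketch below is illustrative only: MatchLevel and FakeResult are hypothetical stand-ins defined here so the snippet runs without Cabinet installed. Only the four field names and the meaning of each level come from the table above.

```python
from dataclasses import dataclass
from enum import IntEnum


class MatchLevel(IntEnum):
    """Names for the four match_level values (hypothetical helper, not part of cabinet)."""
    RELATED = 1
    SAME_CATEGORY = 2
    SAME_CLUSTER = 3
    EXACT = 4


@dataclass
class FakeResult:
    """Stand-in with the same field names as cabinet.QueryResult."""
    doc_id: int
    position: int
    score: float
    match_level: int


def describe(result) -> str:
    """Render one result as a short, human-readable line."""
    level = MatchLevel(result.match_level)  # raises ValueError for values outside 1..4
    return f"[{level.name.lower()}] score={result.score:.3f} doc_id={result.doc_id}"


print(describe(FakeResult(doc_id=7, position=2, score=0.75, match_level=3)))
```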

Context decoding

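from cabinet import decode_context

# `mem` is an open cabinet.Memory instance (see Quick Start); query it before calling mem.close()
results = mem.query("借梯子", top_k=3)  # "borrow a ladder"
for r in results:
    text = decode_context(mem, r, mode="sentence")
    print(text)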

Supported mode values: "paragraph", "sentence", "window", "before", "after", "window_sent".


Supported Platforms

Pre-compiled wheels are provided for:

  • Linux: x86_64, aarch64 (manylinux)
  • macOS: universal2 (Intel + Apple Silicon)
  • Windows: x64, x86

Requires Python ≥ 3.8 (CPython).


Architecture

cabinet (Python API)
  └── PyO3 bindings
      └── cabinet-core (Rust)
          ├── cabinet-hsh     # 20-bit HSH encoding
          ├── cabinet-index   # B-tree prefix index + LSM
          ├── cabinet-store   # SQLite backend
          └── cabinet-router  # relevance routing

Three-layer memory model:

  1. Token Store — raw HSH sequences, append-only WAL buffer
  2. Archive Index — 16 feature drawers with B-tree (sim, abs) indexes
  3. Working Memory — LRU hot cache for inference-time hits
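
The third layer, an LRU hot cache, can be pictured with the sketch below. It is not Cabinet's Rust implementation: LRUCache is a hypothetical class built on collections.OrderedDict, and its capacity counts entries rather than tokens, purely to show the eviction behavior.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache (illustrative, not Cabinet's working memory)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, most recently used last

    def get(self, key):
        """Return the cached value and mark it as recently used, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        """Insert or refresh an entry, evicting the least recently used one if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def keys(self) -> list:
        """Keys ordered from least to most recently used."""
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put(1, "doc one")
cache.put(2, "doc two")
cache.get(1)              # touching key 1 makes key 2 the eviction candidate
cache.put(3, "doc three")
print(cache.keys())
```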

When to use Cabinet vs. vector databases

Scenario               Cabinet                    FAISS / Chroma
Laptop / edge device   ✅ Tiny CPU model          ❌ Needs GPU or large RAM
Incremental updates    ✅ Append-only             ❌ Rebuild clusters
Explainable retrieval  ✅ Auditable path          ❌ Black-box similarity
Semantic similarity    ⚠️ Discrete approximation  ✅ Dense vectors

Use Cabinet when you need a small, fast, explainable, and incrementally updatable memory for agents.


GUI Visualization

If you installed with [gui]:

cabinet-gui
# or
cd cabinet-gui
streamlit run app.py

The GUI includes pages for encoding visualization, memory architecture, retrieval paths, index browser, and an interactive console.


License

MIT OR Apache-2.0


Cabinet — let AI remember, and explain why it remembers.
