Python bindings for Cabinet - Hierarchical Semantic Hashing memory retrieval

pycabinet

Python bindings for Cabinet — a discrete semantic memory retrieval system for AI agents.

Replace 768-dim dense vectors with 20-bit structured integer codes and retrieve on pure CPU with O(log n) B-tree prefix matching.


What is Cabinet?

Cabinet is a memory retrieval engine designed for Agent scenarios where you need to:

  • Remember large amounts of text on a laptop or edge device
  • Recall relevant snippets fast, without GPU
  • Explain why a snippet was retrieved (four-level matching: related → same category → same cluster → exact word)
  • Update incrementally without rebuilding the whole index

The core idea is Hierarchical Semantic Hashing (HSH): each word is encoded as a 20-bit structured integer:

┌──────┬─────────┬─────────┐
│ feat │   sim   │   abs   │
│ 4-bit│  8-bit  │  8-bit  │
└──────┴─────────┴─────────┘
   ↓        ↓         ↓
 POS tag  cluster   bucket

Retrieval becomes integer prefix matching on B-trees, which is tiny, fast, and fully auditable.
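
The layout above can be sketched with plain bit operations. This is a minimal illustration, assuming feat occupies the most significant 4 bits, as the left-to-right diagram suggests. The names pack_hsh, unpack_hsh, and prefix_range are hypothetical and are not part of the pycabinet API.

```python
from typing import Optional, Tuple

FEAT_BITS, SIM_BITS, ABS_BITS = 4, 8, 8  # 20 bits in total, per the diagram


def pack_hsh(feat: int, sim: int, abs_: int) -> int:
    """Pack three fields into one 20-bit integer (assumed order: feat | sim | abs)."""
    if not 0 <= feat < (1 << FEAT_BITS):
        raise ValueError("feat must fit in 4 bits")
    if not 0 <= sim < (1 << SIM_BITS):
        raise ValueError("sim must fit in 8 bits")
    if not 0 <= abs_ < (1 << ABS_BITS):
        raise ValueError("abs must fit in 8 bits")
    return (feat << (SIM_BITS + ABS_BITS)) | (sim << ABS_BITS) | abs_


def unpack_hsh(code: int) -> Tuple[int, int, int]:
    """Split a 20-bit code back into (feat, sim, abs)."""
    feat = code >> (SIM_BITS + ABS_BITS)
    sim = (code >> ABS_BITS) & ((1 << SIM_BITS) - 1)
    abs_ = code & ((1 << ABS_BITS) - 1)
    return feat, sim, abs_


def prefix_range(feat: int, sim: Optional[int] = None) -> Tuple[int, int]:
    """Half-open [lo, hi) range of codes that share a feat or (feat, sim) prefix."""
    if sim is None:
        lo = feat << (SIM_BITS + ABS_BITS)
        return lo, lo + (1 << (SIM_BITS + ABS_BITS))
    lo = (feat << (SIM_BITS + ABS_BITS)) | (sim << ABS_BITS)
    return lo, lo + (1 << ABS_BITS)


code = pack_hsh(feat=3, sim=17, abs_=200)
lo, hi = prefix_range(3, 17)
same_cluster = lo <= code < hi  # a prefix match is just a range check
```

Under this assumed layout, every code sharing a prefix falls in one contiguous integer range, which is the property that lets an ordered index answer prefix queries with a range scan.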


Installation

# Core package (pre-compiled wheels, no Rust needed)
pip install cabinet-hsh

# With optional GUI visualization (quotes keep shells such as zsh from expanding the brackets)
pip install "cabinet-hsh[gui]"

# With document parsing (PDF, DOCX, XLSX)
pip install "cabinet-hsh[docs]"

# With plotting utilities
pip install "cabinet-hsh[plot]"

# Development install from source (requires Rust 1.72+)
git clone https://github.com/Sauomore/Cabinet.git
cd Cabinet/cabinet
maturin develop

Quick Start

import pycabinet

# Open a memory cabinet (~4 MB RAM + a single SQLite file)
mem = pycabinet.Memory(
    path="./agent_memory.db",
    precision="light",    # light | hybrid | precise
    pos_threshold=50,     # common-word promotion threshold
    max_context=4096,     # working-memory window
)

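# Insert snippets (sample data kept in Chinese; English glosses in the comments)
mem.insert("用户明天下午3点开会,准备PPT。")      # "The user has a meeting at 3 pm tomorrow; prepare the slides."
mem.insert("用户喜欢听管弦乐。")                 # "The user likes listening to orchestral music."
mem.insert("5号楼邻居有梯子,平时放在车库。")     # "The neighbor in Building 5 has a ladder, usually kept in the garage."

# Query ("meeting preparation")
results = mem.query("会议准备", top_k=5)
for r in results:
    # match_level runs from 1 (related) to 4 (exact)
    level = ["related", "same category", "same cluster", "exact"][r.match_level - 1]
    print(f"[{level}] score={r.score:.3f} doc_id={r.doc_id}")
    if r.match_level >= 3:
        print(f"  → {mem.decode(r)}")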

# Snapshot and close
mem.snapshot("./backup/agent_memory_2026-07-03.db")
mem.close()

API Overview

pycabinet.Memory

Memory(
    path: str,               # SQLite database path
    precision: str,          # "light" | "hybrid" | "precise"
    pos_threshold: int,      # frequent-word promotion threshold
    max_context: int,        # working-memory capacity in tokens
)

Methods:

  • insert(text: str) -> int — tokenize, encode, and store a document; returns doc_id
  • query(text: str, top_k: int = 10) -> list[QueryResult] — retrieve top-k matches
  • decode(result: QueryResult) -> str | None — decode the original text of a result
  • snapshot(dst: str) -> None — copy the database to dst
  • close() -> None — close the database
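
Since close() is listed as an explicit method, one way to make sure it always runs is contextlib.closing from the standard library. The sketch below uses a hypothetical StubMemory stand-in so it runs without pycabinet installed. Whether pycabinet.Memory supports the with statement directly is not stated here, so that is not assumed.

```python
from contextlib import closing
from typing import Iterable, List


class StubMemory:
    """Hypothetical stand-in exposing only insert() and close()."""

    def __init__(self) -> None:
        self.docs: List[str] = []
        self.closed = False

    def insert(self, text: str) -> int:
        self.docs.append(text)
        return len(self.docs) - 1  # doc_id numbering here is an assumption

    def close(self) -> None:
        self.closed = True


def store_all(mem, snippets: Iterable[str]) -> List[int]:
    """Insert every snippet and return the doc_ids; mem is closed even if an insert raises."""
    with closing(mem):
        return [mem.insert(s) for s in snippets]


stub = StubMemory()
ids = store_all(stub, ["first note", "second note"])
```

With the real library, a pycabinet.Memory(...) instance would take the place of StubMemory, since closing() only needs the object to have a close() method.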

pycabinet.QueryResult

A result object with the following fields:

Field        Type   Meaning
doc_id       int    document ID
position     int    word position inside the document
score        float  relevance score
match_level  int    1 = related, 2 = same category, 3 = same cluster, 4 = exact
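
As an illustration of how these fields might be consumed, the sketch below groups results by match level. FakeResult is a hypothetical stand-in that mirrors the four fields listed above so the example runs without the library, and group_by_level is an illustrative helper, not part of pycabinet.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

LEVEL_NAMES = {1: "related", 2: "same category", 3: "same cluster", 4: "exact"}


@dataclass
class FakeResult:
    """Hypothetical stand-in with the same fields as pycabinet.QueryResult."""
    doc_id: int
    position: int
    score: float
    match_level: int


def group_by_level(results: Iterable[FakeResult]) -> Dict[str, List[FakeResult]]:
    """Bucket results by level name: strongest level first, highest score first within a level."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r.match_level].append(r)
    return {
        LEVEL_NAMES[level]: sorted(buckets[level], key=lambda r: r.score, reverse=True)
        for level in sorted(buckets, reverse=True)
    }


sample = [
    FakeResult(doc_id=0, position=2, score=0.41, match_level=2),
    FakeResult(doc_id=1, position=0, score=0.93, match_level=4),
    FakeResult(doc_id=2, position=5, score=0.67, match_level=4),
    FakeResult(doc_id=0, position=7, score=0.12, match_level=1),
]
grouped = group_by_level(sample)
```

The same function should work on the list returned by query(), provided the real objects expose match_level and score as attributes, as the field table suggests.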

Context decoding

from pycabinet import decode_context

results = mem.query("借梯子", top_k=3)  # "borrow a ladder"
for r in results:
    text = decode_context(mem, r, mode="sentence")
    print(text)

Supported mode values: "paragraph", "sentence", "window", "before", "after", "window_sent".


Supported Platforms

Pre-compiled wheels are provided for:

  • Linux: x86_64, aarch64 (manylinux)
  • macOS: universal2 (Intel + Apple Silicon)
  • Windows: x64, x86

Requires Python ≥ 3.8 (CPython).


Architecture

pycabinet (Python API)
  └── PyO3 bindings
      └── cabinet-core (Rust)
          ├── cabinet-hsh     # 20-bit HSH encoding
          ├── cabinet-index   # B-tree prefix index + LSM
          ├── cabinet-store   # SQLite backend
          └── cabinet-router  # relevance routing

Three-layer memory model:

  1. Token Store — raw HSH sequences, append-only WAL buffer
  2. Archive Index — 16 feature drawers with B-tree (sim, abs) indexes
  3. Working Memory — LRU hot cache for inference-time hits
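
To make the Archive Index idea concrete, here is a toy sketch of 16 drawers keyed by feat, each holding sorted (sim, abs) keys. It is not Cabinet's implementation: a sorted Python list with bisect stands in for the B-tree, the bit layout (feat in the top 4 bits) is an assumption, and DrawerIndex is a hypothetical name.

```python
import bisect
from typing import Dict, List, Tuple


class DrawerIndex:
    """Toy archive index: one sorted list per feat value, standing in for a B-tree."""

    def __init__(self) -> None:
        # 4 feat bits give 16 drawers; each entry is (key, doc_id) with key = (sim << 8) | abs
        self.drawers: Dict[int, List[Tuple[int, int]]] = {f: [] for f in range(16)}

    def add(self, code: int, doc_id: int) -> None:
        """File a 20-bit code into the drawer selected by its feat bits."""
        feat, key = code >> 16, code & 0xFFFF
        bisect.insort(self.drawers[feat], (key, doc_id))

    def same_cluster(self, feat: int, sim: int) -> List[int]:
        """doc_ids whose code shares the (feat, sim) prefix, found with two binary searches."""
        drawer = self.drawers[feat]
        # doc_ids are non-negative, so -1 sorts before every real entry with the same key
        lo = bisect.bisect_left(drawer, (sim << 8, -1))
        hi = bisect.bisect_left(drawer, ((sim + 1) << 8, -1))
        return [doc_id for _, doc_id in drawer[lo:hi]]


idx = DrawerIndex()
idx.add((3 << 16) | (17 << 8) | 200, doc_id=10)
idx.add((3 << 16) | (17 << 8) | 5, doc_id=11)
idx.add((3 << 16) | (18 << 8) | 0, doc_id=12)
idx.add((4 << 16) | (17 << 8) | 200, doc_id=13)
hits = idx.same_cluster(feat=3, sim=17)
```

The lookup costs two binary searches plus the size of the result. Insertion into a Python list is linear in the drawer size, which is one reason a real engine would use a B-tree or LSM structure instead.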

When to use Cabinet vs. vector databases

Scenario               Cabinet                    FAISS / Chroma
Laptop / edge device   ✅ Tiny CPU model          ❌ Often needs GPU or large RAM
Incremental updates    ✅ Append-only             ❌ May need to rebuild clusters
Explainable retrieval  ✅ Auditable path          ❌ Black-box similarity
Semantic similarity    ⚠️ Discrete approximation  ✅ Dense vectors

Use Cabinet when you need a small, fast, explainable, and incrementally updatable memory for agents.


GUI Visualization

If you installed with [gui]:

cabinet-gui
# or
cd cabinet-gui
streamlit run app.py

The GUI includes pages for encoding visualization, memory architecture, retrieval paths, index browser, and an interactive console.


License

MIT OR Apache-2.0


Cabinet — let AI remember, and explain why it remembers.
