Skip to main content

Semantic memory plugin for Bub

Project description

Semantic Memory Plugin for Bub

A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.

Overview

This plugin intercepts the tape context building process to:

  1. Extract semantics from conversation entries using an LLM
  2. Store snapshots of entities (people, tasks, concepts) and relations between them
  3. Inject memory into subsequent agent prompts, enabling long-context awareness

The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing build_tape_context architecture without modifying core.

Installation

The plugin is already registered in pyproject.toml:

[project.entry-points."bub"]
semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"

Bub's framework automatically loads and instantiates it on startup. No additional setup required.

How It Works

Per-Turn Flow

  1. Input: Agent receives a new message, tape entries are loaded
  2. Extract: LLM analyzes entries and identifies:
    • Entities: people, tasks, events, concepts
    • Relations: created, depends_on, mentions, etc.
  3. Store: SemanticSnapshot is appended to ~/.bub/tapes/semantic/{tape_id}.jsonl
  4. Load: All historical snapshots for this tape are loaded
  5. Inject: Semantic memory is formatted as a system prompt block and prepended to the context
  6. Output: Agent receives enriched context with semantic awareness

Example

Given this conversation:

User: "Alice created a task to deploy v1.0"
Agent: [responds]
User: "What did Alice do?"

On the second turn, the agent sees:

## Semantic Memory

### Entities (2):
- person:alice
- task:deploy_v1 (v1.0 deployment)

### Relations (1):
- alice --created--> deploy_v1

---

[rest of context]

Architecture

Core Modules

  • models.py: Pydantic dataclasses for Entity, Relation, SemanticSnapshot
  • extractor.py: LLM-based extraction from tape entries
  • store.py: JSONL file storage at ~/.bub/tapes/semantic/
  • context.py: Formatting snapshots into system prompts
  • hook_impl.py: Bub hookimpl that wires everything together

Storage Format

Snapshots are stored as JSONL (one JSON object per line):

{
  "entities": [
    {"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
    {"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
  ],
  "relations": [
    {"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
  ],
  "tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
  "anchor_id": "anchor_001",
  "created_at": "2026-06-06T09:35:00Z"
}

Configuration

The plugin reuses your main LLM settings (BUB_MODEL, BUB_API_KEY, etc.):

# Your existing setup (e.g., DeepSeek)
export BUB_MODEL=deepseek:deepseek-chat
export BUB_API_KEY=sk-...
uv run bub chat

No separate BUB_SEMANTIC_* variables needed. Semantic extraction uses the same model as your agent.

Testing

Run the test suite:

uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v

Coverage: 43 tests across unit and integration scenarios:

  • Entity/Relation serialization
  • JSONL storage I/O
  • LLM extraction with mocks
  • Context building
  • Multi-turn memory retention

Usage Examples

Example 1: CLI Multi-Turn

$ uv run bub chat
bub > Alice is a data scientist.
Agent > Got it.

bub > What is Alice's profession?
Agent > Alice is a data scientist. (retrieved from semantic memory)

bub > ,tape.info
[Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]

Example 2: Telegram

You: "I need to fix a critical bug in the payment module"
Bot: [Uses semantic memory to track bug, module]

You: "What was I working on?"
Bot: [Recalls semantic memory: bug:critical_payment, module:payment]

Example 3: Inspect Semantic Store

$ cat ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl | python -m json.tool
[Shows stored entities and relations]

Performance & Cost

Token Usage

  • Each extraction call: ~300-500 tokens (depends on entry volume)
  • Estimated overhead: +10-20% per turn (configurable via extraction prompt)

Storage

  • JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
  • Typical session: ~50-100 KB

Latency

  • Extraction is async, non-blocking
  • First turn (with extraction): ~500ms extra
  • Subsequent turns: ~50ms extra (just loading snapshots)

Graceful Degradation

If semantic extraction fails for any reason:

  • LLM error: Returns empty snapshot, continues
  • Invalid JSON: Logged as warning, continues
  • Storage error: Logged, continues with base context

The agent always works, semantic memory is optional enhancement.

Future Enhancements

Phase 2: Smart Retrieval

  • Vector embeddings for semantic similarity search
  • Retrieval-augmented context injection (only include relevant entities)
  • Reduces prompt bloat for long sessions

Phase 3: Advanced Graphs

  • Entity dependency analysis (who depends on what)
  • Centrality metrics (who/what is most important)
  • Causal reasoning (what led to what)

Phase 4: Multi-Session Memory

  • Cross-session entity resolution
  • Long-term memory across multiple conversations
  • Persistent entity graph (not just per-tape)

Troubleshooting

Q: Plugin not loading? A: Check that entry-point is registered:

python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"

Q: Semantic snapshots not appearing? A: Check ~/.bub/tapes/semantic/ directory exists. Check logs with BUB_VERBOSE=1.

Q: LLM calls are expensive? A: Reduce extraction frequency or use a cheaper model (e.g., DeepSeek distill). Future releases will support model selection per plugin.

API Reference

build_semantic_context(entries, context, llm=None, store=None) → list[dict]

Build context with semantic memory. Called by the framework automatically.

Args:

  • entries: Iterable of TapeEntry objects
  • context: TapeContext instance
  • llm: republic.LLM instance (optional; if None, returns base context)
  • store: SemanticStore instance (optional; if None, returns base context)

Returns: List of message dicts ready for model input

extract_semantics(entries, llm, tape_id, anchor_id=None, max_tokens=1000) → SemanticSnapshot

Extract entities and relations from tape entries.

Args:

  • entries: List of TapeEntry objects
  • llm: republic.LLM instance for extraction
  • tape_id: Session/tape identifier
  • anchor_id: Optional anchor point identifier
  • max_tokens: Max tokens for LLM response

Returns: SemanticSnapshot with extracted entities/relations

Contributing

This plugin is part of Bub's extensibility model. To extend:

  1. Custom entity types: Modify Entity.type enum in models.py
  2. Custom extractors: Replace or wrap extractor.py
  3. Custom storage: Implement SemanticStore interface
  4. Custom formatters: Replace _format_snapshots in context.py

All without modifying Bub core.

License

Same as Bub (Apache 2.0)


Questions? See Bub documentation or open an issue.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bub_semantic_memory-0.1.1.tar.gz (17.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bub_semantic_memory-0.1.1-py3-none-any.whl (16.6 kB view details)

Uploaded Python 3

File details

Details for the file bub_semantic_memory-0.1.1.tar.gz.

File metadata

  • Download URL: bub_semantic_memory-0.1.1.tar.gz
  • Upload date:
  • Size: 17.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bub_semantic_memory-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0ae12ff5c4a8fae6171e897d168d85cabc584c2e87a6bffb680f44bb57fb157c
MD5 b700af53950b89441104f7bc4ca99978
BLAKE2b-256 1b5fcafc15a1a2aacf8f01c8db06aab92c05c6aa577fc77ca34226c10a41955e

See more details on using hashes here.

File details

Details for the file bub_semantic_memory-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: bub_semantic_memory-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bub_semantic_memory-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4d2102842ddabff42115cf28966f6a1ac386649aa20b5623d55e5c7266f3ad36
MD5 22d5e77a8e514c38a607c5e342732c40
BLAKE2b-256 b5a80ea1cc945c1f31e0f70f5e73e1b41401a8d054d34f569870f9e31dc2dc39

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page