Semantic memory plugin for Bub
Project description
Semantic Memory Plugin for Bub
A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.
Overview
This plugin intercepts the tape context building process to:
- Extract semantics from conversation entries using an LLM
- Store snapshots of entities (people, tasks, concepts) and relations between them
- Inject memory into subsequent agent prompts, enabling long-context awareness
The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing build_tape_context architecture without modifying core.
Installation
The plugin is already registered in pyproject.toml:
[project.entry-points."bub"]
semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"
Bub's framework automatically loads and instantiates it on startup. No additional setup required.
How It Works
Per-Turn Flow
- Input: Agent receives a new message, tape entries are loaded
- Extract: LLM analyzes entries and identifies:
- Entities: people, tasks, events, concepts
- Relations: created, depends_on, mentions, etc.
- Store: SemanticSnapshot is appended to
~/.bub/tapes/semantic/{tape_id}.jsonl - Load: All historical snapshots for this tape are loaded
- Inject: Semantic memory is formatted as a system prompt block and prepended to the context
- Output: Agent receives enriched context with semantic awareness
Example
Given this conversation:
User: "Alice created a task to deploy v1.0"
Agent: [responds]
User: "What did Alice do?"
On the second turn, the agent sees:
## Semantic Memory
### Entities (2):
- person:alice
- task:deploy_v1 (v1.0 deployment)
### Relations (1):
- alice --created--> deploy_v1
---
[rest of context]
Architecture
Core Modules
models.py: Pydantic dataclasses for Entity, Relation, SemanticSnapshotextractor.py: LLM-based extraction from tape entriesstore.py: JSONL file storage at~/.bub/tapes/semantic/context.py: Formatting snapshots into system promptshook_impl.py: Bub hookimpl that wires everything together
Storage Format
Snapshots are stored as JSONL (one JSON object per line):
{
"entities": [
{"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
{"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
],
"relations": [
{"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
],
"tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
"anchor_id": "anchor_001",
"created_at": "2026-06-06T09:35:00Z"
}
Configuration
The plugin reuses your main LLM settings (BUB_MODEL, BUB_API_KEY, etc.):
# Your existing setup (e.g., DeepSeek)
export BUB_MODEL=deepseek:deepseek-chat
export BUB_API_KEY=sk-...
uv run bub chat
No separate BUB_SEMANTIC_* variables needed. Semantic extraction uses the same model as your agent.
Testing
Run the test suite:
uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v
Coverage: 43 tests across unit and integration scenarios:
- Entity/Relation serialization
- JSONL storage I/O
- LLM extraction with mocks
- Context building
- Multi-turn memory retention
Usage Examples
Example 1: CLI Multi-Turn
$ uv run bub chat
bub > Alice is a data scientist.
Agent > Got it.
bub > What is Alice's profession?
Agent > Alice is a data scientist. (retrieved from semantic memory)
bub > ,tape.info
[Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]
Example 2: Telegram
You: "I need to fix a critical bug in the payment module"
Bot: [Uses semantic memory to track bug, module]
You: "What was I working on?"
Bot: [Recalls semantic memory: bug:critical_payment, module:payment]
Example 3: Inspect Semantic Store
$ cat ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl | python -m json.tool
[Shows stored entities and relations]
Performance & Cost
Token Usage
- Each extraction call: ~300-500 tokens (depends on entry volume)
- Estimated overhead: +10-20% per turn (configurable via extraction prompt)
Storage
- JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
- Typical session: ~50-100 KB
Latency
- Extraction is async, non-blocking
- First turn (with extraction): ~500ms extra
- Subsequent turns: ~50ms extra (just loading snapshots)
Graceful Degradation
If semantic extraction fails for any reason:
- LLM error: Returns empty snapshot, continues
- Invalid JSON: Logged as warning, continues
- Storage error: Logged, continues with base context
The agent always works, semantic memory is optional enhancement.
Future Enhancements
Phase 2: Smart Retrieval
- Vector embeddings for semantic similarity search
- Retrieval-augmented context injection (only include relevant entities)
- Reduces prompt bloat for long sessions
Phase 3: Advanced Graphs
- Entity dependency analysis (who depends on what)
- Centrality metrics (who/what is most important)
- Causal reasoning (what led to what)
Phase 4: Multi-Session Memory
- Cross-session entity resolution
- Long-term memory across multiple conversations
- Persistent entity graph (not just per-tape)
Troubleshooting
Q: Plugin not loading? A: Check that entry-point is registered:
python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"
Q: Semantic snapshots not appearing?
A: Check ~/.bub/tapes/semantic/ directory exists. Check logs with BUB_VERBOSE=1.
Q: LLM calls are expensive? A: Reduce extraction frequency or use a cheaper model (e.g., DeepSeek distill). Future releases will support model selection per plugin.
API Reference
build_semantic_context(entries, context, llm=None, store=None) → list[dict]
Build context with semantic memory. Called by the framework automatically.
Args:
entries: Iterable of TapeEntry objectscontext: TapeContext instancellm: republic.LLM instance (optional; if None, returns base context)store: SemanticStore instance (optional; if None, returns base context)
Returns: List of message dicts ready for model input
extract_semantics(entries, llm, tape_id, anchor_id=None, max_tokens=1000) → SemanticSnapshot
Extract entities and relations from tape entries.
Args:
entries: List of TapeEntry objectsllm: republic.LLM instance for extractiontape_id: Session/tape identifieranchor_id: Optional anchor point identifiermax_tokens: Max tokens for LLM response
Returns: SemanticSnapshot with extracted entities/relations
Contributing
This plugin is part of Bub's extensibility model. To extend:
- Custom entity types: Modify Entity.type enum in models.py
- Custom extractors: Replace or wrap extractor.py
- Custom storage: Implement SemanticStore interface
- Custom formatters: Replace _format_snapshots in context.py
All without modifying Bub core.
License
Same as Bub (Apache 2.0)
Questions? See Bub documentation or open an issue.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bub_semantic_memory-0.1.1.tar.gz.
File metadata
- Download URL: bub_semantic_memory-0.1.1.tar.gz
- Upload date:
- Size: 17.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0ae12ff5c4a8fae6171e897d168d85cabc584c2e87a6bffb680f44bb57fb157c
|
|
| MD5 |
b700af53950b89441104f7bc4ca99978
|
|
| BLAKE2b-256 |
1b5fcafc15a1a2aacf8f01c8db06aab92c05c6aa577fc77ca34226c10a41955e
|
File details
Details for the file bub_semantic_memory-0.1.1-py3-none-any.whl.
File metadata
- Download URL: bub_semantic_memory-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4d2102842ddabff42115cf28966f6a1ac386649aa20b5623d55e5c7266f3ad36
|
|
| MD5 |
22d5e77a8e514c38a607c5e342732c40
|
|
| BLAKE2b-256 |
b5a80ea1cc945c1f31e0f70f5e73e1b41401a8d054d34f569870f9e31dc2dc39
|