Skip to main content

Semantic memory plugin for Bub

Project description

Semantic Memory Plugin for Bub

A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.

Overview

This plugin intercepts the tape context building process to:

  1. Extract semantics from conversation entries using an LLM
  2. Store snapshots of entities (people, tasks, concepts) and relations between them
  3. Inject memory into subsequent agent prompts, enabling long-context awareness

The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing build_tape_context architecture without modifying core.

Installation

The plugin is already registered in pyproject.toml:

[project.entry-points."bub"]
semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"

Bub's framework automatically loads and instantiates it on startup. No additional setup required.

How It Works

Per-Turn Flow

  1. Input: Agent receives a new message, tape entries are loaded
  2. Extract: LLM analyzes entries and identifies:
    • Entities: people, tasks, events, concepts
    • Relations: created, depends_on, mentions, etc.
  3. Store: SemanticSnapshot is appended to ~/.bub/tapes/semantic/{tape_id}.jsonl
  4. Load: All historical snapshots for this tape are loaded
  5. Inject: Semantic memory is formatted as a system prompt block and prepended to the context
  6. Output: Agent receives enriched context with semantic awareness

Example

Given this conversation:

User: "Alice created a task to deploy v1.0"
Agent: [responds]
User: "What did Alice do?"

On the second turn, the agent sees:

## Semantic Memory

### Entities (2):
- person:alice
- task:deploy_v1 (v1.0 deployment)

### Relations (1):
- alice --created--> deploy_v1

---

[rest of context]

Architecture

Core Modules

  • models.py: Pydantic dataclasses for Entity, Relation, SemanticSnapshot
  • extractor.py: LLM-based extraction from tape entries
  • store.py: JSONL file storage at ~/.bub/tapes/semantic/
  • context.py: Formatting snapshots into system prompts
  • hook_impl.py: Bub hookimpl that wires everything together

Storage Format

Snapshots are stored as JSONL (one JSON object per line):

{
  "entities": [
    {"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
    {"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
  ],
  "relations": [
    {"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
  ],
  "tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
  "anchor_id": "anchor_001",
  "created_at": "2026-06-06T09:35:00Z"
}

Configuration

The plugin reuses your main LLM settings (BUB_MODEL, BUB_API_KEY, etc.):

# Your existing setup (e.g., DeepSeek)
export BUB_MODEL=deepseek:deepseek-chat
export BUB_API_KEY=sk-...
uv run bub chat

No separate BUB_SEMANTIC_* variables needed. Semantic extraction uses the same model as your agent.

Testing

Run the test suite:

uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v

Coverage: 43 tests across unit and integration scenarios:

  • Entity/Relation serialization
  • JSONL storage I/O
  • LLM extraction with mocks
  • Context building
  • Multi-turn memory retention

Usage Examples

Example 1: CLI Multi-Turn

$ uv run bub chat
bub > Alice is a data scientist.
Agent > Got it.

bub > What is Alice's profession?
Agent > Alice is a data scientist. (retrieved from semantic memory)

bub > ,tape.info
[Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]

Example 2: Telegram

You: "I need to fix a critical bug in the payment module"
Bot: [Uses semantic memory to track bug, module]

You: "What was I working on?"
Bot: [Recalls semantic memory: bug:critical_payment, module:payment]

Example 3: Inspect Semantic Store

$ cat ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl | python -m json.tool
[Shows stored entities and relations]

Performance & Cost

Token Usage

  • Each extraction call: ~300-500 tokens (depends on entry volume)
  • Estimated overhead: +10-20% per turn (configurable via extraction prompt)

Storage

  • JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
  • Typical session: ~50-100 KB

Latency

  • Extraction is async, non-blocking
  • First turn (with extraction): ~500ms extra
  • Subsequent turns: ~50ms extra (just loading snapshots)

Graceful Degradation

If semantic extraction fails for any reason:

  • LLM error: Returns empty snapshot, continues
  • Invalid JSON: Logged as warning, continues
  • Storage error: Logged, continues with base context

The agent always works, semantic memory is optional enhancement.

Future Enhancements

Phase 2: Smart Retrieval

  • Vector embeddings for semantic similarity search
  • Retrieval-augmented context injection (only include relevant entities)
  • Reduces prompt bloat for long sessions

Phase 3: Advanced Graphs

  • Entity dependency analysis (who depends on what)
  • Centrality metrics (who/what is most important)
  • Causal reasoning (what led to what)

Phase 4: Multi-Session Memory

  • Cross-session entity resolution
  • Long-term memory across multiple conversations
  • Persistent entity graph (not just per-tape)

Troubleshooting

Q: Plugin not loading? A: Check that entry-point is registered:

python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"

Q: Semantic snapshots not appearing? A: Check ~/.bub/tapes/semantic/ directory exists. Check logs with BUB_VERBOSE=1.

Q: LLM calls are expensive? A: Reduce extraction frequency or use a cheaper model (e.g., DeepSeek distill). Future releases will support model selection per plugin.

API Reference

build_semantic_context(entries, context, llm=None, store=None) → list[dict]

Build context with semantic memory. Called by the framework automatically.

Args:

  • entries: Iterable of TapeEntry objects
  • context: TapeContext instance
  • llm: republic.LLM instance (optional; if None, returns base context)
  • store: SemanticStore instance (optional; if None, returns base context)

Returns: List of message dicts ready for model input

extract_semantics(entries, llm, tape_id, anchor_id=None, max_tokens=1000) → SemanticSnapshot

Extract entities and relations from tape entries.

Args:

  • entries: List of TapeEntry objects
  • llm: republic.LLM instance for extraction
  • tape_id: Session/tape identifier
  • anchor_id: Optional anchor point identifier
  • max_tokens: Max tokens for LLM response

Returns: SemanticSnapshot with extracted entities/relations

Contributing

This plugin is part of Bub's extensibility model. To extend:

  1. Custom entity types: Modify Entity.type enum in models.py
  2. Custom extractors: Replace or wrap extractor.py
  3. Custom storage: Implement SemanticStore interface
  4. Custom formatters: Replace _format_snapshots in context.py

All without modifying Bub core.

License

Same as Bub (Apache 2.0)


Questions? See Bub documentation or open an issue.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bub_semantic_memory-0.1.0.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bub_semantic_memory-0.1.0-py3-none-any.whl (17.2 kB view details)

Uploaded Python 3

File details

Details for the file bub_semantic_memory-0.1.0.tar.gz.

File metadata

  • Download URL: bub_semantic_memory-0.1.0.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bub_semantic_memory-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ba849f65332d415195a130396d0011319c92ca4302355fdbd929517728fb39e2
MD5 9eefbeaa8414d5a1bcc91e1a46645df6
BLAKE2b-256 c8fbff0fd4f7f78b0bdaec967b8574c0d2f0949123502b63821f8cef28a0c350

See more details on using hashes here.

File details

Details for the file bub_semantic_memory-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bub_semantic_memory-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for bub_semantic_memory-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5645ee6dbd21d3eedf0a0632f8942b0696f58f3eec55d0f9e13da7d7602967c1
MD5 bda49908305f7161566a44dd11a132fe
BLAKE2b-256 7deda6f097a986a93db2807c1e30f95683ed55b0ef92cff9ac3e9aaeed6f7dcd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page