In-memory parallel search — sessions, holographic facts, built-in memory (currently supports holographic; other memory providers TBD)
Hermes Snow Search
In-memory parallel search plugin for Hermes Agent. Loads session history, holographic facts (fact_store), built-in memory (MEMORY.md / USER.md), and skill metadata (SKILL.md) into RAM. Searches all stores in parallel: lightweight results return in under 1 ms, deep search in roughly 1-5 ms. Supports full message-body deep search, hot-reload, and status inspection.
Key Advantages
| # | Advantage | Detail |
|---|---|---|
| 1 | Sub-millisecond | RAM-resident search. No I/O, no SQLite — results in <1ms |
| 2 | 5 sources in parallel | Sessions + holographic facts + built-in memory + skill metadata + full message bodies. Searched concurrently via ThreadPoolExecutor |
| 3 | Deep search | Full message-body index with session_id, timestamp, role. Covers 12K+ messages across all sessions |
| 4 | Auto-cleanup | post_llm_call hook clears tool output from context. 107 hits / ~34K chars → clean slate (~7K fixed overhead) before next turn |
| 5 | Cross-session | Not limited to current conversation. Searches every past session in one go |
| 6 | Hot reload | snow reload rebuilds the RAM index from disk. No Hermes restart needed |
| 7 | Zero-I/O status | snow status returns full index snapshot without touching disk |
| 8 | Incremental updates | Writes to fact_store / memory are appended to cache instantly — no full reload |
| 9 | Auto-eviction | When >80% of memory limit, oldest/lowest-trust entries are evicted automatically |
| 10 | Full coverage guarantee | When full_coverage is true, deep search covers every stored message — no fallback needed |
How it works
- Eager load: sessions, facts, memory entries, and skill metadata are loaded in a background thread right after Hermes starts
- Keep in RAM: sessions, facts, memory, and skills live in Python lists, no I/O on search
- Parallel search: `ThreadPoolExecutor` runs all stores concurrently
- Incremental updates: the `post_tool_call` hook catches `fact_store add` and `memory add` and appends the new entry to the cache
- Eviction: the `pre_llm_call` hook checks memory usage and evicts oldest/lowest-trust entries when usage exceeds 80% of the limit
- Deep search: full message-body index with session_id + timestamp + role. Incremental refresh via `SELECT MAX(id)`
- Skills cache: `~/.hermes/skills/*/SKILL.md` frontmatter (name, description, tags) pre-loaded on startup
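The parallel-search step above can be illustrated with a minimal sketch. It is not the plugin's actual code: the store contents, `search_store`, and `parallel_search` are hypothetical names chosen for illustration, and matching is a plain substring scan rather than whatever scoring the plugin uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stores: store name -> list of text entries.
STORES = {
    "sessions": ["debugged database migration", "planned release notes"],
    "facts": ["user prefers PostgreSQL for the database"],
    "memory": ["timezone is UTC+8"],
}


def search_store(name, entries, query):
    """Case-insensitive substring scan over one store's entries."""
    needle = query.lower()
    return [(name, entry) for entry in entries if needle in entry.lower()]


def parallel_search(query, stores=STORES):
    """Scan every store concurrently and merge hits in store order."""
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        futures = [
            pool.submit(search_store, name, entries, query)
            for name, entries in stores.items()
        ]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return hits


hits = parallel_search("database")
```

Collecting results in submission order keeps the merged hit list deterministic even though the scans themselves may finish in any order.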
Installation
pip install hermes-snow-search
hermes plugins enable hermes-snow-search
# Restart Hermes (/new or re-launch)
Configuration
plugins:
  hermes-snow-search:
    memory_limit_mb: 200                 # safety cap, not actual usage
    session_max: 7000
    fact_max: 10000
    deep_search_enabled: true            # set false to use lightweight only
    deep_search_load_mode: "ondemand"    # "ondemand" | "startup"
| Key | Default | Description |
|---|---|---|
| `memory_limit_mb` | 200 | Hard memory cap; eviction triggers at 80% |
| `session_max` | 7000 | Max session entries in lightweight cache |
| `fact_max` | 10000 | Max fact entries in cache |
| `deep_search_enabled` | true | Enables full message-body search. Set false for lightweight-only mode |
| `deep_search_load_mode` | ondemand | ondemand = load on first search, startup = background at boot |
`memory_limit_mb` (200 MB) is a safety cap, not actual usage. One week of real conversation (~230 sessions, ~10,000 messages) fits in ~6 MB of lightweight data, ~6 MB additional for deep search.
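The 80% eviction trigger can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the `Entry` fields, the `evict` function, and the exact ordering (lowest trust first, ties broken by age) are all hypothetical, since the README only says "oldest/lowest-trust".

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    trust: float       # higher means more trusted (assumed field)
    created_at: float  # Unix timestamp; smaller means older (assumed field)
    size_bytes: int


def evict(entries, limit_bytes, threshold=0.8):
    """Drop lowest-trust, then oldest, entries until usage fits the threshold."""
    budget = limit_bytes * threshold
    used = sum(e.size_bytes for e in entries)
    if used <= budget:
        return list(entries)
    # Assumed eviction order: lowest trust first, oldest first among equals.
    victims = sorted(entries, key=lambda e: (e.trust, e.created_at))
    evicted = set()
    for victim in victims:
        if used <= budget:
            break
        evicted.add(id(victim))
        used -= victim.size_bytes
    # Preserve the original ordering of the surviving entries.
    return [e for e in entries if id(e) not in evicted]


entries = [
    Entry("a", trust=0.9, created_at=1.0, size_bytes=40),
    Entry("b", trust=0.2, created_at=2.0, size_bytes=40),
    Entry("c", trust=0.5, created_at=3.0, size_bytes=40),
]
kept = evict(entries, limit_bytes=100)
```

With a 100-byte limit the budget is 80 bytes, so removing the single lowest-trust entry is enough to bring usage back under the threshold.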
Memory Recommendations
- Lightweight mode: 20 MB is sufficient for session summaries, facts, memory, and skills. Set `deep_search_enabled: false` to stay in lightweight mode.
- Deep search (default): Full message bodies consume ~6 MB/week. 200 MB covers ~6 months, 500 MB covers ~1 year.
- Multiple profiles: When running multiple Hermes profiles, budget N × `memory_limit_mb` since each process has its own in-memory index.
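The sizing figures above follow from simple arithmetic. The sketch below is a rough estimator only; `weeks_covered` and `total_budget_mb` are hypothetical helper names, the ~6 MB/week rate is the README's own measurement, and the 0.85 factor assumes the deep index stops building at 85% of `memory_limit_mb`.

```python
def weeks_covered(memory_limit_mb, mb_per_week=6.0, build_stop_ratio=0.85):
    """Estimate how many weeks of message bodies fit in the deep index."""
    usable_mb = memory_limit_mb * build_stop_ratio
    return usable_mb / mb_per_week


def total_budget_mb(profiles, memory_limit_mb):
    """Worst-case RAM budget when each profile runs its own process."""
    return profiles * memory_limit_mb


weeks_200 = weeks_covered(200)      # about 28 weeks, i.e. roughly 6 months
weeks_500 = weeks_covered(500)      # about 71 weeks, i.e. over a year
budget = total_budget_mb(3, 200)    # three profiles at the default cap
```

Actual growth depends on how much you talk to the agent, so treat these numbers as an upper-bound planning aid rather than a measurement.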
Context Cleanup (post_llm_call)
After every LLM response, the on_post_llm_call hook clears snow_search tool output from conversation history. This prevents search results from accumulating across turns: one search round adds ~9K–18K chars to context, but the hook blanks it before the next user message.
Empirical verification: Two sequential deep searches (107 hits, ~34K chars total) were injected into context. After the LLM replied, post_llm_call cleared all search output — next turn carried only the fixed ~7K chars of memory + user profile.
# Hook logic
for msg in history:
    if msg.get("role") == "tool" and msg.get("name") == "snow_search":
        msg["content"] = ""  # clear from context
Note: The hook clears snow_search tool output only. It does not touch other tool results or the search index itself (which stays in RAM for the next call).
Deep Search
Enabled by default (deep_search_enabled: true). When active, full message-body search replaces lightweight session summaries automatically. Results include session_id, timestamp, role, and search_info.
Load modes
| Mode | When | Behavior |
|---|---|---|
| `ondemand` (default) | On first deep search | Blocks until index is built, shows progress |
| `startup` | Background, 2.5s after startup | Non-blocking, prints progress at ~0/50/100% |
Progress is written to stderr:
[Hermes Snow Search] Loading deep search index...
[Hermes Snow Search] Session 65/263 | 3,000 messages | 15/200 MB | ~0.5s remaining
[Hermes Snow Search] Deep search ready | 12,000 messages | 10 days (May 13 ~ May 22) | 7.5 MB
Index builds from newest sessions backwards, stops at 85% of memory_limit_mb. Subsequent calls use SELECT MAX(id) for incremental refresh — cross-process sync is automatic (shared state.db).
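The incremental refresh idea can be sketched with the standard `sqlite3` module. The `messages` table and its columns are assumptions made for this example (the README does not document the schema of state.db), and `refresh` is a hypothetical helper, not the plugin's function.

```python
import sqlite3


def refresh(conn, index, last_id):
    """Append rows newer than last_id to the index; return the new high-water mark."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM messages").fetchone()
    if max_id is None or max_id <= last_id:
        return last_id  # nothing new on disk
    rows = conn.execute(
        "SELECT id, session_id, role, content FROM messages "
        "WHERE id > ? AND role IN ('user', 'assistant') ORDER BY id",
        (last_id,),
    )
    index.extend(rows.fetchall())
    return max_id


conn = sqlite3.connect(":memory:")  # stand-in for the shared state.db
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    [("s1", "user", "hello"), ("s1", "tool", "ignored"), ("s1", "assistant", "hi")],
)

index = []
last_id = refresh(conn, index, last_id=0)
```

Because the check is a single `MAX(id)` lookup, a second process that wrote new rows is picked up on the next call without rebuilding the whole index.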
Sort modes
| `sort` | Behavior |
|---|---|
| `relevance` (default) | Best match first (recency + keyword score) |
| `oldest` | Earliest timestamp first; answers "when did X first happen" |
| `newest` | Latest timestamp first; answers "when was the last X" |
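The three sort modes amount to choosing a different sort key over the same hit list. A minimal sketch, assuming each hit is a dict with an ISO-8601 `timestamp` and a precomputed `score` (both field names and the sample data are hypothetical):

```python
def sort_hits(hits, sort="relevance"):
    """Order hits by the requested mode without mutating the input list."""
    if sort == "relevance":
        return sorted(hits, key=lambda h: h["score"], reverse=True)
    if sort == "oldest":
        return sorted(hits, key=lambda h: h["timestamp"])
    if sort == "newest":
        return sorted(hits, key=lambda h: h["timestamp"], reverse=True)
    raise ValueError(f"unknown sort mode: {sort!r}")


# Sample hits; ISO-8601 strings in one format sort correctly as plain text.
hits = [
    {"id": "a", "timestamp": "2025-05-14T10:00:00", "score": 0.4},
    {"id": "b", "timestamp": "2025-05-13T09:00:00", "score": 0.9},
    {"id": "c", "timestamp": "2025-05-20T08:00:00", "score": 0.7},
]

first_mention = sort_hits(hits, "oldest")[0]["id"]
last_mention = sort_hits(hits, "newest")[0]["id"]
best_match = sort_hits(hits, "relevance")[0]["id"]
```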
Performance
| Mode | Searches | Latency | Memory (1 week) |
|---|---|---|---|
| Lightweight | Session summaries | <0.5ms | ~3 MB |
| Deep | Full message bodies | ~1-5ms | ~6 MB |
Lightweight and deep mode never load simultaneously — deep mode skips sessions and loads facts + memory + messages.
Action Modes
Say "snow reload" to rebuild the index from disk, or "snow status" to inspect current index state. The tool description guides the agent to pass the correct action parameter (action=reload or action=status).
Note: `snow reload` rebuilds the RAM search index (sessions, skills, facts, memory). It does NOT affect the LLM context; context is managed separately by Hermes system prompt injection.
The action parameter controls what snow_search does:
| `action` | Behavior | Returns |
|---|---|---|
| `search` (default) | Run a query across all stores | Hits + search_info |
| `reload` | Clear and reload the entire index from disk | Full status JSON |
| `status` | Return current index state (zero I/O) | Full status JSON |
Status / Reload response
{
  "success": true,
  "action": "status",
  "counts": {"sessions": 263, "facts": 310, "memory": 64, "deep_messages": 12000, "skills": 105},
  "memory": {"current_mb": 0.2, "deep_mb": 7.5},
  "coverage": {"full_coverage": true, "date_range": "May 13 ~ May 22"},
  "ready": true,
  "deep_ready": true
}
Skills Cache
Skill metadata from ~/.hermes/skills/*/SKILL.md is pre-loaded on startup as a 5th data source ("skills" in stores_available). Each skill entry includes name, description, tags, and category (directory name). Enabled by default — set include_skills: false to skip.
Use snow_search to discover available skills. Never read SKILL.md files or Hermes core tool descriptions directly.
Full Coverage
Check search_info.full_coverage — if true, snow_search covers everything. If false, session_search may be needed for older sessions.
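A caller could act on this flag as in the sketch below. The shape of the result dict is an assumption based on the field names mentioned in this README, and `needs_session_search` is a hypothetical helper; a missing flag is treated as partial coverage to stay on the safe side.

```python
def needs_session_search(result):
    """Return True when the snow_search result does not claim full coverage."""
    info = result.get("search_info", {})
    return not info.get("full_coverage", False)


# Hypothetical results: one with full coverage, one without.
complete = {"hits": [], "search_info": {"full_coverage": True}}
partial = {"hits": [], "search_info": {"full_coverage": False}}

fallback_for_complete = needs_session_search(complete)
fallback_for_partial = needs_session_search(partial)
```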
Caveats
- First use delay (ondemand): First deep search triggers index building (~1s for ~1 week).
- Root sessions only: Deep search indexes user ↔ assistant conversations. Subagent sessions (delegate_task children) are excluded.
- Tool messages excluded: Only `user` and `assistant` role messages are stored.
- Partial coverage: When `full_coverage` is false, combine with `session_search` for complete results.
Usage Tips
- "Latest" questions match naturally: snow_search ranks by relevance with recency boost.
- "First time" questions use `sort="oldest"`: the earliest hit moves to the top.
- Specific keywords win: "database migration schema users" beats "that database thing".
- Cross-process auto-sync: no manual reload needed between CLI and Gateway.
- Trust the result: snow_search sweeps everything in RAM. If it found nothing, there's no record.
Author
LinQuan & Snow (AI Girl)