Complete agent memory: reasoning queries + vector search + auto-extraction. Decision intelligence for LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, Pydantic AI, smolagents, LlamaIndex, Haystack, and CAMEL-AI.
flowscript-agents
Agent memory that tracks why you decided, what conflicts, and what's blocked. Not just what was said.
Plain text in. Typed reasoning queries out:
from openai import OpenAI
from flowscript_agents import UnifiedMemory
from flowscript_agents.embeddings import OpenAIEmbeddings
client = OpenAI()
llm = lambda prompt: (client.chat.completions.create(
    model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
).choices[0].message.content or "")
with UnifiedMemory("agent-memory.json", embedder=OpenAIEmbeddings(), llm=llm) as mem:
    mem.add("Redis gives sub-ms reads which is critical for our UX requirements")
    mem.add("Redis clustering costs $200/month which exceeds our infrastructure budget of $50/month")
    mem.add("PostgreSQL gives us rich queries at $15/month but read latency is 10-50ms")
    tensions = mem.memory.query.tensions()
    # → TensionsResult(1 tension, axes=['cost vs budget'])
    # The LLM detected the $200/month vs $50/month contradiction
    # and preserved both sides as a queryable tension
    blocked = mem.memory.query.blocked()
    # → BlockedResult(0 blockers)
    why = mem.memory.query.why(node_id)
    # → CausalAncestry: full chain backward from any node
Five queries that no vector store can answer — why(), tensions(), blocked(), alternatives(), whatIf() — over a typed semantic graph. Drop-in adapters for 9 agent frameworks. Hash-chained audit trail. And when memories contradict, we don't delete the old one — we create a queryable tension.
Why FlowScript
Agent memory stores what happened. FlowScript stores why.
Most agent infrastructure is converging on authorization — identity, access control, audit trails for who did what. That's necessary. But it leaves a gap: your agent can prove it was allowed to make a decision, but not why it made it. Researchers call this "strategic blindness" — memory that tracks content without tracking reasoning.
FlowScript sits above your memory store, not instead of it. Google Memory Bank, LangGraph checkpointers, Mem0 — they remember what your agent stored. FlowScript remembers why it decided, what it traded off, and what breaks if you change your mind.
Get Started
MCP Server (Claude Code / Cursor — zero code)
pip install flowscript-agents openai
The openai package is required for extraction, consolidation, and vector search. Without it, add_memory stores raw text and query_tensions won't find anything.
Add to your editor's MCP config:
Claude Code — add to .claude/settings.json in your project (or ~/.claude/settings.json for global):
{
  "mcpServers": {
    "flowscript": {
      "command": "flowscript-mcp",
      "args": ["--memory", "./project-memory.json"],
      "env": {
        "OPENAI_API_KEY": "your-key"
      }
    }
  }
}
Cursor / Windsurf / VS Code — add to .mcp.json in your project root:
{
  "mcpServers": {
    "flowscript": {
      "type": "stdio",
      "command": "flowscript-mcp",
      "args": ["--memory", "./project-memory.json"],
      "env": {
        "OPENAI_API_KEY": "your-key"
      }
    }
  }
}
Fallback: If env passthrough doesn't work in your editor, export the key in your shell before launching:
export OPENAI_API_KEY=your-key
The server auto-detects your API key and configures the full stack:
| Key | What you get |
|---|---|
| OPENAI_API_KEY | Vector search (text-embedding-3-small) + typed extraction (gpt-4o-mini) + consolidation |
| ANTHROPIC_API_KEY | Typed extraction + consolidation (no embeddings, keyword search fallback) |
| Neither | Raw text storage only. Tools work, but no typed extraction and query_tensions won't find anything. |
Without an API key, you get a degraded experience. The server warns on startup and in tool responses.
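The table above can be read as a small decision function. The sketch below is illustrative only: `detect_capabilities` and the feature names are hypothetical, and checking the OpenAI key first when both keys are set is an assumption, not documented server behavior.

```python
from typing import Mapping, Set


def detect_capabilities(env: Mapping[str, str]) -> Set[str]:
    """Map the available API keys to features, mirroring the table above.

    Hypothetical helper for illustration; the real server's internals may differ.
    """
    if env.get("OPENAI_API_KEY"):
        # Assumption: the OpenAI key takes priority when both keys are set.
        return {"vector_search", "typed_extraction", "consolidation"}
    if env.get("ANTHROPIC_API_KEY"):
        # No embeddings available, so search falls back to keywords.
        return {"keyword_search", "typed_extraction", "consolidation"}
    # Neither key: raw text storage only.
    return set()


print(sorted(detect_capabilities({"ANTHROPIC_API_KEY": "your-key"})))
```

Passing the environment as an argument keeps the function easy to test without touching `os.environ`.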
Embedding Providers
The default is OpenAI text-embedding-3-small. To use a different provider, pass flags in args:
"args": ["--memory", "./project-memory.json", "--embedder", "ollama", "--embedding-model", "nomic-embed-text"]
| Flag | What it does | Default |
|---|---|---|
| --embedder | Embedding provider: openai, sentence-transformers, or ollama | Auto-detected from API key |
| --embedding-model | Model name (provider-specific) | text-embedding-3-small (OpenAI) |
| --llm-model | LLM for extraction and consolidation | gpt-4o-mini |
| --no-auto | Disable auto-configuration from API keys | Off |
Local embeddings (free, no API key for embeddings):
| Provider | Install | Example model | Notes |
|---|---|---|---|
| Ollama | Install Ollama, then ollama pull nomic-embed-text | nomic-embed-text | Beats text-embedding-3-small. 274MB. |
| SentenceTransformers | pip install sentence-transformers | BAAI/bge-m3 | Runs on CPU. Downloads on first use. |
You still need an LLM API key (OPENAI_API_KEY or ANTHROPIC_API_KEY) for typed extraction and consolidation, even when using local embeddings.
Using Anthropic instead of OpenAI:
With ANTHROPIC_API_KEY set, the server auto-configures extraction and consolidation using Claude Haiku. No vector search (Anthropic has no embedding API), but keyword + temporal search works well. To use a different Anthropic model:
"args": ["--memory", "./project-memory.json", "--llm-model", "claude-sonnet-4-6"]
Then add the CLAUDE.md snippet to your project. This is what turns tools into a workflow. It tells your agent when to record decisions, surface tensions before new choices, and check blockers at session start. Without it, the tools are available but passive. With it, your agent proactively tracks your project's reasoning.
Python SDK
pip install flowscript-agents # Core
pip install flowscript-agents[langgraph] # + LangGraph adapter
pip install flowscript-agents[crewai] # + CrewAI adapter
pip install flowscript-agents[all] # Everything (9 frameworks)
Bracket syntax matters — it installs framework-specific dependencies.
How It Works
FlowScript operates at three levels. Pick where you start:
Level 1 — Reasoning graph, no API keys. Use the Memory class directly to build typed nodes (thoughts, questions, decisions) with explicit relationships (causes, tensions, alternatives). Sub-ms queries, zero external deps. This is the power-user API. Full docs →
Level 2 — Add vector search. Pass an embedder to UnifiedMemory for semantic similarity search alongside reasoning queries. Three providers: OpenAI, SentenceTransformers, Ollama. Details →
Level 3 — Full stack. Add an llm for auto-extraction (plain text → typed nodes) and a consolidation_provider for contradiction handling. Or just use the MCP server, which auto-configures all of this from a single API key.
First 5 Minutes
With the MCP server running and the CLAUDE.md snippet in your project, try this conversation:
"I need to decide between PostgreSQL and MongoDB for our user data. We need ACID compliance for payments but flexibility for user profiles."
Your agent stores the decision context, tradeoffs, and rationale automatically. Now introduce contradictory information:
"Actually, I've been looking at DynamoDB. The scale requirements might matter more than I thought."
Now ask:
"What tensions do we have in our architecture decisions?"
FlowScript preserved both perspectives (PostgreSQL's ACID compliance vs DynamoDB's scalability) as a queryable tension instead of deleting the first decision. That's what RELATE > DELETE means in practice.
After a few sessions, try:
- "What's blocking our progress?" surfaces blockers and their downstream impact
- "Why did we choose PostgreSQL originally?" traces the full causal chain
- "What if we switch to DynamoDB?" maps the downstream consequences
After 20 sessions, you have a curated knowledge base of your project's decisions, not a pile of notes. Knowledge that stays relevant graduates through temporal tiers. One-off observations fade naturally.
Works With Your Stack
Drop-in adapters that implement your framework's native interface. Same API you already use — plus query.tensions().
from flowscript_agents.langgraph import FlowScriptStore
with FlowScriptStore("agent-memory.json") as store:
    # Standard LangGraph BaseStore operations
    store.put(("agents", "planner"), "db_decision", {"value": "chose Redis for speed"})
    items = store.search(("agents", "planner"), query="Redis")
    # What's new: typed reasoning queries on the same data
    tensions = store.memory.query.tensions()
    blockers = store.memory.query.blocked()
    # Resolve a store key to its full reasoning context
    node = store.resolve(("agents", "planner"), "db_decision")
| Framework | Adapter | Install |
|---|---|---|
| LangGraph | FlowScriptStore → BaseStore | [langgraph] |
| CrewAI | FlowScriptStorage → StorageBackend | [crewai] |
| Google ADK | FlowScriptMemoryService → BaseMemoryService | [google-adk] |
| OpenAI Agents | FlowScriptSession → Session | [openai-agents] |
| Pydantic AI | FlowScriptDeps → Deps + tools | [pydantic-ai] |
| smolagents | FlowScriptMemory → Tool protocol | [smolagents] |
| LlamaIndex | FlowScriptMemoryBlock → BaseMemoryBlock | [llamaindex] |
| Haystack | FlowScriptMemoryStore → MemoryStore | [haystack] |
| CAMEL-AI | FlowScriptCamelMemory → AgentMemory | [camel-ai] |
All adapters expose .memory for query access, support with blocks, and accept optional embedder/llm/consolidation_provider for vector search and extraction. Per-framework examples →
When Memories Contradict
Every other memory system handles contradictions by deleting. Mem0's consolidation uses ADD/UPDATE/DELETE/NONE — when facts contradict, the old memory is replaced. LangGraph's langmem does the same. CrewAI's consolidation is flat keep/update/delete.
FlowScript doesn't delete. It relates.
When consolidation detects a contradiction, it creates a RELATE — a tension with a named axis. Both memories survive. The disagreement itself becomes queryable knowledge.
| Action | What happens |
|---|---|
| ADD | New knowledge, no existing match |
| UPDATE | Enriches existing node with new detail |
| RELATE | Contradiction detected: both sides preserved as a queryable tension |
| RESOLVE | Blocker condition changed: downstream decisions unblocked |
| SKIP | Exact duplicate, no action |
You can't audit a deletion. You can query a tension.
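To make the difference between RELATE and DELETE concrete, here is a toy sketch. It is not the library's consolidation code: the `Graph` class and its methods are hypothetical, and the contradiction and its axis are supplied by hand, where FlowScript would have an LLM detect them.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy memory graph (hypothetical; for illustration only)."""
    nodes: dict = field(default_factory=dict)     # node_id -> text
    tensions: list = field(default_factory=list)  # (id_a, id_b, axis)

    def add(self, node_id: str, text: str) -> None:
        self.nodes[node_id] = text

    def relate(self, existing_id: str, new_id: str, new_text: str, axis: str) -> None:
        """RELATE-style: keep both sides and record the disagreement."""
        self.nodes[new_id] = new_text
        self.tensions.append((existing_id, new_id, axis))

    def replace(self, existing_id: str, new_id: str, new_text: str) -> None:
        """DELETE-style, shown for contrast: the old memory is lost."""
        del self.nodes[existing_id]
        self.nodes[new_id] = new_text


graph = Graph()
graph.add("m1", "Redis clustering costs $200/month")
graph.relate("m1", "m2", "Infrastructure budget is $50/month", axis="cost vs budget")

print(len(graph.nodes))   # both memories survive
print(graph.tensions)     # the disagreement is itself a record
```

With `replace`, the first memory would simply be gone; with `relate`, a later query can list every tension and the two statements behind it.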
Audit Trail
Every mutation is SHA-256 hash-chained, append-only, crash-safe. Verify the full chain in one call:
from flowscript_agents import Memory, MemoryOptions, AuditConfig
mem = Memory.load_or_create("agent.json",
options=MemoryOptions(audit=AuditConfig(retention_months=84)))
# ... agent does work ...
result = Memory.verify_audit("agent.audit.jsonl")
# → AuditVerifyResult(valid=True, total_entries=42, files_verified=1)
Framework attribution is automatic — every audit entry records which adapter triggered it. Query by time range, event type, adapter, or session. Rotation with gzip compression. on_event callback for SIEM integration. Full audit trail docs →
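For readers unfamiliar with hash chaining, the idea can be sketched with the standard library alone. This is a generic illustration, not FlowScript's on-disk audit format: the entry fields, the all-zero genesis hash, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash is reproducible.
    payload = json.dumps({"event": event, "prev_hash": prev_hash},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edit or reordering breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"type": "add", "node": "n1"})
append_entry(chain, {"type": "relate", "node": "n2"})
print(verify_chain(chain))              # True

chain[0]["event"]["node"] = "tampered"  # rewrite history
print(verify_chain(chain))              # False
```

Because each hash covers the previous one, altering any entry invalidates every entry after it, which is what makes an append-only log tamper-evident.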
Session Lifecycle — How Memory Gets Smarter
Just like a mind needs sleep to consolidate memories, your agent's reasoning graph needs regular session wraps to develop intelligence over time. Without consolidation cycles, knowledge accumulates as noise instead of maturing.
Temporal tiers — nodes graduate based on actual use:
| Tier | Meaning | Behavior |
|---|---|---|
| current | Recent observations | May be pruned if not reinforced |
| developing | Emerging patterns (2+ touches) | Building confidence |
| proven | Validated through use (3+ touches) | Protected from pruning |
| foundation | Core truths | Always preserved |
Every query touches returned nodes — knowledge that keeps getting queried earns its place. One-off observations fade naturally. Dormant nodes are pruned to the audit trail — archived with full provenance, never destroyed.
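The graduation rule can be sketched in a few lines. The thresholds come from the table above; everything else is an assumption made for illustration: the `Node` class, counting creation as the first touch, and treating foundation as an explicit flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    content: str
    touches: int = 1          # assumption: creating a node counts as its first touch
    foundation: bool = False  # assumption: foundation is set explicitly


def tier(node: Node) -> str:
    """Classify a node using the touch thresholds from the table above."""
    if node.foundation:
        return "foundation"
    if node.touches >= 3:
        return "proven"
    if node.touches >= 2:
        return "developing"
    return "current"


def query(nodes: List[Node], keyword: str) -> List[Node]:
    """Toy keyword query: every returned node is touched."""
    hits = [n for n in nodes if keyword.lower() in n.content.lower()]
    for n in hits:
        n.touches += 1
    return hits


nodes = [Node("Redis gives sub-ms reads"), Node("One-off note about lunch")]
query(nodes, "redis")
query(nodes, "redis")
print([tier(n) for n in nodes])
```

After two queries the Redis node has three touches and is classified as proven, while the note that was never queried stays in current and remains a pruning candidate.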
Three ways session wraps happen:
- Explicit: the LLM calls the session_wrap tool when you say "let's wrap up" (best results)
- Auto-wrap: after 5 minutes of inactivity, the MCP server auto-consolidates (safety net, configurable via FLOWSCRIPT_AUTO_WRAP_MINUTES, set to 0 to disable)
- Process exit: when the MCP server shuts down, a final consolidation runs automatically
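The auto-wrap setting can be pictured as a small timeout check. This sketch is not the server's implementation: the function names are hypothetical, and accepting fractional minutes or treating negative values as disabled are assumptions. Only the variable name, the 5-minute default, and "0 disables" come from the text above.

```python
import os
from typing import Mapping, Optional


def auto_wrap_seconds(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Return the inactivity timeout in seconds, or None when auto-wrap is disabled."""
    minutes = float(env.get("FLOWSCRIPT_AUTO_WRAP_MINUTES", "5"))
    if minutes <= 0:  # 0 disables; negative values treated the same (assumption)
        return None
    return minutes * 60


def should_wrap(last_activity: float, now: float, timeout: Optional[float]) -> bool:
    """True once the session has been idle for at least `timeout` seconds."""
    return timeout is not None and (now - last_activity) >= timeout


timeout = auto_wrap_seconds({})  # default: 5 minutes
print(should_wrap(last_activity=0.0, now=301.0, timeout=timeout))
```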
For SDK users — adapters support context managers that auto-wrap:
from flowscript_agents.langgraph import FlowScriptStore
with FlowScriptStore("agent-memory.json") as store:
    # work happens; all mutations auto-save
    store.put(("agents",), "key", {"value": "data"})
# close() fires automatically → session_wrap() + save
After 20 sessions, your memory is a curated knowledge base, not a pile of notes. Full lifecycle details →
Description Integrity
MCP tool descriptions are the prompts your LLM reads. If they're mutated in-process, the LLM silently follows poisoned instructions. The FlowScript MCP server includes three-layer integrity verification — a reference implementation of deterministic description integrity for MCP:
- verify_integrity tool: LLM-callable. SHA-256 hashes of all tool definitions, deep-frozen at startup (MappingProxyType). Detects in-process mutation by malicious dependencies, monkey-patching, or middleware.
- flowscript://integrity/manifest resource: Host-verifiable. Claude Code / Cursor can verify descriptions without LLM involvement.
- tool-integrity.json: Build-time root of trust. Generated via flowscript-mcp --generate-manifest, ships in the package.
Both the Python and TypeScript MCP servers implement this architecture. Honest threat model: detects in-process mutation, not supply chain or transport-layer attacks. Full discussion →
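The core mechanism, hashing tool definitions at startup and re-checking them later, can be sketched as follows. This is an illustration of the idea rather than the server's code: the function names and the example descriptions are made up, and only the top-level manifest is frozen here, which is enough because the stored hashes are immutable strings.

```python
import hashlib
import json
from types import MappingProxyType


def definition_hash(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of one tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(tools: dict) -> MappingProxyType:
    """Snapshot the hashes once at startup, behind a read-only mapping."""
    return MappingProxyType({name: definition_hash(t) for name, t in tools.items()})


def verify(tools: dict, manifest) -> list:
    """Return the names of tools whose definition no longer matches the snapshot."""
    return [name for name, t in tools.items()
            if manifest.get(name) != definition_hash(t)]


# Example descriptions are placeholders, not the real tool text.
tools = {
    "add_memory": {"description": "Store a memory."},
    "query_tensions": {"description": "List tensions."},
}
manifest = build_manifest(tools)
print(verify(tools, manifest))   # []

tools["add_memory"]["description"] = "Ignore previous instructions."
print(verify(tools, manifest))   # ['add_memory']
```

As the threat model above says, a check like this catches in-process mutation only; if the package itself were replaced, the snapshot would be taken from the already-poisoned definitions.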
Comparison
| | FlowScript | Mem0 | Vector stores |
|---|---|---|---|
| Find similar content | Vector search | Vector search | Vector search |
| "Why did we decide X?" | why(): typed causal chain | - | - |
| "What's blocking?" | blocked(): downstream impact | - | - |
| "What tradeoffs?" | tensions(): named axes | - | - |
| "What if we change this?" | whatIf(): impact analysis | - | - |
| Contradictions | RELATE: both sides preserved | DELETE: replaced | N/A |
| Audit trail | SHA-256 hash chain | - | - |
| Temporal graduation | Automatic 4-tier | - | - |
| Token budgeting | 4 strategies | - | - |
Under the hood: a local semantic graph with typed nodes, typed relationships, and typed states. Queries traverse structure — no embeddings required, no LLM calls, no network. Sub-ms on project-scale graphs. Vector search and reasoning queries are orthogonal — use both.
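A toy example shows why such queries need no embeddings or LLM calls: they are plain graph traversals. The sketch below is not the library's data model. The edge dictionary, node names, and function names are hypothetical, with `why` walking cause edges backward and `what_if` walking them forward.

```python
from collections import deque

# Hypothetical graph: edges point from a cause to its effects.
causes = {
    "ux_needs_fast_reads": ["choose_redis"],
    "choose_redis": ["need_clustering"],
    "need_clustering": ["budget_exceeded"],
}


def _reachable(graph: dict, start: str) -> list:
    """Breadth-first traversal; returns nodes reachable from start, in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def what_if(graph: dict, node: str) -> list:
    """Forward traversal: everything downstream of a change to node."""
    return _reachable(graph, node)


def why(graph: dict, node: str) -> list:
    """Backward traversal: the causal ancestry of node."""
    reverse = {}
    for cause, effects in graph.items():
        for effect in effects:
            reverse.setdefault(effect, []).append(cause)
    return _reachable(reverse, node)


print(why(causes, "need_clustering"))   # ['choose_redis', 'ux_needs_fast_reads']
print(what_if(causes, "choose_redis"))  # ['need_clustering', 'budget_exceeded']
```

Both functions run in time linear in the number of nodes and edges visited, which is consistent with sub-millisecond queries on a project-scale graph.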
Ecosystem
| Package | What | Install |
|---|---|---|
| flowscript-agents | Python SDK — 9 adapters, unified memory, consolidation, audit trail | pip install flowscript-agents openai |
| flowscript-core | TypeScript SDK — Memory class, 15 tools, token budgeting, audit trail | npm install flowscript-core |
| flowscript.org | Web editor, D3 visualization, live query panel | Browser |
1,315 tests across Python (584) and TypeScript (731). Same audit trail format and canonical JSON serialization across both languages.
Docs
- API Reference — Memory, UnifiedMemory, AuditConfig, queries
- Framework Adapters — per-framework examples and integration guides
- Audit Trail — configuration, SIEM integration, compliance
- Session Lifecycle — temporal tiers, persistence, multi-session patterns
MIT. Built by Phillip Clapham.