S3 Vector Memory Plugin for Strands Agents
A Strands Plugin that gives any Strands Agent long-term semantic memory backed by Amazon S3 Vectors. At the end of a conversation, the plugin summarizes the full exchange using the agent's own model and stores the summary as a searchable vector. On subsequent conversations, relevant summaries are retrieved and injected into the system prompt before the LLM responds.
Available in two modes:
- Single-tenant — one shared index, no extra IAM setup
- Multi-tenant — one index per tenant, IAM credentials scoped per tenant via the Token Vending Machine (TVM) pattern
Motivation
LLMs are stateless. Every conversation starts from scratch — the model has no memory of who the user is, what they've discussed before, or what decisions were made in past sessions. For most production agents, this is a serious limitation.
The standard workaround is to stuff conversation history into the context window. That works for a single session, but it doesn't scale:
- Context windows are finite. Long histories get truncated. Important context from weeks ago disappears.
- Cost grows linearly. Sending the full history on every turn means paying to re-process the same tokens repeatedly.
- Cross-session recall is impossible. When a user returns days later, the previous conversation is gone.
Vector databases solve the recall problem, but they introduce new complexity — especially in multi-tenant SaaS applications where tenant data must be strictly isolated. Most vector stores don't have native IAM-level isolation, so you end up building custom access control on top, which is error-prone and hard to audit.
This plugin solves all three problems:
- Persistent memory across sessions. At the end of each conversation, the plugin summarizes the exchange and stores it as a vector. On the next conversation, relevant summaries are retrieved and injected into the system prompt — the agent "remembers" without bloating the context window.
- Semantic retrieval, not keyword search. Memories are retrieved by meaning, not exact match. A user asking "what was my budget?" will surface a summary that mentions "Q4 spend" or "financial plan" — even if the words don't match.
- Tenant isolation enforced at the credential level. In multi-tenant mode, each tenant gets a dedicated S3 Vectors index and STS credentials that are physically scoped to that index via IAM ABAC. A bug in application code that constructs the wrong index name is still blocked by IAM — there's no application-layer access control to misconfigure.
The result is an agent that gets smarter with every conversation, scales to millions of users, and keeps tenant data isolated by construction — not by convention.
Repository structure
├── src/
│ └── strands_s3_vectors_memory/ # installable library
│ ├── __init__.py
│ ├── s3_vector_memory.py # S3VectorMemory + MultiTenantS3VectorMemory
│ ├── s3_vector_memory_plugin.py # S3VectorMemoryPlugin (hook-driven)
│ └── token_vending_machine.py # TVM credential manager (multi-tenant)
├── examples/
│ ├── single_tenant_agent.py # Single-tenant runnable example
│ └── multi_tenant_agent.py # Multi-tenant runnable example (TVM + isolation demo)
├── tests/
│ ├── unit/ # 112 tests, no AWS credentials required
│ └── integration/ # 28 tests, requires live AWS resources
├── scripts/
│ ├── setup_tvm_role.sh # Creates the TVM IAM role with ABAC policy
│ ├── setup_cognito.sh # Cognito user pool setup
│ └── setup_agentcore.sh # AgentCore deployment setup
├── docs/
│ └── strands-s3-vector-memory-plugin.md # Full plugin reference
├── pyproject.toml # Package build config
└── images/
└── s3_vector_memory.png # Architecture diagram
Requirements
- Python 3.10+
- strands-agents >= 1.0.0
- boto3 >= 1.35
- cachetools >= 5.0 (multi-tenant only)
- AWS account with S3 Vectors access and Bedrock Nova Embeddings (amazon.nova-2-multimodal-embeddings-v1:0) enabled in your region
- A Claude model (e.g. us.anthropic.claude-sonnet-4-5-20250929-v1:0)
- AWS CLI v2 configured with credentials that can access IAM, S3 Vectors, Bedrock, and STS
Install
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
AWS setup
1. Create the S3 Vectors bucket
export AWS_REGION=us-east-1
export S3_VECTOR_BUCKET_NAME=my-vector-memory
aws s3vectors create-vector-bucket \
  --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
  --region $AWS_REGION
2. Create the TVM IAM role (multi-tenant only)
bash scripts/setup_tvm_role.sh $S3_VECTOR_BUCKET_NAME
# Prints: export S3_VECTOR_TVM_ROLE_ARN=arn:aws:iam::<account-id>:role/...
export S3_VECTOR_TVM_ROLE_ARN=<printed-arn>
3. Create indexes
Single-tenant — one shared index named memory:
aws s3vectors create-index \
  --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
  --index-name memory \
  --data-type float32 \
  --dimension 1024 \
  --distance-metric cosine \
  --metadata-configuration '{"nonFilterableMetadataKeys":["content","conversation_id","type"]}' \
  --region $AWS_REGION
Multi-tenant — one index per tenant (repeat at onboarding time):
for TENANT in tenant-001 tenant-002; do
  aws s3vectors create-index \
    --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
    --index-name memory-${TENANT} \
    --data-type float32 \
    --dimension 1024 \
    --distance-metric cosine \
    --metadata-configuration '{"nonFilterableMetadataKeys":["content","conversation_id","type"]}' \
    --region $AWS_REGION
done
Usage
Single-tenant
import os
from strands import Agent
from strands.models import BedrockModel
from strands_s3_vectors_memory import S3VectorMemory, S3VectorMemoryPlugin
BASE_PROMPT = """You are a helpful assistant.
{memory_context}
Use prior context naturally in your responses."""
store = S3VectorMemory(bucket_name=os.environ["S3_VECTOR_BUCKET_NAME"])
plugin = S3VectorMemoryPlugin(store=store, base_prompt=BASE_PROMPT)
agent = Agent(
    model=BedrockModel(),
    tools=[plugin.memory_tool],  # optional: mid-turn recall on demand
    plugins=[plugin],
    system_prompt=BASE_PROMPT,
)

# Turn 1 — agent responds; memory not yet stored
agent("My favourite framework is Strands Agents.", invocation_state={
    "user_id": "user-001", "conversation_id": "conv-001", "end_session": False,
})

# Turn 2 — end_session=True triggers background summarization and vector store
agent("Thanks, bye.", invocation_state={
    "user_id": "user-001", "conversation_id": "conv-001", "end_session": True,
})

# Next session — plugin retrieves the stored summary and injects it into the prompt
agent("What do you know about my preferences?", invocation_state={
    "user_id": "user-001", "conversation_id": "conv-002", "end_session": False,
})
BASE_PROMPT must contain a {memory_context} placeholder. The plugin fills it with retrieved conversation summaries on the first turn of each conversation, or replaces it with an empty string when no relevant memories are found.
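The substitution can be sketched in plain Python. This is illustrative only — `render_prompt` is a hypothetical helper, not part of the plugin's API, and the bullet formatting of retrieved summaries is an assumption:

```python
# Hypothetical sketch of the {memory_context} substitution described above.
# The real logic lives inside S3VectorMemoryPlugin.
BASE_PROMPT = """You are a helpful assistant.
{memory_context}
Use prior context naturally in your responses."""

def render_prompt(base_prompt: str, summaries: list[str]) -> str:
    """Fill {memory_context} with retrieved summaries, or blank it out."""
    if not summaries:
        return base_prompt.format(memory_context="")
    context = "Relevant past conversations:\n" + "\n".join(
        f"- {s}" for s in summaries
    )
    return base_prompt.format(memory_context=context)
```

Either way, the placeholder never reaches the model: it is always replaced with retrieved context or an empty string before the first turn.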
Multi-tenant
One index per tenant. IAM credentials scoped per tenant via STS AssumeRole + TenantID session tag.
import os
from strands import Agent
from strands.models import BedrockModel
from strands_s3_vectors_memory import MultiTenantS3VectorMemory, S3VectorMemoryPlugin
BASE_PROMPT = """You are a helpful assistant.
{memory_context}
Use prior context naturally in your responses."""
store = MultiTenantS3VectorMemory(
    bucket_name=os.environ["S3_VECTOR_BUCKET_NAME"],
    tvm_role_arn=os.environ["S3_VECTOR_TVM_ROLE_ARN"],
)
plugin = S3VectorMemoryPlugin(store=store, base_prompt=BASE_PROMPT)

agent = Agent(
    model=BedrockModel(),
    tools=[plugin.memory_tool],  # optional: mid-turn recall on demand
    plugins=[plugin],
    system_prompt=BASE_PROMPT,
)

tenant_context = {
    "tenantId": "tenant-001",
    "tenantName": "Acme Corp",
}

# Mid-conversation turn
agent("Our Q4 budget is $2M.", invocation_state={
    "tenant_context": tenant_context,
    "user_id": "user-456",
    "conversation_id": "conv-001",
    "end_session": False,
})

# Final turn — plugin summarizes and stores to S3 Vectors in a background thread
agent("Thanks, bye.", invocation_state={
    "tenant_context": tenant_context,
    "user_id": "user-456",
    "conversation_id": "conv-001",
    "end_session": True,
})
The only difference between the two modes is the store class and the presence of tenant_context in invocation_state. MultiTenantS3VectorMemory requires tvm_role_arn — omitting it raises ValueError to prevent silent bypass of IAM ABAC isolation.
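The STS call the TVM makes underneath can be sketched as follows. This is an illustrative reconstruction, not the plugin's actual token_vending_machine.py: `assume_role_request`, the session-name format, and the 900-second duration are assumptions.

```python
def assume_role_request(tvm_role_arn: str, tenant_id: str) -> dict:
    """Build parameters for an STS AssumeRole call that tags the session
    with TenantID, so IAM ABAC policies can scope the vended credentials
    to the tenant's own memory-<tenantId> index."""
    return {
        "RoleArn": tvm_role_arn,
        "RoleSessionName": f"tvm-{tenant_id}",
        "Tags": [{"Key": "TenantID", "Value": tenant_id}],
        "DurationSeconds": 900,  # short-lived, per-tenant credentials
    }

# The vended credentials would then come from something like:
#   boto3.client("sts").assume_role(**assume_role_request(role_arn, "tenant-001"))
```

Because the TenantID session tag is set by the trusted TVM role rather than by application code, a tenant's credentials cannot be re-pointed at another tenant's index after they are issued.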
invocation_state keys
| Key | Type | Required | Description |
|---|---|---|---|
| user_id | str | Yes | User identifier — used as metadata filter on vector operations |
| conversation_id | str | Yes | Unique conversation ID — scopes buffer and summary key |
| end_session | bool | No (default False) | If True, summarize and store conversation after response (non-blocking) |
| tenant_context | dict | Multi-tenant only | Must contain tenantId |
memory_tool — mid-turn recall on demand
The plugin exposes a memory_tool property that returns a Strands @tool. When wired to the agent, the LLM can call it mid-conversation to retrieve specific memories it discovers it needs during reasoning — without starting a new session.
agent = Agent(
    model=BedrockModel(),
    tools=[plugin.memory_tool],  # LLM calls this when it needs to recall something
    plugins=[plugin],            # handles auto-inject + end_session store
    system_prompt=BASE_PROMPT,
)
The tool is retrieve-only. The LLM provides a natural language query; identity (user_id, tenant_context) is read automatically from the plugin's context — the LLM never sees or handles credentials.
When to use it: the automatic before_invocation injection handles broad contextual priming on the first turn. The memory_tool handles specific, targeted recall the agent discovers it needs mid-reasoning — for example, a topic pivot mid-conversation, a temporally distant memory with different keywords, or a fact the LLM needs to complete a chain-of-thought.
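Conceptually, the retrieve-only shape looks like the sketch below: identity is bound when the tool is built, so the model supplies only a query. `make_memory_tool`, `retrieve`, and the `top_k=3` default are hypothetical names for illustration — the real tool comes from `plugin.memory_tool`.

```python
def make_memory_tool(retrieve, user_id: str):
    """Build a retrieve-only recall tool with identity captured at
    construction time, so the model never sees user_id or credentials."""
    def memory_tool(query: str) -> str:
        summaries = retrieve(query=query, user_id=user_id, top_k=3)
        if not summaries:
            return "No relevant memories found."
        return "\n".join(f"- {s}" for s in summaries)
    return memory_tool
```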
See the plugin reference for full details on running unit and integration tests, required env vars, and debug logging.
Run the examples
The examples require the indexes to exist before running. The integration tests create and delete indexes automatically, but the examples expect indexes that were created ahead of time and persist between runs.
Single-tenant
Step 1 — create the index (once, before first run):
aws s3vectors create-index \
  --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
  --index-name memory \
  --data-type float32 --dimension 1024 --distance-metric cosine \
  --metadata-configuration '{"nonFilterableMetadataKeys":["content","conversation_id","type"]}' \
  --region $AWS_REGION
Step 2 — run:
source .venv/bin/activate
export S3_VECTOR_BUCKET_NAME=my-vector-memory
export AWS_REGION=us-east-1
cd examples
python3 single_tenant_agent.py
Expected output:
============================================================
SESSION 1 — storing a fact
============================================================
USER: My favourite framework is Strands Agents.
AGENT: That's great! Strands Agents is a nice framework to work with...
USER: What framework did I mention?
AGENT: You mentioned Strands Agents as your favourite framework.
[end_session=True — summarizing in background]
[waiting 5s for background summary store to complete...]
============================================================
SESSION 2 — memory injected automatically on first turn
============================================================
USER: What do you know about my preferences?
AGENT: Based on our previous conversations, I know that your favorite
framework is Strands Agents.
============================================================
SESSION 3 — memory_tool: mid-turn recall on demand
The agent uses the memory_tool when it needs to recall
something specific mid-conversation.
============================================================
USER: I'm evaluating some new tools. By the way, remind me — what
framework did I mention I liked in a previous session?
AGENT: You mentioned that Strands Agents is your favorite framework.
Good luck with evaluating the new tools!
Multi-tenant (with tenant isolation demo)
Step 1 — create indexes for both tenants (once, before first run):
for TENANT in tenant-001 tenant-002; do
  aws s3vectors create-index \
    --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
    --index-name memory-${TENANT} \
    --data-type float32 --dimension 1024 --distance-metric cosine \
    --metadata-configuration '{"nonFilterableMetadataKeys":["content","conversation_id","type"]}' \
    --region $AWS_REGION
done
Step 2 — run:
source .venv/bin/activate
export S3_VECTOR_BUCKET_NAME=my-vector-memory
export S3_VECTOR_TVM_ROLE_ARN=arn:aws:iam::<account-id>:role/<tvm-role-name>
export AWS_REGION=us-east-1
cd examples
python3 multi_tenant_agent.py
Expected output:
============================================================
SESSION 1 — tenant=tenant-001 storing a fact
============================================================
USER: Our Q4 budget is $2M and it is confidential.
AGENT: I understand and acknowledge that your Q4 budget is $2M and
that this information is confidential...
USER: Got it, thanks. [end_session=True]
AGENT: You're welcome! Feel free to reach out anytime...
[waiting 5s for background summary store to complete...]
============================================================
SESSION 2 — tenant=tenant-001 memory should be recalled
============================================================
USER: What did I tell you about our budget?
AGENT: You shared that your Q4 budget is $2M, and you indicated that
this information is confidential...
============================================================
SESSION 3 — tenant=tenant-002 isolation check
tenant-002 asks the same question about the budget.
Expected: agent has NO memory — cannot see tenant-001's data.
============================================================
USER (tenant-002): What did I tell you about our budget?
AGENT: I don't have any record of you telling me about your budget
in our previous conversations...
👆 Review the response above — tenant-002 should have no knowledge of
tenant-001's Q4 budget. If the response mentions '$2M', isolation has failed.
============================================================
SESSION 4 — tenant=tenant-001 memory_tool: mid-turn recall on demand
The agent uses the memory_tool when it discovers mid-reasoning
that it needs a specific fact from a previous session.
============================================================
USER: We're planning Q1 now. Can you remind me what our Q4 budget
was and whether there were any constraints I mentioned?
AGENT: Based on our previous conversation, your Q4 budget was $2M.
You marked this as confidential information...
Session 3 demonstrates application-level isolation — tenant-002 has its own empty index and receives no memory context. The TVM credentials for tenant-002 are also physically scoped to memory-tenant-002 by IAM ABAC, so even a direct API call to memory-tenant-001 would be denied with AccessDeniedException.
Clean up
# Delete indexes
for INDEX in memory memory-tenant-001 memory-tenant-002; do
  aws s3vectors delete-index \
    --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
    --index-name $INDEX \
    --region $AWS_REGION
done

# Delete the bucket
aws s3vectors delete-vector-bucket \
  --vector-bucket-name $S3_VECTOR_BUCKET_NAME \
  --region $AWS_REGION

# Delete the TVM IAM role
aws iam delete-role-policy \
  --role-name workshop-module3-lab1-s3vectors-tvm-role \
  --policy-name S3VectorsTenantPolicy
aws iam delete-role \
  --role-name workshop-module3-lab1-s3vectors-tvm-role
Further reading
For a deep dive into how the plugin works — lifecycle hooks, tenant isolation, conversation buffer, TVM credential caching, and design decisions — see the plugin reference.