MyPalace — standalone memory service for AI assistants. Full HTTP + gRPC surface, multi-tenancy, graph, cache, workers, rate limits, websockets, observability, analytics, bulk import/export, TTL, embedding migration, audit log, memory versioning.
MyPalace Memory Service
A standalone, lightweight memory service for AI assistants. Stores facts, preferences, and conversation history; serves them back via semantic search and LLM-ready context blocks.
Extracted from mypalclara's Palace memory system as an independent microservice with no mypalclara dependency.
What it does
- Memory CRUD + semantic search — embed text, store in Qdrant, retrieve by similarity
- Session/message persistence — conversation threads with PostgreSQL
- Context assembly — combine relevant memories + recent messages into a single payload for LLM prompts
- Pluggable embeddings — HuggingFace (local) or OpenAI (API)
- Pluggable LLM backend — any OpenAI-compatible chat completion endpoint (OpenRouter, OpenAI, etc.)
Project status — v0.10.0
Released to PyPI as mypalace (server) and mypalace-client. Production-ready
in scope; see the phase notes below for what's in and what's deliberately left
out.
Capabilities (built across phases 1–10):
- Phase 1 — memory CRUD + semantic search, sessions, context assembly, pluggable embeddings (HuggingFace / OpenAI) and LLM backend.
- Phase 2 — episodes + LLM reflection, narrative arcs, FSRS-6 dynamics, intentions with 4 trigger matchers, layered context, smart ingestion (LLM extract + dedup + auto-supersede), manual supersede + audit history.
- Phase 3 — API-key auth with read/write/admin scopes, full multi-tenancy (per-tenant Qdrant collections, key-bound + cross-tenant admin keys), optional FalkorDB graph layer with async writes + neighbors endpoint, optional Redis read-through cache, gRPC MemoryService transport, PyPI/Docker release pipeline.
- Phase 4 — Alembic migrations with auto-stamp on first boot, Prometheus /metrics + OpenTelemetry traces + structlog, Postgres-backed worker queue with SELECT … FOR UPDATE SKIP LOCKED, per-(tenant, key, user) sliding-window rate limits, WebSocket event subscriptions, graph-walked l3_graph_context in layered retrieval.
- Phase 5 — worker-queue routing for async reflection/synthesis, episode/intention/arc event publishers, full gRPC mirror of remaining surfaces (22 RPCs / 8 services), cross-tenant admin analytics endpoint.
- Phase 6 — bulk /v1/admin/export + /v1/admin/import (NDJSON) for DR and tenant migration, memory TTL with worker-driven cleanup, embedding-model migration via /v1/admin/reembed, release-pipeline fixes.
- Phase 7 — admin operation audit log, append-only memory change history with /v1/memories/{id}/history, cross-tenant search (POST /v1/memories/search?tenant_id=ALL), mypalclara migration guide.
- Phase 8 — production hardening: a deep /health/deep endpoint that pings each backend, boot-time config validation that refuses to start on bad env, DB-query observability via SQLAlchemy event hooks, production docker-compose + deployment guide.
- Phase 9 — operator UX: a mypalace-admin CLI for day-to-day ops, proper k8s /live vs /ready split (so backend blips no longer trigger pod restarts), tunable SQLAlchemy connection pool with pool_pre_ping on by default, scheduled per-tenant backup worker (gzipped NDJSON, atomic publish, mtime-based pruning).
- Phase 10 — mypalclara parity: entity resolver (platform-id → human name registry), personality evolution (LLM-driven self-evolving traits, run via the worker queue), token-based context budget env vars, optional Redis embedding cache with toggle, and VCH (verbatim chat history search via Postgres FTS). Driven by docs/gap-analysis-mypalclara.md.
Deliberately out of scope (operators who need them should fork or deploy separately): per-tenant Postgres schemas, admin web UI, memory clustering / topic discovery, fine-grained per-key tenant-resource scoping.
Auth (phase 3 slice 1)
Every /v1/* endpoint requires a valid API key in the X-Palace-Key header. /health, /docs, /redoc, /openapi.json remain public.
Three scopes:
- read — GET /v1/*, POST /v1/memories/search, /list, /episodes/search, /intentions/check, /intentions/format, /context/*
- write — everything else under /v1/*
- admin — /v1/admin/* and /v1/maintenance/*
Scopes are explicit: admin does not auto-grant write or read. When you mint a key, list every scope it should have.
Bootstrap an admin key
Set PALACE_BOOTSTRAP_ADMIN_KEY to a value of the form pk_live_<32 alphanumeric chars>. On lifespan startup, if no admin key exists, Palace inserts a row with read+write+admin scopes. Idempotent.
export PALACE_BOOTSTRAP_ADMIN_KEY=pk_live_$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)
echo "$PALACE_BOOTSTRAP_ADMIN_KEY" # save this — Palace doesn't log it
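The shell one-liner above can also be done in Python. A minimal sketch, assuming only the documented pk_live_<32 alphanumeric chars> format (the function name is ours, not part of the package):

```python
import secrets
import string

def generate_bootstrap_key() -> str:
    """Generate a key matching the documented pk_live_<32 alphanumeric> format."""
    alphabet = string.ascii_letters + string.digits
    return "pk_live_" + "".join(secrets.choice(alphabet) for _ in range(32))
```

Using `secrets` (not `random`) keeps the key cryptographically unpredictable.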
Mint additional keys
curl -X POST http://localhost:8000/v1/admin/keys \
-H "X-Palace-Key: $PALACE_BOOTSTRAP_ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"label": "mypalclara-prod", "scopes": ["read", "write"]}'
# response: {"data": {"key_id": "...", "plaintext_key": "pk_live_...", ...}}
The plaintext is returned once. Palace stores only a bcrypt hash plus an 8-char prefix index for lookup.
Disable auth (development / tests only)
export PALACE_AUTH_DISABLED=true
The mock test suite sets this automatically; live integration tests opt back in per-test.
Multi-tenancy (phase 3 slice 2)
Every row in every user-data table carries a tenant_id. API keys are bound to a tenant on creation; the middleware sets request.state.auth.tenant_id from the key, and every service query filters by it. Qdrant collections are per-tenant: palace_memories_<tenant_id>, palace_episodes_<tenant_id>.
A default tenant is created on first boot (PALACE_DEFAULT_TENANT_ID to override). Single-tenant deployments work zero-config.
Tenant ID format
^[a-z0-9_]{1,32}$ — lowercase alphanumeric + underscore, max 32 chars. Anything else → 400.
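The tenant-id rule above is easy to pre-validate client-side before hitting the API. A minimal sketch (helper name is ours):

```python
import re

# Mirrors the documented server-side rule: ^[a-z0-9_]{1,32}$
TENANT_ID_RE = re.compile(r"^[a-z0-9_]{1,32}$")

def validate_tenant_id(tenant_id: str) -> bool:
    """True when the id is lowercase alphanumeric/underscore, 1-32 chars."""
    return TENANT_ID_RE.fullmatch(tenant_id) is not None
```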
Mint a tenant-bound key
# Create a tenant
curl -X POST http://localhost:8000/v1/admin/tenants \
-H "X-Palace-Key: $ADMIN_KEY" \
-d '{"id": "acme", "label": "Acme Corp"}'
# Issue a key bound to that tenant
curl -X POST http://localhost:8000/v1/admin/keys \
-H "X-Palace-Key: $ADMIN_KEY" \
-d '{"label": "acme-prod", "scopes": ["read","write"], "tenant_id": "acme"}'
Cross-tenant admin keys
For migrations or support, mint a key with cross_tenant: true:
curl -X POST http://localhost:8000/v1/admin/keys \
-H "X-Palace-Key: $ADMIN_KEY" \
-d '{"label": "support", "scopes": ["read","write","admin"], "cross_tenant": true}'
The bootstrap admin key (from PALACE_BOOTSTRAP_ADMIN_KEY) is a cross-tenant key by default.
Migrations (phase 4 slice 1)
Alembic now manages schema. init_db() still creates tables on first boot for zero-config dev, AND stamps the latest revision so future alembic upgrade head calls know where to start.
# Fresh install — nothing to do; lifespan startup handles it.
.venv/bin/uvicorn mypalace.main:app
# Pre-phase-4 install with existing data — stamp once, then upgrade as usual:
.venv/bin/alembic stamp 2026_05_04_0001_baseline
.venv/bin/alembic upgrade head
# Day-to-day after a new migration lands:
.venv/bin/alembic upgrade head
# Generate a new migration from model changes:
.venv/bin/alembic revision --autogenerate -m "add foo column"
DB URL is read from PALACE_DATABASE_URL (no need to set sqlalchemy.url in alembic.ini).
Observability (phase 4 slice 2)
Prometheus metrics
/metrics exposes Prometheus exposition format. Always on, always public (k8s scrapers need that — lock down via your ingress if necessary).
Counters:
- palace_http_requests_total{method, route, status_class}
- palace_http_request_duration_seconds{method, route} (histogram)
- palace_cache_hits_total{namespace}, palace_cache_misses_total{namespace}
- palace_graph_writes_total{kind}, palace_graph_failures_total
- palace_jobs_total{kind, outcome} (populated by phase-4 slice 3)
Routes are normalized — UUIDs and long IDs become {id} so Prometheus label cardinality stays bounded.
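The exact normalization rules aren't spelled out here, but the idea can be sketched: collapse UUIDs (and, as an assumption, any long alphanumeric run) into a {id} placeholder before using the path as a Prometheus label:

```python
import re

UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Assumption for this sketch: treat 20+ char alphanumeric runs as "long IDs".
LONG_ID_RE = re.compile(r"\b[0-9a-zA-Z]{20,}\b")

def normalize_route(path: str) -> str:
    """Replace IDs with {id} so label cardinality stays bounded."""
    path = UUID_RE.sub("{id}", path)
    return LONG_ID_RE.sub("{id}", path)
```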
OpenTelemetry traces
Optional. Set PALACE_OTLP_ENDPOINT=http://otel-collector:4317 and install the optional extra:
pip install "mypalace[otel]"
Auto-instruments FastAPI + httpx. Service name defaults to mypalace (override via PALACE_OTLP_SERVICE_NAME). No-op if either the env var is unset or the SDK isn't installed.
Structured logs
structlog configured at lifespan startup. Two modes via PALACE_LOG_FORMAT:
- pretty (default) — colored console output, dev-friendly
- json — newline-delimited JSON, production-ready
Every request gets a request_id (read from X-Request-ID header if present, else a fresh uuid4). It's bound to structlog's contextvars so every log line in the request scope carries it. The same X-Request-ID is echoed back on the response.
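The request-id behavior described above reduces to a small helper. A sketch under stated assumptions (the function name is ours; Palace's actual middleware may differ):

```python
import uuid

def ensure_request_id(headers: dict[str, str]) -> str:
    """Reuse the caller's X-Request-ID when present, else mint a fresh uuid4."""
    rid = headers.get("X-Request-ID") or headers.get("x-request-id")
    return rid if rid else str(uuid.uuid4())
```

Callers that set X-Request-ID themselves can then correlate their own logs with Palace's, since the same value is echoed back on the response.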
Background workers (phase 4 slice 3)
Postgres-backed job queue using SELECT ... FOR UPDATE SKIP LOCKED. Built-in handlers cover reflection (episode reflection from a session) and synthesis (narrative arc rollup). The web process can opt to enqueue jobs and let a separate worker process pick them up.
# In one terminal: the web server
.venv/bin/uvicorn mypalace.main:app --port 8000
# In another: one or more workers
.venv/bin/python -m mypalace.workers.runner
# (Run multiple — SKIP LOCKED gives them safe concurrency)
Knobs
- PALACE_WORKER_POLL_INTERVAL=1.0 — seconds between polls when idle
- PALACE_WORKER_LEASE_SECONDS=60 — max time a worker holds a claim before another can re-take it
- PALACE_WORKER_MAX_ATTEMPTS=3 — failures past this mark status=failed
- PALACE_WORKER_QUEUE_ENABLED=true (phase 5) — route async-mode /v1/reflection/session and /v1/synthesis/narratives through the worker queue instead of the in-process asyncio.create_task path. Requires at least one worker process; defaults to false so single-process deployments without a worker keep working.
Custom handlers
from mypalace.workers import register_handler
async def my_handler(payload: dict, tenant_id: str) -> dict:
    return {"processed": payload}
register_handler("my_kind", my_handler)
# Then enqueue:
from mypalace.workers import enqueue
await enqueue(kind="my_kind", user_id="u1", payload={"x": 1}, tenant_id="default")
The runner picks up my_kind automatically once the registry is populated.
Rate limits (phase 4 slice 4)
Optional sliding-window rate limiter, scoped to (tenant, key, user). Requires Redis.
export PALACE_RATE_LIMIT_ENABLED=true
export PALACE_REDIS_URL=redis://localhost:6379
export PALACE_RATE_LIMIT_DEFAULT=120 # req/min for most endpoints
export PALACE_RATE_LIMIT_SEARCH=60 # tighter bucket for /search + /context
Disabled by default. When disabled, the middleware is a fast no-op. When enabled but Redis is unreachable, it fails open (logs a warning, lets the request through) — Palace stays available even when the limiter can't.
Bypass with unlimited scope
Trusted server-to-server keys can opt out by adding the unlimited scope at issuance time:
curl -X POST http://localhost:8000/v1/admin/keys \
-H "X-Palace-Key: $ADMIN_KEY" \
-d '{"label":"trusted-svc","scopes":["read","write","unlimited"]}'
Response shape on 429
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json
{"error": {"code": "rate_limited", "message": "Too many requests in 60s window: 121/120", "retry_after_seconds": 60}}
Graph layer (phase 3 slice 3)
Optional FalkorDB integration. When PALACE_FALKORDB_URL is set, every memory / episode / arc create writes a node into the per-tenant graph (palace_<tenant_id>), and every supersession writes a SUPERSEDES edge. Writes are fire-and-forget — graph failures never break the primary write.
# FalkorDB ships as a Redis module:
podman run -d --name palace-falkor -p 6379:6379 docker.io/falkordb/falkordb:latest
export PALACE_FALKORDB_URL=redis://localhost:6379
Without the env var, the graph layer is a no-op and /v1/graph/* returns 503.
Schema
(:Memory {id, user_id, agent_id, content, memory_type, importance})
(:Episode {id, user_id, summary, significance, timestamp})
(:Arc {id, user_id, title, status})
(Memory)-[:SUPERSEDES]->(Memory)
(Episode)-[:PARTICIPATES_IN]->(Arc)
Querying neighbors
curl "http://localhost:8000/v1/graph/neighbors?node_id=<memory_id>&depth=2&edge_type=SUPERSEDES" \
-H "X-Palace-Key: $KEY"
# → {"data": {"nodes": [...], "edges": [...]}, "meta": {...}}
Depth is capped at 3 hops; edge_type filter is optional. No raw Cypher passthrough — the only graph API surface is /v1/graph/neighbors.
Cache (phase 3 slice 4)
Optional Redis read-through cache for /v1/context/layered and /v1/memories/search. Keys are hashed (tenant_id, namespace, params); TTL defaults to 60s. On memory create/update/delete, all matching tenant cache entries are invalidated.
# FalkorDB and the cache can share the same Redis instance:
export PALACE_REDIS_URL=redis://localhost:6379
# Optional knobs:
export PALACE_CACHE_TTL_SEARCH=60 # seconds
export PALACE_CACHE_TTL_GET=300
# Disable without unsetting URL (e.g. tests):
export PALACE_CACHE_DISABLED=true
Without PALACE_REDIS_URL, every read goes straight to Postgres + Qdrant.
Cache failures degrade to misses — Palace stays correct, just slower.
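The "keys are hashed (tenant_id, namespace, params)" scheme can be sketched like this; the actual key layout is an implementation detail of Palace, so treat this as an illustration, not the wire format:

```python
import hashlib
import json

def cache_key(tenant_id: str, namespace: str, params: dict) -> str:
    """Deterministic cache key from (tenant, namespace, canonicalized params)."""
    # sort_keys makes logically-equal param dicts hash identically
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(
        f"{tenant_id}:{namespace}:{canonical}".encode()
    ).hexdigest()
    return f"palace:{namespace}:{digest}"
```

Hashing keeps arbitrary search params out of Redis key names while staying deterministic, and prefixing by tenant + namespace lets invalidation target one tenant's entries.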
gRPC transport (phase 3 + phase 5)
Optional second transport alongside REST. Phase 3 added MemoryService; phase 5 expanded to a full mirror — SessionService, EpisodeService, ArcService, IntentionService, DynamicsService, RetrievalService, IngestionService, JobService. Same auth (X-Palace-Key in metadata), same scope rules, same singleton services as the HTTP path.
export PALACE_GRPC_PORT=50051
.venv/bin/uvicorn mypalace.main:app --port 8000
# → starts FastAPI on :8000 AND gRPC on :50051
Auth uses the same X-Palace-Key, sent as gRPC metadata x-palace-key. Scope rules are identical to HTTP.
from mypalace_client.grpc import PalaceGrpcClient
async with PalaceGrpcClient("localhost:50051", api_key="pk_live_...") as client:
mem = await client.create(user_id="u1", content="hello via gRPC")
results = await client.search(query="hello", limit=5)
Regenerating stubs
python -m grpc_tools.protoc -I=proto \
--python_out=mypalace/grpc/_generated \
--grpc_python_out=mypalace/grpc/_generated \
proto/palace.proto
# Then re-apply the local import fix in palace_pb2_grpc.py:
# sed -i '' 's/^import palace_pb2/from mypalace.grpc._generated import palace_pb2/' \
# mypalace/grpc/_generated/palace_pb2_grpc.py
Install
From PyPI (server)
pip install mypalace
From PyPI (client only — for AI apps that talk to a remote Palace)
pip install mypalace-client
# Optional gRPC transport:
pip install "mypalace-client[grpc]"
Docker
docker pull bangrocket/mypalace:latest
docker run -p 8000:8000 \
-e PALACE_DATABASE_URL=postgresql+asyncpg://palace:palace@host/palace \
-e QDRANT_URL=http://host:6333 \
-e PALACE_BOOTSTRAP_ADMIN_KEY=pk_live_$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32) \
bangrocket/mypalace:latest
Production (single-host docker-compose)
For a real deployment with all five backends + a worker, use the production compose file:
cp .env.example .env
$EDITOR .env # set PALACE_BOOTSTRAP_ADMIN_KEY + POSTGRES_PASSWORD
docker compose -f docker-compose.prod.yml up -d
curl http://localhost:8000/health/deep | jq
Full setup, scaling, observability, backup, and upgrade instructions
are in docs/deployment.md.
Quick start (development)
# 1. Start postgres + qdrant (docker or podman)
docker-compose up -d postgres qdrant
# or: podman run -d --name palace-postgres -p 5442:5432 \
# -e POSTGRES_USER=palace -e POSTGRES_PASSWORD=palace -e POSTGRES_DB=palace \
# docker.io/library/postgres:16-alpine
# podman run -d --name palace-qdrant -p 6333:6333 docker.io/qdrant/qdrant:latest
# 2. Install (Python 3.12)
python3.12 -m venv .venv && .venv/bin/pip install -e ".[dev]"
# 3. Configure
cp .env.example .env
# 4. Run
.venv/bin/uvicorn mypalace.main:app --reload --port 8000
Or fully containerized: docker-compose up --build.
API docs at http://localhost:8000/docs.
Smoke test
# Store a memory
curl -X POST http://localhost:8000/v1/memories \
-H 'Content-Type: application/json' \
-d '{"user_id":"u1","content":"User loves dark mode and uses Vim daily","memory_type":"preference"}'
# Semantic search
curl -X POST http://localhost:8000/v1/memories/search \
-H 'Content-Type: application/json' \
-d '{"query":"editor preferences","user_id":"u1","limit":5}'
# Assemble context for an LLM prompt
curl -X POST http://localhost:8000/v1/context \
-H 'Content-Type: application/json' \
-d '{"user_id":"u1","query":"what does the user prefer?","max_memories":5}'
API
All responses use the envelope { "data": ..., "meta": { "count": N, "took_ms": N } }. Errors return HTTP 4xx/5xx with FastAPI's default body.
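A small client-side unwrapping helper makes the envelope convenient to consume (a sketch; the helper name is ours, not part of mypalace-client):

```python
def unwrap(envelope: dict):
    """Split the standard {"data": ..., "meta": {...}} envelope into parts."""
    data = envelope["data"]
    meta = envelope.get("meta", {})
    return data, meta.get("count"), meta.get("took_ms")
```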
Memories
| Method | Path | Body |
|---|---|---|
| POST | /v1/memories | { user_id, content, memory_type?, agent_id?, source?, importance?, metadata? } |
| POST | /v1/memories/search | { query, user_id?, agent_id?, memory_type?, limit?, min_score? } |
| GET | /v1/memories/{id} | — |
| PATCH | /v1/memories/{id} | { content?, memory_type?, importance?, metadata? } |
| DELETE | /v1/memories/{id} | — |
| GET | /v1/users/{user_id}/memories?limit=50 | — |
memory_type is a free-form string; conventional values: semantic, episodic, preference, fact.
Sessions
| Method | Path | Body |
|---|---|---|
| POST | /v1/sessions | { user_id, title? } |
| GET | /v1/sessions/{id} | — (returns session + ordered messages) |
| POST | /v1/sessions/{id}/messages | { user_id, role, content } |
| PATCH | /v1/sessions/{id} | { title?, summary? } |
| DELETE | /v1/sessions/{id} | — (cascades to messages) |
Context
| Method | Path | Body |
|---|---|---|
| POST | /v1/context | { user_id, query, session_id?, max_memories?, max_messages? } |
Health
| Method | Path |
|---|---|
| GET | /health |
Configuration
All settings come from environment variables (or .env).
| Variable | Default | Notes |
|---|---|---|
| PALACE_DATABASE_URL | postgresql+asyncpg://palace:palace@localhost/palace | asyncpg driver required |
| QDRANT_URL | http://localhost:6333 | |
| QDRANT_COLLECTION | palace_memories | created on startup if missing |
| EMBEDDING_PROVIDER | huggingface | huggingface or openai |
| EMBEDDING_MODEL | BAAI/bge-large-en-v1.5 | for HF; pass an OpenAI model name when provider is openai |
| HF_TOKEN | — | optional, for gated models |
| OPENAI_API_KEY | — | required if EMBEDDING_PROVIDER=openai |
| LLM_PROVIDER | openrouter | openrouter or openai |
| LLM_API_KEY | — | required for any LLM call |
| LLM_MODEL | openai/gpt-4o-mini | OpenRouter-style or OpenAI model id |
| LOG_LEVEL | INFO | |
Architecture
palace/
├── main.py FastAPI app + lifespan (creates tables, ensures Qdrant collection)
├── config.py Pydantic Settings (.env aware)
├── models.py SQLModel tables: Memory, Session, Message
├── database.py Async SQLAlchemy engine + session factory
├── embeddings.py EmbeddingProvider protocol + HF and OpenAI impls
├── vector.py Async Qdrant wrapper (ensure/upsert/query/delete)
├── llm.py Async chat-completion client (OpenAI-compatible)
├── memory_service.py CRUD + semantic search; lazy embedder
├── session_service.py Session + message lifecycle
├── context_service.py Memory search + recent messages → prompt context
└── api/
├── common.py Pydantic request/response models, ApiResponse envelope
├── memories.py memory routes + users-router for /v1/users/{id}/memories
├── sessions.py session routes
└── context.py context route
Behavior notes
- Create memory → INSERT into postgres, then embed + UPSERT into Qdrant. Embedding happens outside the DB transaction so a slow embedder doesn't hold a row lock.
- Search → embed query, Qdrant query_points with user_id/agent_id/memory_type filters, then fetch full rows from postgres in one IN-clause. Search bumps access_count and accessed_at on the returned rows.
- Update content → re-embeds and UPSERTs (Qdrant point id == memory id).
- Delete → removes the postgres row, then deletes the Qdrant point.
Running tests
.venv/bin/python -m pytest
The test suite uses mocks (no postgres or qdrant required) and covers all 13 routes.
For end-to-end verification, the smoke commands above exercise the whole stack against a live DB + Qdrant.
Platform notes
macOS x86_64
torch is capped at 2.2.2 on macOS x86_64 (Apple deprecated x86 wheels for newer torch). pyproject.toml therefore pins numpy<2 and transformers<5 to stay ABI-compatible with that torch. If you are on Apple Silicon (arm64) or Linux, those pins are still safe — just not strictly required.
If you'd rather avoid the torch dependency entirely, set:
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
The embedder is loaded lazily, so importing the app no longer triggers a model download.
Container engines
docker-compose.yml works under both Docker Desktop and podman compose. With plain podman run, see the Quick start above for the equivalent commands.
Drop-in mode for mypalclara
MyPalace is a strict superset of mypalclara's embedded Palace surface
since end of phase 3. To swap mypalclara from embedded to remote MyPalace,
follow the step-by-step guide at docs/migrating-mypalclara.md.
The short version:
# 1. install the client
pip install mypalace-client==0.10.0 # or "mypalace-client[grpc]"
# 2. copy examples/mypalclara_router.py into mypalclara as
# mypalclara/core/memory/routed.py and swap the imports
# 3. set the env and you're done
export USE_PALACE_SERVICE=true
export PALACE_SERVICE_URL=http://mypalace:8000
export PALACE_API_KEY=pk_live_... # mint via /v1/admin/keys
The router uses explicit pass-throughs — every public method of
ClaraMemory + MemoryManager has its own entry, no __getattr__
fallthrough. mypalclara's existing Discord-transcript replay script writes
through the router to remote MyPalace with no Palace-specific port — see
the migration guide for replay + validation + rollback steps.
Legacy slice notes (kept for historical reference)
Six more endpoints that mypalclara's episode_store.* and MM.reflect_on_session /
MM.run_narrative_synthesis callers can route to remote Palace:
- POST /v1/reflection/session?mode={sync,async} — extract episodes from a message list. Default async returns 202 + job id; sync returns the extracted episodes inline.
- POST /v1/synthesis/narratives?mode={sync,async} — roll recent episodes into narrative arcs.
- POST /v1/episodes/search — semantic search over episodes with a min_significance filter.
- GET /v1/users/{user_id}/episodes/recent?limit=5 — recent episodes, newest first.
- GET /v1/users/{user_id}/arcs/active?limit=10 — active narrative arcs.
- GET /v1/jobs/{job_id} — poll async job status (pending/completed/failed) and retrieve the result.
Storage: episodes in a separate palace_episodes Qdrant collection; arcs in a narrative_arcs Postgres table with a JSONB key_episode_ids array. Async mode uses pure asyncio (no Celery/arq); jobs in flight don't survive process restarts (caller can re-POST).
LLM extraction uses palace/llm.py's OpenAI-compatible chat-completion client (works against OpenRouter, OpenAI, Anthropic-via-OpenRouter, or any compatible endpoint). Set LLM_API_KEY in .env.
The examples/mypalclara_router.py reference now routes episode_store, reflect_on_session, and run_narrative_synthesis when USE_PALACE_SERVICE=true, falling back to embedded otherwise.
Slice 3 additions: FSRS dynamics
Five more endpoints port mypalclara's FSRS-6 spaced-repetition memory dynamics (per-memory stability/difficulty/retrievability state, access logging, and composite ranking) to remote Palace:
- POST /v1/memories/{id}/promote — apply an FSRS review (default grade=3 / GOOD, signal_type="used_in_response"). Auto-creates the dynamics row on first call.
- POST /v1/memories/{id}/demote — failure signal (equivalent to promote(grade=1)); default reason="user_correction".
- GET /v1/memories/{id}/dynamics?user_id=u1 — read the current FSRS state. 404 if no dynamics row.
- POST /v1/memories/{id}/score — composite ranking breakdown given the caller's semantic score. Returns {composite_score, fsrs_score, retrievability, storage_strength}. Composite formula: composite = 0.6 * semantic + 0.4 * fsrs_score, where fsrs_score = (0.7 * retrievability + 0.3 * storage_strength) * importance_weight.
- POST /v1/maintenance/prune-access-logs?retention_days=90 — admin op; deletes old access log rows.
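The composite formula documented for /v1/memories/{id}/score is straightforward to express directly:

```python
def fsrs_score(retrievability: float, storage_strength: float,
               importance_weight: float) -> float:
    """fsrs_score = (0.7 * retrievability + 0.3 * storage_strength) * importance_weight"""
    return (0.7 * retrievability + 0.3 * storage_strength) * importance_weight

def composite_score(semantic: float, retrievability: float,
                    storage_strength: float, importance_weight: float = 1.0) -> float:
    """composite = 0.6 * semantic + 0.4 * fsrs_score"""
    return 0.6 * semantic + 0.4 * fsrs_score(
        retrievability, storage_strength, importance_weight
    )
```

With all inputs at 1.0 the composite is 1.0, i.e. a perfectly retrievable, maximally important memory with a perfect semantic match ranks at the top.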
The FSRS-6 math (palace/dynamics/fsrs.py) is ported character-for-character from mypalclara, with a deterministic regression net (tests/test_dynamics.py) pinning known input -> output values to catch porting drift.
MM.get_last_retrieved_memory_ids stays embedded (slice-3 design D4): the HTTP service is stateless between requests, so an in-process cache wouldn't survive across worker processes. The mypalclara router caches retrieved IDs client-side instead.
Slice 4 additions: intentions
Six endpoints port mypalclara's intentions subsystem — deterministic-trigger reminders that fire when matching keywords, topics, times, or context conditions are detected. No LLM in this slice; all matching is purely structural (keyword regex, time comparison, context dict matching, word-overlap fallback for topic).
- POST /v1/intentions — set a new intention ({user_id, content, trigger_conditions, ...}).
- POST /v1/intentions/check — evaluate all unfired intentions for a user against a message + context. Returns the fired list (sorted by priority); marks fired and deletes any with fire_once=true.
- POST /v1/intentions/format — render fired intentions as a markdown bullet list for system-prompt injection.
- DELETE /v1/intentions/{id} — delete a single intention. 404 if not found.
- GET /v1/users/{user_id}/intentions?fired={true|false|all}&limit=50 — list intentions for a user.
- POST /v1/maintenance/cleanup-intentions — admin op; deletes intentions whose expires_at has passed.
Four trigger types (set via trigger_conditions["type"]):
- keyword — substring match against keywords; optional regex and case_sensitive.
- topic — word-overlap fraction vs. topic >= threshold; optional quick_keywords pre-filter.
- time — fires when current UTC time has reached at (specific) or after (open-ended).
- context — matches a dict of {channel_name, is_dm, time_of_day, day_of_week}; all configured keys must match.
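The topic matcher's "word-overlap fraction" can be sketched like this (our naive tokenization; Palace's actual matcher may normalize differently):

```python
def topic_overlap(message: str, topic: str) -> float:
    """Fraction of the topic's words that appear in the message."""
    msg_words = set(message.lower().split())
    topic_words = set(topic.lower().split())
    if not topic_words:
        return 0.0
    return len(topic_words & msg_words) / len(topic_words)

def topic_fires(message: str, topic: str, threshold: float = 0.5) -> bool:
    """Fire the intention when overlap reaches the configured threshold."""
    return topic_overlap(message, topic) >= threshold
```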
The mypalclara_router.py reference now routes MM.set_intention, MM.check_intentions, and MM.format_intentions_for_prompt when USE_PALACE_SERVICE=true.
Slice 5 additions: layered retrieval + smart ingestion
Three endpoints + an activated flag close out phase 2: multi-tier context assembly, LLM-driven memory extraction with vector dedup + heuristic supersede, and a manual supersede record:
- POST /v1/context/layered — multi-tier context assembly. Parallel-fetches L1 (top semantic memories + recent episodes + active arcs) and L2 (query-filtered memories, optionally FSRS-reranked, plus query-filtered episodes), char-budgets each layer, and optionally pulls recent_messages from a session. Returns a structured dict (caller composes into prompts; Palace stays generic).
- POST /v1/memories/{id}/supersede — manually replace a memory. Creates a new memory and an audit row; demotes the old memory's FSRS state.
- GET /v1/memories/{id}/supersedes — supersession history involving this memory id (either side).
- Activated: infer=true on POST /v1/memories/batch now runs the smart-ingestion pipeline: LLM extracts factual memories from the conversation block; for each candidate, embed + Qdrant-search the nearest existing memory; score > 0.95 skip as duplicate, score > 0.75 heuristic contradiction check (auto-supersede when confidence > 0.7, else skip as similar), else write fresh. Response meta carries supersessions and skipped arrays.
The contradiction heuristic is intentionally simple (no LLM in the hot ingestion path): negation-asymmetry plus stemmed token overlap. The LLM extraction is the only LLM call.
MM.build_prompt_layered, MM.smart_ingest, and MM.supersede_memory graduate to remote in examples/mypalclara_router.py when USE_PALACE_SERVICE=true. Note: the routed build_prompt_layered returns a structured LayeredContext dict instead of typed Messages — mypalclara's caller must adapt (the routed path drops Discord-specific persona/channel layers per phase-2 design D1).
Integration tests
Default pytest runs only the fast mock-based suite (~2s). Live
end-to-end tests against real postgres + qdrant are opt-in:
pytest # mocks only — fast iteration
pytest -m integration # live containers via testcontainers-python
The integration suite spins up postgres + qdrant containers per session (via testcontainers-python), truncates tables between tests, and exercises:
- Memory CRUD, semantic search ranking, batch create + filtered list, delete-all with filters (tests/integration/test_memories_live.py).
- Session/message lifecycle and context assembly (tests/integration/test_sessions_live.py).
- palace_client against a live Palace via in-process ASGI transport, proving wire-contract agreement (tests/integration/test_client_e2e.py).
Requires Docker or podman with a running machine. First run pulls
postgres + qdrant images and downloads the small embedding model
(sentence-transformers/all-MiniLM-L6-v2); subsequent runs are 30-60s.
Bulk import / export (phase 6 slice 2)
Disaster recovery + tenant migration. Streaming NDJSON, one record per line.
# Stream a tenant dump to a file
curl "http://localhost:8000/v1/admin/export?tenant_id=acme" \
-H "X-Palace-Key: $ADMIN_KEY" \
-o palace-acme-export.ndjson
# Re-import into a target tenant (creates the tenant if missing)
curl -X POST "http://localhost:8000/v1/admin/import?tenant_id=acme-restored" \
-H "X-Palace-Key: $ADMIN_KEY" \
--data-binary @palace-acme-export.ndjson
Each line is one row prefixed by _type (one of tenant, memory, session, narrative_arc, intention, memory_dynamics, memory_supersession). Vector data is not included — re-embed on import keeps dumps portable across embedding models. The tenant_id query param always wins over any tenant_id field in the dump. Idempotent on primary keys via db.merge(). api_keys are excluded — set up auth on the new deployment separately.
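A dump with the shape described above is easy to inspect offline, for example to count records per _type before importing. A sketch (the helper name is ours):

```python
import json
from collections import Counter

def summarize_dump(lines) -> dict:
    """Count records per _type in an export dump (one JSON object per line)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            counts[json.loads(line)["_type"]] += 1
    return dict(counts)
```

Run it over the file with `summarize_dump(open("palace-acme-export.ndjson"))` to sanity-check a dump before pointing /v1/admin/import at it.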
Pair with /v1/admin/reembed (slice 4) by passing ?reembed=false for very large imports, then triggering re-embed afterward.
Memory TTL (phase 6 slice 3)
Optional time-to-live on memories. Pass ttl_seconds on create:
curl -X POST http://localhost:8000/v1/memories \
-H "X-Palace-Key: $KEY" \
-d '{"user_id":"u1","content":"login code: 482719","ttl_seconds":300,"memory_type":"session"}'
The memory's expires_at is set to now + ttl. Search/list/get already exclude expired rows even before cleanup runs (WHERE expires_at IS NULL OR expires_at > now()).
Garbage collection runs as a worker handler. Enqueue per-tenant:
from mypalace.workers import enqueue
await enqueue(kind="cleanup", user_id="system",
payload={"batch_size": 500}, tenant_id="acme")
Or schedule it via cron / your orchestrator. Operators without a worker process can still delete expired memories via the regular DELETE endpoint — the index excludes them from reads regardless.
Releasing (operator notes)
Tagging vX.Y.Z triggers .github/workflows/release.yml which runs tests, builds both packages, and (when configured) publishes to PyPI + Docker Hub. Tags ending in -rc* or -beta* route to TestPyPI for rehearsal.
One-time setup
1. PyPI trusted publishing. For each project at https://pypi.org/manage/account/publishing/, add a "GitHub" publisher pointing at this repo + workflow release.yml. Leave the environment field empty. Do this for both:
- mypalace
- mypalace-client
Once configured, the pypa/gh-action-pypi-publish@release/v1 step OIDC-authenticates and uploads with no API token to manage. Repeat the same for https://test.pypi.org if you want rc/beta rehearsals to publish.
2. Docker Hub (optional). If you want Docker images built on every tag, configure three repo settings (Settings → Secrets and variables → Actions):
- Variable PUBLISH_DOCKER=true
- Secret DOCKERHUB_USERNAME — your Docker Hub username
- Secret DOCKERHUB_TOKEN — a Docker Hub access token
Without vars.PUBLISH_DOCKER=true, the docker job is skipped and the GitHub release still cuts.
Cutting a release
# Rehearse on TestPyPI first
git tag -a v0.5.0-rc1 -m "rehearsal"
git push origin v0.5.0-rc1
# Watch https://github.com/BangRocket/mypalace/actions
# Once green, cut the real tag
git tag -a v0.5.0 -m "0.5.0 — see CHANGELOG.md"
git push origin v0.5.0
If a tag's workflow fails, fix the issue, delete the tag locally and remote (git tag -d v0.5.0 && git push --delete origin v0.5.0), then re-tag.
License
PolyForm Noncommercial 1.0.0