MyPalace — standalone memory service for AI assistants. Full HTTP + gRPC surface, multi-tenancy, graph, cache, workers, rate limits, websockets, observability, analytics, bulk import/export, TTL, embedding migration, audit log, memory versioning.

MyPalace Memory Service

PyPI version · License: PolyForm Noncommercial 1.0.0 · Python 3.12+

A standalone, lightweight memory service for AI assistants. Stores facts, preferences, and conversation history; serves them back via semantic search and LLM-ready context blocks.

Extracted from mypalclara's Palace memory system as an independent microservice with no mypalclara dependency.


What it does

  • Memory CRUD + semantic search — embed text, store in Qdrant, retrieve by similarity
  • Session/message persistence — conversation threads with PostgreSQL
  • Context assembly — combine relevant memories + recent messages into a single payload for LLM prompts
  • Pluggable embeddings — HuggingFace (local) or OpenAI (API)
  • Pluggable LLM backend — any OpenAI-compatible chat completion endpoint (OpenRouter, OpenAI, etc.)

Project status — v0.7.1

Released to PyPI as mypalace (server) and mypalace-client. Production-ready in scope; see the phase notes below for what's in and what's deliberately left out.

Capabilities (built across phases 1–7):

  • Phase 1 — memory CRUD + semantic search, sessions, context assembly, pluggable embeddings (HuggingFace / OpenAI) and LLM backend.
  • Phase 2 — episodes + LLM reflection, narrative arcs, FSRS-6 dynamics, intentions with 4 trigger matchers, layered context, smart ingestion (LLM extract + dedup + auto-supersede), manual supersede + audit history.
  • Phase 3 — API-key auth with read/write/admin scopes, full multi-tenancy (per-tenant Qdrant collections, key-bound + cross-tenant admin keys), optional FalkorDB graph layer with async writes + neighbors endpoint, optional Redis read-through cache, gRPC MemoryService transport, PyPI/Docker release pipeline.
  • Phase 4 — Alembic migrations with auto-stamp on first boot, Prometheus /metrics + OpenTelemetry traces + structlog, Postgres-backed worker queue with SELECT … FOR UPDATE SKIP LOCKED, per-(tenant, key, user) sliding-window rate limits, WebSocket event subscriptions, graph-walked l3_graph_context in layered retrieval.
  • Phase 5 — Worker-queue routing for async reflection/synthesis, episode/intention/arc event publishers, full gRPC mirror of remaining surfaces (22 RPCs / 8 services), cross-tenant admin analytics endpoint.
  • Phase 6 — Bulk /v1/admin/export + /v1/admin/import (NDJSON) for DR and tenant migration, memory TTL with worker-driven cleanup, embedding-model migration via /v1/admin/reembed, release-pipeline fixes.
  • Phase 7 — Admin operation audit log, append-only memory change history with /v1/memories/{id}/history, cross-tenant search (POST /v1/memories/search?tenant_id=ALL), mypalclara migration guide.

In progress (phase 8): production hardening — deep /health that pings each backend, boot-time config validation, DB-query observability via SQLAlchemy event hooks, production docker-compose + deployment guide.

Deliberately out of scope (operators who need them should fork or deploy separately): per-tenant Postgres schemas, admin web UI, memory clustering / topic discovery, fine-grained per-key tenant-resource scoping.


Auth (phase 3 slice 1)

Every /v1/* endpoint requires a valid API key in the X-Palace-Key header. /health, /docs, /redoc, /openapi.json remain public.

Three scopes:

  • read — GET /v1/*, POST /v1/memories/search, /list, /episodes/search, /intentions/check, /intentions/format, /context/*
  • write — everything else under /v1/*
  • admin — /v1/admin/* and /v1/maintenance/*

Scopes are explicit: admin does not auto-grant write or read. When you mint a key, list every scope it should have.

Bootstrap an admin key

Set PALACE_BOOTSTRAP_ADMIN_KEY to a value of the form pk_live_<32 alphanumeric chars>. On lifespan startup, if no admin key exists, Palace inserts a row with read+write+admin scopes. Idempotent.

export PALACE_BOOTSTRAP_ADMIN_KEY=pk_live_$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)
echo "$PALACE_BOOTSTRAP_ADMIN_KEY"   # save this — Palace doesn't log it

Mint additional keys

curl -X POST http://localhost:8000/v1/admin/keys \
  -H "X-Palace-Key: $PALACE_BOOTSTRAP_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "mypalclara-prod", "scopes": ["read", "write"]}'
# response: {"data": {"key_id": "...", "plaintext_key": "pk_live_...", ...}}

The plaintext is returned once. Palace stores only a bcrypt hash plus an 8-char prefix index for lookup.

Disable auth (development / tests only)

export PALACE_AUTH_DISABLED=true

The mock test suite sets this automatically; live integration tests opt back in per-test.


Multi-tenancy (phase 3 slice 2)

Every row in every user-data table carries a tenant_id. API keys are bound to a tenant on creation; the middleware sets request.state.auth.tenant_id from the key, and every service query filters by it. Qdrant collections are per-tenant: palace_memories_<tenant_id>, palace_episodes_<tenant_id>.

A default tenant is created on first boot (PALACE_DEFAULT_TENANT_ID to override). Single-tenant deployments work zero-config.

Tenant ID format

^[a-z0-9_]{1,32}$ — lowercase alphanumeric + underscore, max 32 chars. Anything else → 400.

Mint a tenant-bound key

# Create a tenant
curl -X POST http://localhost:8000/v1/admin/tenants \
  -H "X-Palace-Key: $ADMIN_KEY" \
  -d '{"id": "acme", "label": "Acme Corp"}'

# Issue a key bound to that tenant
curl -X POST http://localhost:8000/v1/admin/keys \
  -H "X-Palace-Key: $ADMIN_KEY" \
  -d '{"label": "acme-prod", "scopes": ["read","write"], "tenant_id": "acme"}'

Cross-tenant admin keys

For migrations or support, mint a key with cross_tenant: true:

curl -X POST http://localhost:8000/v1/admin/keys \
  -H "X-Palace-Key: $ADMIN_KEY" \
  -d '{"label": "support", "scopes": ["read","write","admin"], "cross_tenant": true}'

The bootstrap admin key (from PALACE_BOOTSTRAP_ADMIN_KEY) is a cross-tenant key by default.

Migrations (phase 4 slice 1)

Alembic now manages schema. init_db() still creates tables on first boot for zero-config dev, AND stamps the latest revision so future alembic upgrade head calls know where to start.

# Fresh install — nothing to do; lifespan startup handles it.
.venv/bin/uvicorn mypalace.main:app

# Pre-phase-4 install with existing data — stamp once, then upgrade as usual:
.venv/bin/alembic stamp 2026_05_04_0001_baseline
.venv/bin/alembic upgrade head

# Day-to-day after a new migration lands:
.venv/bin/alembic upgrade head

# Generate a new migration from model changes:
.venv/bin/alembic revision --autogenerate -m "add foo column"

DB URL is read from PALACE_DATABASE_URL (no need to set sqlalchemy.url in alembic.ini).


Observability (phase 4 slice 2)

Prometheus metrics

/metrics exposes Prometheus exposition format. Always on, always public (k8s scrapers need that — lock down via your ingress if necessary).

Counters:

  • palace_http_requests_total{method, route, status_class}
  • palace_http_request_duration_seconds{method, route} (histogram)
  • palace_cache_hits_total{namespace}, palace_cache_misses_total{namespace}
  • palace_graph_writes_total{kind}, palace_graph_failures_total
  • palace_jobs_total{kind, outcome} (populated by phase-4 slice 3)

Routes are normalized — UUIDs and long IDs become {id} so Prometheus label cardinality stays bounded.
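
Normalization is essentially "collapse anything that looks like an ID before it becomes a label value". A minimal sketch of the idea (the regexes are illustrative, not the exact ones Palace uses):

import re

# Illustrative patterns only -- Palace's actual normalization may differ.
_UUID = re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
_LONG_ID = re.compile(r"[0-9A-Za-z_-]{20,}")

def normalize_route(path: str) -> str:
    # Replace UUIDs and long opaque IDs with {id} so label cardinality stays bounded.
    return _LONG_ID.sub("{id}", _UUID.sub("{id}", path))

print(normalize_route("/v1/memories/6f1c2e9a-0b3d-4c5e-8f7a-1234567890ab"))  # /v1/memories/{id}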

OpenTelemetry traces

Optional. Set PALACE_OTLP_ENDPOINT=http://otel-collector:4317 and install the optional extra:

pip install "mypalace[otel]"

Auto-instruments FastAPI + httpx. Service name defaults to mypalace (override via PALACE_OTLP_SERVICE_NAME). No-op if either the env var is unset or the SDK isn't installed.

Structured logs

structlog configured at lifespan startup. Two modes via PALACE_LOG_FORMAT:

  • pretty (default) — colored console output, dev-friendly
  • json — newline-delimited JSON, production-ready

Every request gets a request_id (read from X-Request-ID header if present, else a fresh uuid4). It's bound to structlog's contextvars so every log line in the request scope carries it. The same X-Request-ID is echoed back on the response.
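
The mechanism is the usual structlog contextvars pattern; a minimal sketch of the shape (not the actual middleware code):

import uuid

import structlog
from starlette.middleware.base import BaseHTTPMiddleware

class RequestIdMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Reuse the caller's X-Request-ID if present, otherwise mint a fresh uuid4.
        request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
        # Bind to structlog contextvars so every log line in this request scope carries it.
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(request_id=request_id)
        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id  # echo it back to the caller
        return response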


Background workers (phase 4 slice 3)

Postgres-backed job queue using SELECT ... FOR UPDATE SKIP LOCKED. Built-in handlers cover reflection (episode reflection from a session) and synthesis (narrative arc rollup). The web process can opt to enqueue jobs and let a separate worker process pick them up.

# In one terminal: the web server
.venv/bin/uvicorn mypalace.main:app --port 8000

# In another: one or more workers
.venv/bin/python -m mypalace.workers.runner
# (Run multiple — SKIP LOCKED gives them safe concurrency)
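
The claim query underneath is the standard SKIP LOCKED pattern. A sketch of the idea — the jobs table and column names here are illustrative, not necessarily Palace's actual schema:

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

async def claim_one_job(db: AsyncSession) -> dict | None:
    # Claim at most one pending job; concurrent workers skip rows another worker has locked.
    row = (await db.execute(text("""
        UPDATE jobs
           SET status = 'running', claimed_at = now()
         WHERE id = (
             SELECT id FROM jobs
              WHERE status = 'pending'
              ORDER BY created_at
              FOR UPDATE SKIP LOCKED
              LIMIT 1
         )
        RETURNING id, kind, payload
    """))).mappings().first()
    await db.commit()
    return dict(row) if row else None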

Knobs

  • PALACE_WORKER_POLL_INTERVAL=1.0 — seconds between polls when idle
  • PALACE_WORKER_LEASE_SECONDS=60 — max time a worker holds a claim before another can re-take it
  • PALACE_WORKER_MAX_ATTEMPTS=3 — failures past this mark status=failed
  • PALACE_WORKER_QUEUE_ENABLED=true (phase 5) — route async-mode /v1/reflection/session and /v1/synthesis/narratives through the worker queue instead of the in-process asyncio.create_task path. Requires at least one worker process; defaults False so single-process deployments without a worker keep working.

Custom handlers

from mypalace.workers import register_handler

async def my_handler(payload: dict, tenant_id: str) -> dict:
    return {"processed": payload}

register_handler("my_kind", my_handler)
# Then enqueue:
from mypalace.workers import enqueue
await enqueue(kind="my_kind", user_id="u1", payload={"x": 1}, tenant_id="default")

The runner picks up my_kind automatically once the registry is populated.


Rate limits (phase 4 slice 4)

Optional sliding-window rate limiter, scoped to (tenant, key, user). Requires Redis.

export PALACE_RATE_LIMIT_ENABLED=true
export PALACE_REDIS_URL=redis://localhost:6379
export PALACE_RATE_LIMIT_DEFAULT=120     # req/min for most endpoints
export PALACE_RATE_LIMIT_SEARCH=60       # tighter bucket for /search + /context

Disabled by default. When disabled, the middleware is a fast no-op. When enabled but Redis is unreachable, it fails open (logs a warning, lets the request through) — Palace stays available even when the limiter can't.
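
Conceptually the check is a Redis sorted set per (tenant, key, user); a rough sketch with the same fail-open behavior (the key layout and window handling are assumptions, not the exact implementation):

import time

from redis.asyncio import Redis

async def allow_request(redis: Redis, tenant: str, key_id: str, user: str, limit: int = 120) -> bool:
    bucket = f"ratelimit:{tenant}:{key_id}:{user}"   # illustrative key layout
    now = time.time()
    try:
        pipe = redis.pipeline()
        pipe.zremrangebyscore(bucket, 0, now - 60)   # drop entries outside the 60s window
        pipe.zadd(bucket, {str(now): now})           # record this request
        pipe.zcard(bucket)                           # count requests still in the window
        pipe.expire(bucket, 60)
        _, _, count, _ = await pipe.execute()
        return count <= limit
    except Exception:
        return True  # fail open: if Redis is unreachable, let the request through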

Bypass with unlimited scope

Trusted server-to-server keys can opt out by adding the unlimited scope at issuance time:

curl -X POST http://localhost:8000/v1/admin/keys \
  -H "X-Palace-Key: $ADMIN_KEY" \
  -d '{"label":"trusted-svc","scopes":["read","write","unlimited"]}'

Response shape on 429

HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{"error": {"code": "rate_limited", "message": "Too many requests in 60s window: 121/120", "retry_after_seconds": 60}}

Graph layer (phase 3 slice 3)

Optional FalkorDB integration. When PALACE_FALKORDB_URL is set, every memory / episode / arc create writes a node into the per-tenant graph (palace_<tenant_id>), and every supersession writes a SUPERSEDES edge. Writes are fire-and-forget — graph failures never break the primary write.

# FalkorDB ships as a Redis module:
podman run -d --name palace-falkor -p 6379:6379 docker.io/falkordb/falkordb:latest
export PALACE_FALKORDB_URL=redis://localhost:6379

Without the env var, the graph layer is a no-op and /v1/graph/* returns 503.
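
The fire-and-forget behavior amounts to scheduling the graph write as a background task and only logging failures; a rough sketch of the pattern (the write coroutine itself — whatever builds the Cypher — is not shown):

import asyncio
import logging

log = logging.getLogger(__name__)

def fire_and_forget(coro) -> None:
    # Schedule the graph write without awaiting it; the primary write path never blocks on it.
    task = asyncio.create_task(coro)

    def _log_failure(t: asyncio.Task) -> None:
        if not t.cancelled() and t.exception() is not None:
            # Failures are logged, never raised into the caller.
            log.warning("graph write failed: %s", t.exception())

    task.add_done_callback(_log_failure)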

Schema

(:Memory {id, user_id, agent_id, content, memory_type, importance})
(:Episode {id, user_id, summary, significance, timestamp})
(:Arc {id, user_id, title, status})

(Memory)-[:SUPERSEDES]->(Memory)
(Episode)-[:PARTICIPATES_IN]->(Arc)

Querying neighbors

curl "http://localhost:8000/v1/graph/neighbors?node_id=<memory_id>&depth=2&edge_type=SUPERSEDES" \
  -H "X-Palace-Key: $KEY"
# → {"data": {"nodes": [...], "edges": [...]}, "meta": {...}}

Depth is capped at 3 hops; edge_type filter is optional. No raw Cypher passthrough — the only graph API surface is /v1/graph/neighbors.


Cache (phase 3 slice 4)

Optional Redis read-through cache for /v1/context/layered and /v1/memories/search. Keys are hashed (tenant_id, namespace, params); TTL defaults to 60s. On memory create/update/delete, all matching tenant cache entries are invalidated.

# FalkorDB and the cache can share the same Redis instance:
export PALACE_REDIS_URL=redis://localhost:6379
# Optional knobs:
export PALACE_CACHE_TTL_SEARCH=60   # seconds
export PALACE_CACHE_TTL_GET=300
# Disable without unsetting URL (e.g. tests):
export PALACE_CACHE_DISABLED=true

Without PALACE_REDIS_URL, every read goes straight to Postgres + Qdrant.

Cache failures degrade to misses — Palace stays correct, just slower.
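
The read-through shape is roughly as follows; the exact key derivation is an assumption, but the hit/miss/TTL behavior matches what's described above:

import hashlib
import json

from redis.asyncio import Redis

async def cached_search(redis: Redis, tenant_id: str, params: dict, run_search, ttl: int = 60):
    # Hash (tenant, namespace, params) into a stable cache key.
    raw = json.dumps({"tenant": tenant_id, "ns": "search", "params": params}, sort_keys=True)
    key = "cache:" + hashlib.sha256(raw.encode()).hexdigest()

    cached = await redis.get(key)
    if cached is not None:
        return json.loads(cached)           # hit
    result = await run_search(params)       # miss: fall through to Postgres + Qdrant
    await redis.set(key, json.dumps(result), ex=ttl)
    return result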


gRPC transport (phase 3 + phase 5)

Optional second transport alongside REST. Phase 3 added MemoryService; phase 5 expanded to a full mirror — SessionService, EpisodeService, ArcService, IntentionService, DynamicsService, RetrievalService, IngestionService, JobService. Same auth (X-Palace-Key in metadata), same scope rules, same singleton services as the HTTP path.

export PALACE_GRPC_PORT=50051
.venv/bin/uvicorn mypalace.main:app --port 8000
# → starts FastAPI on :8000 AND gRPC on :50051

Auth uses the same X-Palace-Key, sent as gRPC metadata x-palace-key. Scope rules are identical to HTTP.

from mypalace_client.grpc import PalaceGrpcClient

async with PalaceGrpcClient("localhost:50051", api_key="pk_live_...") as client:
    mem = await client.create(user_id="u1", content="hello via gRPC")
    results = await client.search(query="hello", limit=5)

Regenerating stubs

python -m grpc_tools.protoc -I=proto \
    --python_out=mypalace/grpc/_generated \
    --grpc_python_out=mypalace/grpc/_generated \
    proto/palace.proto
# Then re-apply the local import fix in palace_pb2_grpc.py:
#   sed -i '' 's/^import palace_pb2/from mypalace.grpc._generated import palace_pb2/' \
#     mypalace/grpc/_generated/palace_pb2_grpc.py

Install

From PyPI (server)

pip install mypalace

From PyPI (client only — for AI apps that talk to a remote Palace)

pip install mypalace-client
# Optional gRPC transport:
pip install "mypalace-client[grpc]"

Docker

docker pull bangrocket/mypalace:latest
docker run -p 8000:8000 \
  -e PALACE_DATABASE_URL=postgresql+asyncpg://palace:palace@host/palace \
  -e QDRANT_URL=http://host:6333 \
  -e PALACE_BOOTSTRAP_ADMIN_KEY=pk_live_$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32) \
  bangrocket/mypalace:latest

Production (single-host docker-compose)

For a real deployment with all five backends + a worker, use the production compose file:

cp .env.example .env
$EDITOR .env       # set PALACE_BOOTSTRAP_ADMIN_KEY + POSTGRES_PASSWORD
docker compose -f docker-compose.prod.yml up -d
curl http://localhost:8000/health/deep | jq

Full setup, scaling, observability, backup, and upgrade instructions are in docs/deployment.md.

Quick start (development)

# 1. Start postgres + qdrant (docker or podman)
docker-compose up -d postgres qdrant
# or:  podman run -d --name palace-postgres -p 5442:5432 \
#         -e POSTGRES_USER=palace -e POSTGRES_PASSWORD=palace -e POSTGRES_DB=palace \
#         docker.io/library/postgres:16-alpine
#      podman run -d --name palace-qdrant -p 6333:6333 docker.io/qdrant/qdrant:latest

# 2. Install (Python 3.12)
python3.12 -m venv .venv && .venv/bin/pip install -e ".[dev]"

# 3. Configure
cp .env.example .env

# 4. Run
.venv/bin/uvicorn mypalace.main:app --reload --port 8000

Or fully containerized: docker-compose up --build.

API docs at http://localhost:8000/docs.

Smoke test

# Store a memory
curl -X POST http://localhost:8000/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u1","content":"User loves dark mode and uses Vim daily","memory_type":"preference"}'

# Semantic search
curl -X POST http://localhost:8000/v1/memories/search \
  -H 'Content-Type: application/json' \
  -d '{"query":"editor preferences","user_id":"u1","limit":5}'

# Assemble context for an LLM prompt
curl -X POST http://localhost:8000/v1/context \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u1","query":"what does the user prefer?","max_memories":5}'

API

All responses use the envelope { "data": ..., "meta": { "count": N, "took_ms": N } }. Errors return HTTP 4xx/5xx with FastAPI's default body.
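
Calling the HTTP API directly (without mypalace-client) and unwrapping the envelope looks like this; the endpoint and fields are the ones documented below:

import httpx

with httpx.Client(base_url="http://localhost:8000",
                  headers={"X-Palace-Key": "pk_live_..."}) as client:
    resp = client.post("/v1/memories/search",
                       json={"query": "editor preferences", "user_id": "u1", "limit": 5})
    resp.raise_for_status()
    body = resp.json()
    memories = body["data"]                                # the payload itself
    print(body["meta"]["count"], body["meta"]["took_ms"])  # envelope metadata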

Memories

Method Path Body
POST /v1/memories { user_id, content, memory_type?, agent_id?, source?, importance?, metadata? }
POST /v1/memories/search { query, user_id?, agent_id?, memory_type?, limit?, min_score? }
GET /v1/memories/{id}
PATCH /v1/memories/{id} { content?, memory_type?, importance?, metadata? }
DELETE /v1/memories/{id}
GET /v1/users/{user_id}/memories?limit=50

memory_type is a free-form string; conventional values: semantic, episodic, preference, fact.

Sessions

Method Path Body
POST /v1/sessions { user_id, title? }
GET /v1/sessions/{id} — (returns session + ordered messages)
POST /v1/sessions/{id}/messages { user_id, role, content }
PATCH /v1/sessions/{id} { title?, summary? }
DELETE /v1/sessions/{id} — (cascades to messages)

Context

Method Path Body
POST /v1/context { user_id, query, session_id?, max_memories?, max_messages? }

Health

Method Path
GET /health

Configuration

All settings come from environment variables (or .env).

  • PALACE_DATABASE_URL — default postgresql+asyncpg://palace:palace@localhost/palace (asyncpg driver required)
  • QDRANT_URL — default http://localhost:6333
  • QDRANT_COLLECTION — default palace_memories (created on startup if missing)
  • EMBEDDING_PROVIDER — default huggingface; huggingface or openai
  • EMBEDDING_MODEL — default BAAI/bge-large-en-v1.5 for HF; pass an OpenAI model name when provider is openai
  • HF_TOKEN — unset by default; optional, for gated models
  • OPENAI_API_KEY — required if EMBEDDING_PROVIDER=openai
  • LLM_PROVIDER — default openrouter; openrouter or openai
  • LLM_API_KEY — required for any LLM call
  • LLM_MODEL — default openai/gpt-4o-mini; OpenRouter-style or OpenAI model id
  • LOG_LEVEL — default INFO

Architecture

palace/
├── main.py              FastAPI app + lifespan (creates tables, ensures Qdrant collection)
├── config.py            Pydantic Settings (.env aware)
├── models.py            SQLModel tables: Memory, Session, Message
├── database.py          Async SQLAlchemy engine + session factory
├── embeddings.py        EmbeddingProvider protocol + HF and OpenAI impls
├── vector.py            Async Qdrant wrapper (ensure/upsert/query/delete)
├── llm.py               Async chat-completion client (OpenAI-compatible)
├── memory_service.py    CRUD + semantic search; lazy embedder
├── session_service.py   Session + message lifecycle
├── context_service.py   Memory search + recent messages → prompt context
└── api/
    ├── common.py        Pydantic request/response models, ApiResponse envelope
    ├── memories.py      memory routes + users-router for /v1/users/{id}/memories
    ├── sessions.py      session routes
    └── context.py       context route

Behavior notes

  • Create memory → INSERT into postgres, then embed + UPSERT into Qdrant. Embedding happens outside the DB transaction so a slow embedder doesn't hold a row lock.
  • Search → embed query, Qdrant query_points with user_id/agent_id/memory_type filters, then fetch full rows from postgres in one IN-clause. Search bumps access_count and accessed_at on the returned rows.
  • Update content → re-embeds and UPSERTs (Qdrant point id == memory id).
  • Delete → removes the postgres row, then deletes the Qdrant point.
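
For example, the create path above boils down to this ordering (the helper callables are injected to keep the sketch self-contained; the names are illustrative, not Palace's API):

# Sketch of the create-memory ordering; insert_row / embed / upsert_point are stand-ins.
async def create_memory(insert_row, embed, upsert_point, payload: dict) -> dict:
    memory = await insert_row(payload)                 # 1. INSERT + commit -- transaction closes here
    vector = await embed(payload["content"])           # 2. embed outside the DB transaction (no row lock held)
    await upsert_point(memory["id"], vector, payload)  # 3. Qdrant UPSERT; point id == memory id
    return memory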

Running tests

.venv/bin/python -m pytest

The test suite uses mocks (no postgres or qdrant required) and covers all 13 routes.

For end-to-end verification, the smoke commands above exercise the whole stack against a live DB + Qdrant.


Platform notes

macOS x86_64

torch is capped at 2.2.2 on macOS x86_64 (Apple deprecated x86 wheels for newer torch). pyproject.toml therefore pins numpy<2 and transformers<5 to stay ABI-compatible with that torch. If you are on Apple Silicon (arm64) or Linux, those pins are still safe — just not strictly required.

If you'd rather avoid the torch dependency entirely, set:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

The embedder is loaded lazily, so importing the app no longer triggers a model download.

Container engines

docker-compose.yml works under both Docker Desktop and podman compose. With plain podman run, see the Quick start above for the equivalent commands.


Drop-in mode for mypalclara

MyPalace is a strict superset of mypalclara's embedded Palace surface since end of phase 3. To swap mypalclara from embedded to remote MyPalace, follow the step-by-step guide at docs/migrating-mypalclara.md.

The short version:

# 1. install the client
pip install mypalace-client==0.7.1   # or "mypalace-client[grpc]"

# 2. copy examples/mypalclara_router.py into mypalclara as
#    mypalclara/core/memory/routed.py and swap the imports

# 3. set the env and you're done
export USE_PALACE_SERVICE=true
export PALACE_SERVICE_URL=http://mypalace:8000
export PALACE_API_KEY=pk_live_...   # mint via /v1/admin/keys

The router uses explicit pass-throughs — every public method of ClaraMemory + MemoryManager has its own entry, no __getattr__ fallthrough. mypalclara's existing Discord-transcript replay script writes through the router to remote MyPalace with no Palace-specific port — see the migration guide for replay + validation + rollback steps.

Legacy slice notes (kept for historical reference)

Six more endpoints let mypalclara's episode_store.* and MM.reflect_on_session / MM.run_narrative_synthesis callers route to remote Palace:

  • POST /v1/reflection/session?mode={sync,async} — extract episodes from a message list. Default async returns 202 + job id; sync returns the extracted episodes inline.
  • POST /v1/synthesis/narratives?mode={sync,async} — roll recent episodes into narrative arcs.
  • POST /v1/episodes/search — semantic search over episodes with min_significance filter.
  • GET /v1/users/{user_id}/episodes/recent?limit=5 — recent episodes, newest first.
  • GET /v1/users/{user_id}/arcs/active?limit=10 — active narrative arcs.
  • GET /v1/jobs/{job_id} — poll async job status (pending / completed / failed) and retrieve the result.

Storage: episodes in a separate palace_episodes Qdrant collection; arcs in a narrative_arcs Postgres table with a JSONB key_episode_ids array. Async mode uses pure asyncio (no Celery/arq); jobs in flight don't survive process restarts (caller can re-POST).

LLM extraction uses palace/llm.py's OpenAI-compatible chat-completion client (works against OpenRouter, OpenAI, Anthropic-via-OpenRouter, or any compatible endpoint). Set LLM_API_KEY in .env.

The examples/mypalclara_router.py reference now routes episode_store, reflect_on_session, and run_narrative_synthesis when USE_PALACE_SERVICE=true, falling back to embedded otherwise.

Slice 3 additions: FSRS dynamics

Five more endpoints port mypalclara's FSRS-6 spaced-repetition memory dynamics (per-memory stability/difficulty/retrievability state, access logging, and composite ranking) to remote Palace:

  • POST /v1/memories/{id}/promote — apply an FSRS review (default grade=3 / GOOD, signal_type="used_in_response"). Auto-creates the dynamics row on first call.
  • POST /v1/memories/{id}/demote — failure signal (equivalent to promote(grade=1)); default reason="user_correction".
  • GET /v1/memories/{id}/dynamics?user_id=u1 — read the current FSRS state. 404 if no dynamics row.
  • POST /v1/memories/{id}/score — composite ranking breakdown given the caller's semantic score. Returns {composite_score, fsrs_score, retrievability, storage_strength}. Composite formula: composite = 0.6 * semantic + 0.4 * fsrs_score, where fsrs_score = (0.7 * retrievability + 0.3 * storage_strength) * importance_weight.
  • POST /v1/maintenance/prune-access-logs?retention_days=90 — admin op; deletes old access log rows.

The FSRS-6 math (palace/dynamics/fsrs.py) is ported character-for-character from mypalclara, with a deterministic regression net (tests/test_dynamics.py) pinning known input -> output values to catch porting drift.
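
As a worked check of the composite formula quoted above (plain arithmetic, no Palace imports):

def composite_score(semantic: float, retrievability: float,
                    storage_strength: float, importance_weight: float = 1.0) -> float:
    fsrs_score = (0.7 * retrievability + 0.3 * storage_strength) * importance_weight
    return 0.6 * semantic + 0.4 * fsrs_score

# semantic=0.82, retrievability=0.9, storage_strength=0.5 -> fsrs_score=0.78, composite=0.804
print(composite_score(0.82, 0.9, 0.5))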

MM.get_last_retrieved_memory_ids stays embedded (slice-3 design D4): the HTTP service is stateless between requests, so an in-process cache wouldn't survive across worker processes. The mypalclara router caches retrieved IDs client-side instead.

Slice 4 additions: intentions

Six endpoints port mypalclara's intentions subsystem — deterministic-trigger reminders that fire when matching keywords, topics, times, or context conditions are detected. No LLM in this slice; all matching is purely structural (keyword regex, time comparison, context dict matching, word-overlap fallback for topic).

  • POST /v1/intentions — set a new intention ({user_id, content, trigger_conditions, ...}).
  • POST /v1/intentions/check — evaluate all unfired intentions for a user against a message + context. Returns the fired list (sorted by priority); marks fired and deletes any with fire_once=true.
  • POST /v1/intentions/format — render fired intentions as a markdown bullet list for system-prompt injection.
  • DELETE /v1/intentions/{id} — delete a single intention. 404 if not found.
  • GET /v1/users/{user_id}/intentions?fired={true|false|all}&limit=50 — list intentions for a user.
  • POST /v1/maintenance/cleanup-intentions — admin op; deletes intentions whose expires_at has passed.

Four trigger types (set via trigger_conditions["type"]):

  • keyword — substring match against keywords; optional regex and case_sensitive.
  • topic — word-overlap fraction vs. topic >= threshold; optional quick_keywords pre-filter.
  • time — fires when current UTC time has reached at (specific) or after (open-ended).
  • context — matches a dict of {channel_name, is_dm, time_of_day, day_of_week}; all configured keys must match.
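
For example, a keyword-triggered intention set and checked over HTTP might look like the sketch below (the trigger_conditions keys follow the list above; the check-request field names are assumptions):

import httpx

with httpx.Client(base_url="http://localhost:8000",
                  headers={"X-Palace-Key": "pk_live_..."}) as client:
    # Set a keyword-triggered intention.
    client.post("/v1/intentions", json={
        "user_id": "u1",
        "content": "Remind the user that their API key rotation is overdue",
        "trigger_conditions": {"type": "keyword", "keywords": ["api key", "rotate"],
                               "case_sensitive": False},
    }).raise_for_status()

    # Evaluate unfired intentions against an incoming message ("message"/"context" names assumed).
    fired = client.post("/v1/intentions/check",
                        json={"user_id": "u1", "message": "time to rotate my api key?",
                              "context": {"is_dm": True}}).json()["data"]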

The mypalclara_router.py reference now routes MM.set_intention, MM.check_intentions, and MM.format_intentions_for_prompt when USE_PALACE_SERVICE=true.

Slice 5 additions: layered retrieval + smart ingestion

Three endpoints + an activated flag close out phase 2: multi-tier context assembly, LLM-driven memory extraction with vector dedup + heuristic supersede, and a manual supersede record:

  • POST /v1/context/layered — multi-tier context assembly. Parallel-fetches L1 (top semantic memories + recent episodes + active arcs) and L2 (query-filtered memories optionally FSRS-reranked + query-filtered episodes), char-budgets each layer, and optionally pulls recent_messages from a session. Returns a structured dict (caller composes into prompts; Palace stays generic).
  • POST /v1/memories/{id}/supersede — manually replace a memory. Creates a new memory and an audit row; demotes the old memory's FSRS state.
  • GET /v1/memories/{id}/supersedes — supersession history involving this memory id (either side).
  • Activated: infer=true on POST /v1/memories/batch now runs the smart-ingestion pipeline: the LLM extracts factual memories from the conversation block; for each candidate, embed + Qdrant-search the nearest existing memory; score > 0.95 → skip as duplicate; score > 0.75 → heuristic contradiction check (auto-supersede when confidence > 0.7, else skip as similar); otherwise write fresh. Response meta carries supersessions and skipped arrays.

The contradiction heuristic is intentionally simple (no LLM in the hot ingestion path): negation-asymmetry plus stemmed token overlap. The LLM extraction is the only LLM call.
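
Per candidate, the decision reduces to a threshold ladder; a sketch of the logic described above:

def ingest_decision(nearest_score: float, contradiction_confidence: float | None) -> str:
    # nearest_score: similarity of the closest existing memory found in Qdrant.
    if nearest_score > 0.95:
        return "skip_duplicate"
    if nearest_score > 0.75:
        # Heuristic contradiction check runs here; supersede only when confident enough.
        if contradiction_confidence is not None and contradiction_confidence > 0.7:
            return "auto_supersede"
        return "skip_similar"
    return "write_fresh"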

MM.build_prompt_layered, MM.smart_ingest, and MM.supersede_memory graduate to remote in examples/mypalclara_router.py when USE_PALACE_SERVICE=true. Note: the routed build_prompt_layered returns a structured LayeredContext dict instead of typed Messages — mypalclara's caller must adapt (the routed path drops Discord-specific persona/channel layers per phase-2 design D1).

Integration tests

Default pytest runs only the fast mock-based suite (~2s). Live end-to-end tests against real postgres + qdrant are opt-in:

pytest                       # mocks only — fast iteration
pytest -m integration        # live containers via testcontainers-python

The integration suite spins up postgres + qdrant containers per session (via testcontainers-python), truncates tables between tests, and exercises:

  • Memory CRUD, semantic search ranking, batch create + filtered list, delete-all with filters (tests/integration/test_memories_live.py).
  • Session/message lifecycle and context assembly (tests/integration/test_sessions_live.py).
  • palace_client against a live Palace via in-process ASGI transport, proving wire-contract agreement (tests/integration/test_client_e2e.py).

Requires Docker, or podman with a running podman machine. First run pulls postgres + qdrant images and downloads the small embedding model (sentence-transformers/all-MiniLM-L6-v2); subsequent runs take 30-60s.


Bulk import / export (phase 6 slice 2)

Disaster recovery + tenant migration. Streaming NDJSON, one record per line.

# Stream a tenant dump to a file
curl "http://localhost:8000/v1/admin/export?tenant_id=acme" \
  -H "X-Palace-Key: $ADMIN_KEY" \
  -o palace-acme-export.ndjson

# Re-import into a target tenant (creates the tenant if missing)
curl -X POST "http://localhost:8000/v1/admin/import?tenant_id=acme-restored" \
  -H "X-Palace-Key: $ADMIN_KEY" \
  --data-binary @palace-acme-export.ndjson

Each line is one row prefixed by _type (one of tenant, memory, session, narrative_arc, intention, memory_dynamics, memory_supersession).

  • Vector data is not included — re-embedding on import keeps dumps portable across embedding models.
  • The tenant_id query param always wins over any tenant_id field in the dump.
  • Imports are idempotent on primary keys via db.merge().
  • api_keys are excluded — set up auth on the new deployment separately.
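
A quick way to sanity-check a dump before importing is to count records by _type:

import json
from collections import Counter

counts = Counter()
with open("palace-acme-export.ndjson") as f:
    for line in f:
        if line.strip():
            counts[json.loads(line)["_type"]] += 1

print(counts)  # e.g. Counter({'memory': 1200, 'session': 40, ..., 'tenant': 1})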

Pair with /v1/admin/reembed (slice 4) by passing ?reembed=false for very large imports, then triggering re-embed afterward.


Memory TTL (phase 6 slice 3)

Optional time-to-live on memories. Pass ttl_seconds on create:

curl -X POST http://localhost:8000/v1/memories \
  -H "X-Palace-Key: $KEY" \
  -d '{"user_id":"u1","content":"login code: 482719","ttl_seconds":300,"memory_type":"session"}'

The memory's expires_at is set to now + ttl. Search/list/get already exclude expired rows even before cleanup runs (WHERE expires_at IS NULL OR expires_at > now()).

Garbage collection runs as a worker handler. Enqueue per-tenant:

from mypalace.workers import enqueue
await enqueue(kind="cleanup", user_id="system",
              payload={"batch_size": 500}, tenant_id="acme")

Or schedule it via cron / your orchestrator. Operators without a worker process can still delete expired memories via the regular DELETE endpoint — the index excludes them from reads regardless.


Releasing (operator notes)

Tagging vX.Y.Z triggers .github/workflows/release.yml which runs tests, builds both packages, and (when configured) publishes to PyPI + Docker Hub. Tags ending in -rc* or -beta* route to TestPyPI for rehearsal.

One-time setup

1. PyPI trusted publishing. For each project at https://pypi.org/manage/account/publishing/, add a "GitHub" publisher pointing at this repo + workflow release.yml. Leave the environment field empty. Do this for both:

  • mypalace
  • mypalace-client

Once configured, the pypa/gh-action-pypi-publish@release/v1 step OIDC-authenticates and uploads with no API token to manage. Repeat the same for https://test.pypi.org if you want rc/beta rehearsals to publish.

2. Docker Hub (optional). If you want Docker images built on every tag, configure three repo settings (Settings → Secrets and variables → Actions):

  • Variable PUBLISH_DOCKER = true
  • Secret DOCKERHUB_USERNAME = your Docker Hub username
  • Secret DOCKERHUB_TOKEN = a Docker Hub access token

Without vars.PUBLISH_DOCKER=true, the docker job is skipped and the GitHub release still cuts.

Cutting a release

# Rehearse on TestPyPI first
git tag -a v0.5.0-rc1 -m "rehearsal"
git push origin v0.5.0-rc1
# Watch https://github.com/BangRocket/mypalace/actions

# Once green, cut the real tag
git tag -a v0.5.0 -m "0.5.0 — see CHANGELOG.md"
git push origin v0.5.0

If a tag's workflow fails, fix the issue, delete the tag locally and remote (git tag -d v0.5.0 && git push --delete origin v0.5.0), then re-tag.


License

PolyForm Noncommercial 1.0.0
