Local-first persistent memory layer for AI agents. MCP server + REST API.

Kioku

Cross-model memory, local-first.

Python 3.11+

Kioku is one memory layer shared across Claude, ChatGPT, Cursor, Gemini, Copilot, and any MCP-aware agent, so you stop re-explaining yourself and stop losing useful work. It runs on your own OpenAI, Anthropic, Gemini, or Perplexity key, or on a fully local model via Ollama with no key and no data leaving your machine, and it keeps your data on your machine by default. Local. Cross-model. Yours.

Free needs no account. Pro (€9/mo, free during early access) adds the knowledge graph, custom and scheduled agents, and end-to-end-encrypted multi-device sync.

What Kioku is for

Kioku is strongest when you need:

exact replay, not just vague personalization
one source-backed context layer across more than one AI tool
transcript and provenance inspection before trusting a result
local-first ownership instead of vendor lock-in

Best-fit users:

developers
technical founders
researchers
legal and finance professionals
anyone doing high-trust knowledge work across multiple AI tools

What you can do with Kioku

create reusable Context Packs from verified sources — decision-first, cited, and provenance-backed
get reversed or superseded decisions flagged in a Pack, which leads with the current position and marks what changed, instead of silently mixing old and new
import old ChatGPT, Claude, Gemini, and Perplexity history
run Recall against your own Archive
inspect transcripts, provenance, and preserved artifacts
connect Claude Desktop, Cursor, and other MCP clients
save supported browser chats with the extension
use Kioku from Python through the SDK that ships inside kioku-ai
add LangChain memory on top of the same backend
bring your own LLM key (OpenAI, Anthropic, Gemini, Perplexity) or run fully local with Ollama — pick the active provider when you have more than one

For developers and coding agents:

index a local folder or GitHub repo as code memory (kioku index-repo)
sync external connectors such as GitHub issues/PRs (kioku connector sync)
pull task-aware context and recall scoped to what you're working on
record a session handoff and resume it later from any MCP client

Pro capabilities (free tier is fully usable without them):

knowledge graph / Relationship Map across your memories
a synthesized Profile/persona built from your own data
end-to-end-encrypted cross-device sync
digests, proactive nudges, and custom + scheduled agents

How it works

Kioku front-loads the hard work at write time so reads are fast and source-backed:

Ingest — conversations, documents (PDF / DOCX / XLSX / images), code, and browser captures stream in through one provider-agnostic pipeline. The raw export is preserved in an Archive; re-imports are idempotent (content-hashed), so the same history never duplicates.
Extract typed units — each item is broken into small, classified memory units (facts, decisions, preferences, goals, relationships…) and enriched. memory_type is an open, LLM-assigned label, not a fixed enum.
Retrieve with multiple signals — a query is answered by fusing semantic (vector), lexical (BM25/FTS), conversation, and graph signals, then reranking with a cross-encoder — not a single embedding lookup.
Synthesize a Context Pack — a tight, decision-first brief (~300–500 tokens) built only from retrieved sources, with citations and provenance. It distinguishes settled decisions from open deliberation, and flags decisions that were later reversed.
Inject — the Pack (or raw recall) is delivered to your AI tool over MCP, the REST API, or the SDK.
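The README does not say how the semantic, lexical, conversation, and graph rankings are combined before the cross-encoder rerank. Reciprocal rank fusion (RRF) is one common way to merge several ranked lists, so here is a minimal sketch of it, assuming each signal returns memory IDs best-first. `fuse_rankings` and the `k=60` constant are illustrative choices, not Kioku's API, and the rerank step is left out.

```python
from collections import defaultdict


def fuse_rankings(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge best-first ranked lists of memory IDs with reciprocal rank fusion (RRF).

    Hypothetical helper: `rankings` maps a signal name (e.g. "semantic", "lexical")
    to the IDs that signal returned, best match first. Each ID scores the sum of
    1 / (k + rank) over every list it appears in, with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


if __name__ == "__main__":
    fused = fuse_rankings({
        "semantic": ["m3", "m1", "m7"],
        "lexical": ["m1", "m4"],
        "graph": ["m1", "m3"],
    })
    print(fused)  # ['m1', 'm3', 'm4', 'm7']
```

`m1` sits near the top of all three lists, so it outranks `m3`, which tops only one; the `k` constant damps how much any single list can dominate. In a pipeline like the one described, the fused list would then be handed to the cross-encoder for reranking.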

Background specialist agents run on a schedule to keep memory healthy: enrichment/classification, duplicate review, reversal detection (a decision changed over time), the knowledge graph build, a synthesized persona, goal/temporal/learning tracking, and retention cleanup.
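Reversal detection and the decision-first Pack both depend on knowing which decision is current. A minimal sketch, assuming each decision memory carries a topic key and a date (both hypothetical fields here): group decisions by topic, treat the newest as the current position, and mark earlier ones as superseded. Kioku's agent presumably uses an LLM to decide when two decisions concern the same topic; this sketch replaces that step with an exact `topic` match.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Decision:
    topic: str          # hypothetical grouping key, e.g. "database"
    text: str
    decided_on: date


def current_and_superseded(decisions: list[Decision]) -> dict[str, tuple[Decision, list[Decision]]]:
    """Map each topic to (current decision, earlier decisions it superseded, newest first)."""
    result: dict[str, tuple[Decision, list[Decision]]] = {}
    ordered = sorted(decisions, key=lambda d: (d.topic, d.decided_on))
    for topic, group in groupby(ordered, key=lambda d: d.topic):
        history = list(group)  # oldest first within the topic
        result[topic] = (history[-1], history[:-1][::-1])
    return result


def render_brief(decisions: list[Decision]) -> str:
    """Lead with the current position per topic, then list what it replaced."""
    lines = []
    # Topics come out in sorted order because the grouping ran over a sorted list.
    for topic, (current, superseded) in current_and_superseded(decisions).items():
        lines.append(f"{topic}: {current.text} (since {current.decided_on.isoformat()})")
        for old in superseded:
            lines.append(f"  superseded: {old.text} ({old.decided_on.isoformat()})")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Decision("database", "Use SQLite", date(2026, 3, 1)),
        Decision("database", "Use Postgres", date(2026, 5, 10)),
        Decision("ui", "Ship dark mode", date(2026, 4, 2)),
    ]
    print(render_brief(example))
```

Run as a script, this prints the Postgres decision first, with the earlier SQLite choice indented beneath it as superseded, followed by the unchanged `ui` decision.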

Surfaces: a local FastAPI backend (127.0.0.1:8742) + an MCP server (stdio/SSE), a Tauri desktop app, a Chrome extension, a Python SDK (+ LangChain), and an optional Cloudflare worker for accounts, billing, licensing, and encrypted sync. Everything is scoped per user/workspace; content is encryptable at rest and secrets are redacted on ingest.
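The README says secrets are redacted on ingest but not how. Here is a minimal regex-based sketch of the idea. The patterns are illustrative assumptions (an OpenAI-style `sk-` key, an AWS access key ID, and `password=`-style assignments), not Kioku's actual rule set, and a real redactor would need a much broader list.

```python
import re

# Illustrative (pattern, replacement) pairs; an assumption, not Kioku's actual rule set.
_RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "[REDACTED]"),  # OpenAI-style API keys
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),      # AWS access key IDs
    # "password = hunter2" -> "password = [REDACTED]": keeps the label, drops the value.
    (re.compile(r"(?i)\b(password|passwd|secret|token)(\s*[:=]\s*)\S+"), r"\1\2[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern before the text is stored."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Redacting before storage, rather than at display time, means the secret never reaches the Archive or the search index in the first place.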

Choose your starting path

1. Desktop user

If you want to use the app first:

pip install kioku-ai
kioku warmup
kioku serve-http

Then:

Open the desktop app
Import old history first
Open Archive to inspect transcripts and provenance
Use Recall when you need the same code, fix, or decision again

2. MCP user

If you mainly work in Claude Desktop or Cursor:

pip install kioku-ai
kioku serve-http

Then:

Connect Kioku from the desktop app or write the MCP config manually
Restart Claude Desktop, Cursor, or your MCP client
Ask for:
- the same code again
- the prior fix
- the earlier decision with transcript
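If you write the MCP config by hand, Claude Desktop (`claude_desktop_config.json`) and Cursor (`mcp.json`) both list servers under an `mcpServers` object whose entries name a `command` and its `args`. The sketch below merges a Kioku entry into such a file without touching other servers. The exact Kioku command and arguments are not given here, so `"kioku"` as the command and `"<mcp-subcommand>"` as the argument are placeholders to replace with what the desktop app generates. `merge_mcp_server` and `update_config_file` are hypothetical helpers.

```python
import json
from pathlib import Path


def merge_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of an MCP client config with one server added or replaced under "mcpServers"."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    merged.setdefault("mcpServers", {})[name] = {"command": command, "args": list(args)}
    return merged


def update_config_file(path: Path, name: str, command: str, args: list[str]) -> None:
    """Read an existing config (if any), merge the server entry, and write it back."""
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mcp_server(config, name, command, args), indent=2) + "\n")
```

For example, `update_config_file(Path("claude_desktop_config.json"), "kioku", "kioku", ["<mcp-subcommand>"])`, pointed at your client's real config path and with the placeholder filled in; check your client's documentation for where that file lives.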

3. Python / agent user

If you want to use Kioku from code:

pip install kioku-ai
kioku serve-http

Then use the Python SDK:

from kioku_client import KiokuMemory

memory = KiokuMemory()
memory.add("User prefers Python over JavaScript")
results = memory.search("programming preferences")
print(results[0]["content"])  # search() returns flat memory dicts

Installation

End users

Install Kioku:

pip install kioku-ai
kioku warmup
kioku verify

Start the local backend:

kioku serve-http

The backend normally runs on 127.0.0.1:8742.
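To confirm from Python that something is listening on that address, without assuming any particular HTTP endpoint, a plain TCP connect is enough. This is a minimal sketch; `is_listening` is a hypothetical helper, and a successful connect only shows that some process accepts connections on the port, not that it is Kioku.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8742, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    print("backend reachable" if is_listening() else "nothing listening on 127.0.0.1:8742")
```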

Optional multimodal extras

If you want richer local document import:

pip install "kioku-ai[multimodal]"

This enables:

structured PDF parsing through Docling Parse
cloud OCR support for screenshots and scanned documents
richer PDF handling without Java or Tesseract system dependencies

Optional code-ingestion extras

To index local folders and GitHub repos as code memory:

pip install "kioku-ai[code]"

This pulls in the tree-sitter parsers used to chunk source code by symbol.
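Symbol-level chunking means each function or class becomes one retrievable unit, rather than splitting code at arbitrary line counts. Kioku uses tree-sitter for this; to show the idea without extra dependencies, the sketch below swaps in Python's standard-library `ast` module, which only handles Python source. `chunk_by_symbol` and `CodeChunk` are hypothetical names, not part of kioku-ai.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeChunk:
    symbol: str      # function or class name
    kind: str        # "function" or "class"
    start_line: int  # 1-based, inclusive (includes decorators)
    end_line: int    # 1-based, inclusive
    source: str


def chunk_by_symbol(source: str) -> list[CodeChunk]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # Decorators sit above the def/class line, so start from the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append(CodeChunk(node.name, kind, start, end, "\n".join(lines[start - 1:end])))
    return chunks
```

Top-level statements such as imports are skipped here; a fuller chunker would also descend into class bodies to emit one chunk per method.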

Optional encryption-at-rest extras

To encrypt memory content at rest (AES-256-GCM, passphrase-derived key):

pip install "kioku-ai[encryption]"

Set KIOKU_ENCRYPTION_ENABLED=true and KIOKU_ENCRYPTION_PASSPHRASE=…. Keyword search still works on encrypted installs via a keyed blind-token index — your plaintext terms are never written to disk.
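A keyed blind-token index stores a keyed hash of each search term instead of the term itself, so keyword lookups compare tokens without any plaintext on disk. Here is a minimal standard-library sketch, assuming PBKDF2-HMAC-SHA256 for key derivation and HMAC-SHA256 for tokens; the README does not name Kioku's KDF or token scheme, and the AES-256-GCM content encryption is left out. `derive_index_key`, `blind_tokens`, and `BlindIndex` are hypothetical names.

```python
import hashlib
import hmac
import re


def derive_index_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte index key from the passphrase (PBKDF2-HMAC-SHA256; assumed default cost)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations)


def blind_tokens(text: str, key: bytes) -> set[str]:
    """Lowercased word tokens mapped to HMAC-SHA256 hex digests; only these are stored."""
    words = re.findall(r"\w+", text.lower())
    return {hmac.new(key, w.encode(), hashlib.sha256).hexdigest() for w in words}


class BlindIndex:
    """Maps blinded tokens to memory IDs; never holds plaintext terms."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._postings: dict[str, set[str]] = {}

    def add(self, memory_id: str, text: str) -> None:
        for token in blind_tokens(text, self._key):
            self._postings.setdefault(token, set()).add(memory_id)

    def search(self, query: str) -> set[str]:
        """IDs whose text contains every query word (AND semantics)."""
        tokens = blind_tokens(query, self._key)
        if not tokens:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in tokens))
```

Because the same word always maps to the same token under one key, the index still reveals which memories share a term and how often, just not what the term is.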

Configure your LLM provider (BYOK)

LLM features (extraction, synthesis, the chatbot, the knowledge graph, and the agents) run on a key you supply — Kioku never ships a shared key. Any one of these turns on the full feature set:

OpenAI, Anthropic (Claude), Gemini, Perplexity — cloud, key-based
Ollama — fully local, no key; point it at your local server (default http://localhost:11434)

Keys come from either:

your environment / .env (e.g. OPENAI_API_KEY=…) — persists across restarts, or
the desktop Settings → API keys screen — applied instantly without a restart

If both are set for the same provider, the Settings key wins. When you have more than one provider configured, you choose the active one (Settings, or POST /api/v1/llm/active); only one is active at a time. On Claude and Gemini the chatbot streams token-by-token just like OpenAI.

Developers running from this repo

git clone https://github.com/kiokuai/kioku
cd kioku

python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
kioku warmup

Run tests:

pytest tests/

Run the desktop app in development mode:

cd desktop
npm install
npm run tauri dev

Developer usage

Python SDK

The Python SDK ships inside kioku-ai.

Install:

pip install kioku-ai

Docs:

kioku_client/README.md

Common operations:

from kioku_client import KiokuMemory

memory = KiokuMemory()

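memory.add("Likes dark mode", memory_type="preference", tags=["ui"])  # store a typed, tagged memory
memory.search("theme preference", limit=5)  # up to 5 matching memories
memory.get_context("coding style")  # context for a topic
memory.ask("What editor do I use?")  # a question answered from your memories
memory.list(memory_type="preference", limit=20)  # browse stored memories by type
memory.health()  # backend health check
memory.stats()  # memory statistics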

LangChain

Install the LangChain extra:

pip install "kioku-ai[langchain]" langchain-openai

Example:

from kioku_client.langchain import KiokuSessionStore
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

chain = prompt | llm
sessions = KiokuSessionStore()

chain_with_memory = RunnableWithMessageHistory(
    chain,
    sessions.get_history,
    input_messages_key="input",
    history_messages_key="history",
)

response = chain_with_memory.invoke(
    {"input": "I love building with FastAPI"},
    config={"configurable": {"session_id": "my-session"}},
)

MCP

Kioku can expose a local MCP server for tools like Claude Desktop and Cursor.

Typical flow:

pip install kioku-ai
kioku serve-http
Connect Kioku through the desktop app or generated MCP config
Restart the MCP client

Tools exposed to MCP clients include:

remember, recall, list_memories, forget — core memory
get_context, get_context_pack, attach_context_pack — source-backed context
add_conversation, get_persona, get_stats — conversation + profile + stats
remember_code, developer_recall, get_task_context — code and task-aware context
record_handoff, resume_session — session handoff and resume

Extension and imports

Kioku supports:

ChatGPT, Claude, Gemini, and Perplexity import
browser capture for ChatGPT, Claude, Gemini, Copilot, and Perplexity

Best rollout order:

imports first
MCP next
browser capture later

Benchmarks

Kioku currently has two benchmark tracks:

retrieval benchmark:
- benchmarks/search_quality.py
LongMemEval-style benchmark:
- benchmarks/longmemeval/run_benchmark.py

Supporting benchmark assets:

Run them with:

# deterministic (fast, no model/keys) — regression guard
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite search
# representative — uses the real local embedding model (no API key)
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite search --real-embeddings
# LongMemEval needs working model access (see caveat below)
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite longmemeval --longmemeval-mode full --limit 25

Current retrieval benchmark

Latest run on the current tree (2026-06-14), 23-case search-quality suite. Numbers are reported with the real local embedding model (--real-embeddings) since that reflects what users actually get; the suite has some run-to-run variance, so a range across repeated runs is given rather than a single hero figure.

Real embeddings (3 runs):

pass rate: 87–91% (20–21/23)
MRR: 0.84–0.87
average latency: ~210ms
P95 latency: ~160ms

Deterministic-embedding mode (run_suite.py --suite search, the default) is a stable lower bound at 87% / MRR ~0.79, but uses hash-based toy embeddings and shows a one-off ~3s P95 from the first-query model load — it is a regression guard, not a representative score.
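For readers comparing runs, pass rate, MRR, and P95 latency can be computed from per-case results as sketched below. The "first relevant hit" MRR definition and the nearest-rank P95 are common conventions assumed here; `benchmarks/search_quality.py` may compute them differently. With only 23 cases, a 95th percentile ignores little more than the single slowest query, so one very slow query can pull the mean above the P95.

```python
import math
from statistics import mean


def pass_rate(passed: list[bool]) -> float:
    """Fraction of cases that passed."""
    return sum(passed) / len(passed)


def reciprocal_rank(ranked_ids: list[str], relevant: set[str]) -> float:
    """1 / (1-based position of the first relevant result), or 0.0 if none was retrieved."""
    for position, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases: list[tuple[list[str], set[str]]]) -> float:
    """MRR: the reciprocal rank averaged over all cases."""
    return mean(reciprocal_rank(ranked, relevant) for ranked, relevant in cases)


def p95_nearest_rank(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest latency."""
    ordered = sorted(latencies_ms)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]
```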

Each run writes timestamped JSON to benchmarks/results/ (git-ignored), e.g. benchmark_suite_<ts>.json and search_quality_{real,det}_<ts>.json.

A previous README figure of 95.7% came from an April 2026 snapshot that predates this repository's git history and could not be reproduced on the current dependency set. Running the same suite before and after the latest search changes gives identical scores, so the lower figure is not a recent code regression.

Current LongMemEval sample result

The LongMemEval harness now drives the real product pipeline — add_conversation(extract=True) for ingestion and manager.synthesize() for answering — so the score reflects Kioku's actual extraction + synthesis, not a bespoke benchmark reimplementation. (The old bespoke prompts remain available via --legacy-harness for comparison.)

Latest run (2026-06-15), full mode, 25-question balanced sample, gpt-4o-mini:

overall accuracy: 60% (15/25), no crashes
by category: knowledge-update 70% (7/10), multi-session 50% (5/10), single-session-assistant 60% (3/5)
elapsed: ~900s

What the fidelity work changed (each was a real product fix, verified on category subsets, that also helps everyday /ask answers):

Recall — on "remind me about our earlier chat about X" questions, recall roughly doubled versus the old bespoke harness. The root cause was extraction dropping detail, which is now preserved (adaptive caps plus an atomic/detail extraction prompt).
Counting / sums — quantity questions ("how many / how much / how long total") now compute and state the explicit total (e.g. "3.5 weeks", "$185") instead of just listing the parts. Summation subset went ~1/7 → 6/7.
Temporal updates — when a fact changes over time, synthesis now sees each memory's date and returns the most-recent value (e.g. an improved 5K time, an updated mortgage pre-approval) instead of an older one.
Distinct-fact preservation — a dedup fix stops the memory layer merging separately-countable facts (a blue bike and a red bike are two memories), which previously collapsed "how many X" answers.

Caveats (read honestly):

The 25-question sample is small and noisy — per-category figures (especially the 5-question single-session bucket) swing run-to-run with gpt-4o-mini non-determinism. Treat this as indicative, not a headline claim.
A rare, intermittent 'int' object is not subscriptable crash has been observed in long (~20+ question) single-process runs. It is caught per question, so the run continues, and it did not occur in this run. Set KIOKU_BENCH_TRACEBACK=1 to capture its traceback; the robust fix (per-question subprocess isolation) is tracked as a follow-up.

Result files are written to benchmarks/results/ and benchmarks/longmemeval/results/ (both git-ignored).

Repo structure

src/kioku — Python backend (memory, search, graph, MCP, importers, connectors)
desktop — Tauri desktop app
extension — browser extension
kioku_client — Python SDK and LangChain integration
website — marketing site
infra/workers — Cloudflare worker for cloud sync, billing, and licensing
benchmarks — benchmark runners and result artifacts
verify — live-verification harness for external key/runtime checks (npm run verify:live)

License

MIT

Release history

0.2.4 (Jun 25, 2026)
0.2.3 (Jun 24, 2026), this version
0.2.2 (Jun 24, 2026)

Download files

Source distribution: kioku_ai-0.2.3.tar.gz (953.9 kB), uploaded Jun 24, 2026
Built distribution: kioku_ai-0.2.3-py3-none-any.whl (1.1 MB, Python 3), uploaded Jun 24, 2026

File hashes

Both files were uploaded via twine/6.2.0 CPython/3.12.13, without Trusted Publishing.

kioku_ai-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`8c6e27db4174d36ea1edbdb69c76f23dfc710440abf1e25d17cc3eb75563c11f`
MD5	`b759b41f63de72d0406d4708fc794f85`
BLAKE2b-256	`722fd761fa2dc91321fdde74df236ce2aa965a57c6c74e8d689ddf17ccdc83ba`

kioku_ai-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f348bbf7e1aaf7695cedd9e6632fb98cc5c9e7664c2a5ba2d119db41287cf642`
MD5	`6224885e5529ac4febb7d739dd3bbd07`
BLAKE2b-256	`5baaec5e42f186fdb4962466dc451c9170b083495d83b4c7731adc14707c5e63`
