Search index for agent memory — knowledge graph index for your documents.
Knowledge Index (ki)
Scale memory across agents and docs
Point ki at a folder (or many), sync it to a graph index, and query from the CLI or any AI agent. Multiple folders and users can share the same index. Source files are never modified, so it's safe on an Obsidian vault, a git repo, or a research folder. Backed by Neo4j.
Install ki, wire it into Claude Code, and ask Claude what's in your notes:
uv tool install knowledge-index
ki configure # one-time: pick Local (Podman), Aura, or an Existing Neo4j — see `references/neo4j-podman.md` for the Local path
ki skill install claude-code # drops the routing rules into ~/.claude/skills/ki/SKILL.md
Then in Claude Code:
❯ index your memory
...
⏺ Bash(ki index /Users/zachblumenfeld/.claude/projects/ 2>&1 | tail -40)
⎿ ✓ Initialized vault at /Users/zachblumenfeld/.claude/projects (id: 85f4be4d…)
Indexed: 13 added, 0 updated, 0 unchanged, 3 sections, 10 links.
⏺ Indexed. 13 docs added across your project memory dirs (3 sections, 10 cross-doc links). Try ki search "..." to query — e.g. ki search
"decision graph" or ki search "naming" --type section.
...
❯ what retrieval stuff have we built lately?
⏺ Bash(ki search "retrieval" --k 10 --json 2>&1)
⎿ [
{
"document_uri": "ce2cc5c8-2968-44fd-8ea8-caca79cfd747/retrieval-queries.md",
… +121 lines (ctrl+o to expand)
Listed 1 directory (ctrl+o to expand)
⏺ Based on the indexed memory, here's what's there on retrieval (current working dir ~/demo/retrieval is empty, so this is from
notes/docs):
ki (Knowledge Index) — the main retrieval project
- retrieval-queries.md — 10 retrieval queries (B.1–B.10) ported from the old Wikipedia-graph queries to the new
User–Vault–Document–Section schema
- requirements.md + ingest-cypher.md — schema, constraints, and the content_search fulltext index over Document|Section|Vault on
displayName + content + aliases + description
- Key design call: fulltext is the v1 retrieval substrate; vector/embeddings deferred. Wikilink aliases ("JFK" / "John F Kennedy")
indexed so alternates hit the same doc.
- Validated approach: what-worked.md notes that section-level retrieval beat whole-document retrieval when dogfooded against the
repo's own docs/.
Other hits (Clojure / internals-markdown notes mentioning "information retrieval") are unrelated tangents — noise from the broader
vault.
Want me to pull the full retrieval-queries doc or a specific section?
That's the whole loop. Read on for the same flow with other agents, direct CLI use, and what's not in this release.
Getting started
With a coding agent (Claude Code, Cursor, Windsurf, …)
Coding agents can shell out to ki directly. ki skill install drops the bundled routing rules (the markdown at skills/ki/SKILL.md) into each agent's well-known config path so the agent knows when to use ki — track / remember / build a knowledge base / search my notes / find related material — and when to skip.
# 1. Install ki on PATH.
uv tool install knowledge-index # if `uv` isn't installed: curl -LsSf https://astral.sh/uv/install.sh | sh
ki --version
# 2. One-time Neo4j connection. Three paths: Local (Podman — see references/neo4j-podman.md),
# Aura (billable cloud), or Existing (point at a Neo4j you already run).
ki configure
# 3. Install the skill into every detected agent — or pick one explicitly.
ki skill install # all detected agents
ki skill install claude-code # one specific agent
ki skill list # what's wired up, what's detected
ki skill remove claude-code # undo
Supported agent catalog
claude-code cursor windsurf copilot gemini-cli
cline codex pi opencode junie
For anything not in the catalog, pass an explicit path:
ki skill install my-fork --path ~/.my-fork/rules/ki.md
Then ask the agent things like:
- "Can you incorporate this folder of notes into your memory?"
- "What did I write about retrieval strategies?"
- "Find the doc where I sketched the schema."
Auto-mode rules (full text in skills/ki/SKILL.md): reversible local actions (ki index, single-doc ki rm) fire without asking; irreversible or billable actions (ki configure → Aura, whole-vault ki rm --vault) pause for explicit consent. Source files are never modified by either ki or the agent.
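The consent split described above can be pictured as a small classifier over the command line. This is a minimal sketch only: the helper name `needs_consent` is hypothetical, and treating every `ki configure` run as consent-worthy is a conservative assumption (the text only singles out the Aura path). The authoritative rules remain the ones in skills/ki/SKILL.md.

```python
def needs_consent(argv: list[str]) -> bool:
    """Return True when a ki invocation should pause for explicit consent.

    Hypothetical helper: it mirrors the rule of thumb in the README,
    not the routing logic that ships in SKILL.md.
    """
    if not argv:
        return False
    command, rest = argv[0], argv[1:]
    # Whole-vault removal drops an entire vault from the index.
    if command == "rm" and "--vault" in rest:
        return True
    # Assumption: configure may lead to a billable Aura instance, so always ask.
    if command == "configure":
        return True
    # Indexing and single-document removal are reversible local actions.
    return False


print(needs_consent(["index", "~/notes"]))          # False
print(needs_consent(["rm", "~/notes", "--vault"]))  # True
```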
From the command line (no agent)
If you'd rather drive ki yourself:
uv tool install knowledge-index # install
ki configure # one-time Neo4j connection
ki index ~/Documents/my-vault # sync the folder into the graph (idempotent)
ki search "retrieval" # default: section content (B.2)
ki search "graph" --type document --k 5 # document title (B.1)
ki search "graphs" --type vault --k 5 # cross-vault routing by description (B.11)
ki search "" --type neighbors --doc-uri <uri> # 1-hop link neighbourhood (B.3)
ki vault list # show every indexed vault with its description
ki rm ~/Documents/my-vault/notes/old.md # remove a doc from the index (file untouched)
ki rm ~/Documents/my-vault --vault # remove a whole vault (typed confirmation)
All commands: ki configure | index | search | vault | rm | init | skill. Run any with --help for flags. KI_PROFILE=work ki index ./vault overrides the profile per-invocation. Run uvx knowledge-index --help first if you'd rather not install globally.
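If you want to call ki from a script rather than a shell, a thin wrapper over the CLI is usually enough. The sketch below uses only flags that appear in this README (`--type`, `--k`, `--json`); combining all three in one call, and the assumption that `--json` prints a JSON list of objects, are inferred from the examples above rather than confirmed. The function names are hypothetical.

```python
import json
import subprocess


def build_search_argv(query: str, k: int = 10, search_type: str = "section") -> list[str]:
    """Assemble the argv for `ki search` using flags shown in the README."""
    return ["ki", "search", query, "--type", search_type, "--k", str(k), "--json"]


def search(query: str, k: int = 10, search_type: str = "section") -> list[dict]:
    """Run ki and parse its output (assumed to be a JSON list of objects)."""
    result = subprocess.run(
        build_search_argv(query, k, search_type),
        capture_output=True,
        text=True,
        check=True,  # raise if ki exits with a non-zero status
    )
    return json.loads(result.stdout)


print(build_search_argv("retrieval", k=5))
```

Keeping the argv builder separate from the subprocess call makes the command easy to log or test without a running Neo4j.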
Per-vault routing is driven by <vault>/.ki/vault.yaml. ki writes the uri: UUID on first ingest; add an optional description: to give agents a short routing hint about what this vault is for. Quickest way to set it: ki index <vault> --description "..." (or wait for the interactive prompt on the very first ki index). The description flows into Vault.description on each ingest and powers ki search --type vault.
From a chat app (Claude, ChatGPT, Gemini, Copilot — web / desktop)
Not yet supported. Chat apps require an MCP server, which is on the roadmap.
Roadmap & known limitations
ki v0.1 is intentionally scoped. The items below are not bugs — they're explicit deferrals you should know about before betting on it.
Local Neo4j via Podman — the recommended quick-start
ki configure offers three paths: Local (Podman), Aura (billable cloud), Existing (point at a URI you already have).
Local runs neo4j:latest in a Podman container with the APOC + GenAI plugins enabled, a named volume for persistence, and --restart unless-stopped. The full agent-followable runbook — preflight, bring-up, recovery for the three failure modes (container stopped / removed / volume wiped), and teardown — lives in references/neo4j-podman.md. ki configure → Local shells out to that path automatically; if you'd rather read what it does first, the runbook is the source of truth.
Prerequisite: podman on PATH. On macOS:
brew install podman
podman machine init
podman machine start
On Linux: apt install podman / dnf install podman / etc. No machine step is needed.
Aura — neo4j-cli aura create provisions a real billable cloud instance. ki configure → Aura walks you through it. See neo4j-labs/neo4j-cli.
Existing — any Neo4j you can reach over Bolt works. Pick this if you already run Neo4j via Docker, a managed service, or anything else; ki configure → Existing just prompts for URI + credentials.
No vector search yet — fulltext only
ki search runs against the content_search fulltext index over Document|Section|Vault.{displayName, content, aliases, description}. There are no vector indexes or embeddings in the graph yet; hybrid (fulltext + vector) is on the v2 list. The genai plugin is already enabled in the Podman setup (see references/neo4j-podman.md) for the upgrade path, so when this lands existing vaults won't need to be re-ingested.
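For readers unfamiliar with Neo4j fulltext indexes, the sketch below shows what an index of this shape and a query against it could look like. The index name, labels, and properties come from the paragraph above; the Cypher is standard Neo4j fulltext syntax written for illustration and may differ from what docs/ingest-cypher.md and docs/retrieval-queries.md actually contain.

```python
# Cypher is held as plain strings so the sketch runs without a database.
CREATE_INDEX = (
    "CREATE FULLTEXT INDEX content_search IF NOT EXISTS "
    "FOR (n:Document|Section|Vault) "
    "ON EACH [n.displayName, n.content, n.aliases, n.description]"
)

SEARCH = (
    "CALL db.index.fulltext.queryNodes('content_search', $query) "
    "YIELD node, score "
    "RETURN labels(node) AS labels, node.displayName AS name, score "
    "ORDER BY score DESC LIMIT $k"
)


def search_params(query: str, k: int = 10) -> dict:
    """Parameters for SEARCH; the text is interpreted as a Lucene query string."""
    return {"query": query, "k": k}


print(CREATE_INDEX)
print(search_params("retrieval", k=5))
```

Passing the query text and limit as parameters, rather than formatting them into the string, avoids quoting problems and lets Neo4j reuse the query plan.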
Of the ten queries defined in docs/retrieval-queries.md, three are wired into the CLI today; the rest exist as Cypher but aren't reachable through ki search yet:
| Flag | Query | What it does |
|---|---|---|
| --type section (default) | B.2 | Section content fulltext |
| --type document | B.1 | Document title fulltext |
| --type neighbors --doc-uri | B.3 | 1-hop LINKS_TO neighbourhood |
The remaining seven retrieval shapes — full-document text, frontmatter + section titles, section get-by-URI, ±N section windowing (full and summary), backlinks, shortest path — are tracked at #6.
Markdown (.md) only — convert other formats first
v1 indexes .md files only. For PDFs, docx, HTML, or plaintext, convert to markdown first with pandoc, markitdown, or by reading + transcribing, then run ki index on the output folder. See the PREPARE when section of skills/ki/SKILL.md for the agent-side flow. Native ingest of other formats is on the roadmap.
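One way to batch the conversion step is to plan the pandoc commands first and run them afterwards. This is a sketch under assumptions: the helper name `plan_conversions` is hypothetical, it covers only .docx and .html (pandoc does not read PDFs, so those need a tool such as markitdown), and it uses pandoc's `gfm` output format as one reasonable choice of markdown flavor.

```python
from pathlib import Path

# Formats pandoc can read directly; PDFs need a different converter.
PANDOC_INPUTS = {".docx", ".html"}


def plan_conversions(src: Path, out: Path) -> list[list[str]]:
    """Return one pandoc argv per convertible file under src."""
    commands = []
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in PANDOC_INPUTS:
            # Mirror the source layout under out, swapping the extension for .md.
            target = out / path.relative_to(src).with_suffix(".md")
            commands.append(["pandoc", str(path), "-t", "gfm", "-o", str(target)])
    return commands
```

Each returned list can be passed to `subprocess.run` after creating the target's parent directory; once the output folder is populated, run ki index on it.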
No MCP server to work with chat apps — only coding agents work today
Coding agents (Claude Code, Cursor, …) run on your machine and can shell out to ki directly. Chat apps (claude.ai, ChatGPT, Gemini, Copilot Web/Desktop) can't — they need an MCP server bridging the chat surface to a local tool. ki doesn't ship one yet. Until then, use a coding agent on the same machine, or paste ki search "..." --json output into the chat manually.
Smaller deferrals
These are unlikely to change soon but worth being explicit about:
- Single-machine ingest, single Neo4j write session. Concurrent writers would deadlock on shared MERGE targets, and the throughput at v1 scales doesn't justify the complexity; see docs/requirements_v01_mvp.md §Scalability lever 5.
- No :Folder node label. Hierarchy lives in Document.uri and prefix-matches handle subtree queries.
- Plaintext passwords in ~/.config/ki/config.yaml (file mode 0600). OS keyring integration is the v2 upgrade path.
- No --purge, ever. ki removes data from the index; source files are always untouched. See AGENTS.md §Non-negotiable design principles.
- PyPI is the only supported install. No Homebrew formula, no curl | sh, no standalone binaries.
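The prefix-match approach to hierarchy mentioned in the list above can be illustrated in a few lines. The URIs here are made-up examples and `in_subtree` is a hypothetical helper; inside the graph the equivalent filter would be a Cypher STARTS WITH predicate on Document.uri.

```python
def in_subtree(document_uri: str, folder_uri: str) -> bool:
    """True when document_uri sits under folder_uri, matched on a path boundary."""
    prefix = folder_uri.rstrip("/") + "/"
    return document_uri.startswith(prefix)


# Made-up URIs for illustration only.
uris = [
    "vault-1/notes/retrieval.md",
    "vault-1/notes/archive/old.md",
    "vault-1/notes-draft/todo.md",
]
print([u for u in uris if in_subtree(u, "vault-1/notes")])
# ['vault-1/notes/retrieval.md', 'vault-1/notes/archive/old.md']
```

Appending the trailing slash before matching keeps a sibling folder such as notes-draft from being mistaken for part of the notes subtree.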
For everything else on the active roadmap (embeddings, PageRank, MCP server, native non-markdown ingest, …), see the open issues at https://github.com/zach-blumenfeld/knowledge-index/issues.
Development
If you want to hack on ki itself (rather than just use it), the loop is uv sync --extra dev && uv run pytest && uv run ruff check src/ tests/ scripts/.
Setup
git clone https://github.com/zach-blumenfeld/knowledge-index.git
cd knowledge-index
uv sync --extra dev # installs runtime + pytest + ruff
Run tests
# Unit tests only — pure Python, no Neo4j needed.
uv run pytest tests/unit -v
# Full suite. Integration tests auto-skip if no Neo4j is reachable.
uv run pytest tests/ -v
# To actually run integration tests, point them at any Neo4j you have
# (Docker, Aura, or a local install):
KI_TEST_NEO4J_URI=bolt://localhost:7687 \
KI_TEST_NEO4J_USER=neo4j \
KI_TEST_NEO4J_PASSWORD=password \
uv run pytest tests/ -v
The integration suite is destructive — it ingests tests/fixtures/sample_vault/ and DETACH DELETEs vaults on teardown. Don't point it at a Neo4j that holds real data.
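The environment variables above suggest a simple gating pattern for integration tests. The sketch below is a simplification and not the project's actual fixture: it only checks whether a URI is configured (the real suite skips when Neo4j is unreachable), and the helper name and default values are assumptions.

```python
import os


def neo4j_test_settings(env=os.environ):
    """Return (uri, user, password), or None when no test database is configured."""
    uri = env.get("KI_TEST_NEO4J_URI")
    if not uri:
        return None  # caller should skip integration tests
    return (
        uri,
        env.get("KI_TEST_NEO4J_USER", "neo4j"),   # assumed default
        env.get("KI_TEST_NEO4J_PASSWORD", ""),    # assumed default
    )


print(neo4j_test_settings({}))  # None
```

Taking the environment as a parameter keeps the helper testable without mutating the real process environment.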
Lint
uv run ruff check src/ tests/ scripts/
CI runs the same command on Python 3.11 / 3.12 / 3.13.
Test fixtures
tests/fixtures/sample_vault/ is generated by scripts/gen_test_vault.py. Don't hand-edit — regenerate:
rm -rf tests/fixtures/sample_vault
uv run python scripts/gen_test_vault.py --size tiny --seed 42 \
--output tests/fixtures/sample_vault
Same --seed → byte-identical output across runs. The generator supports tiny / small / medium / large matching the §Scalability envelopes in docs/requirements_v01_mvp.md.
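The byte-identical guarantee rests on a common pattern: drive all randomness from a private, seeded generator. The sketch below illustrates that pattern only; the function `generate_note` and its vocabulary are invented here and say nothing about how scripts/gen_test_vault.py is actually written.

```python
import hashlib
import random


def generate_note(seed: int, words: int = 20) -> bytes:
    """Produce pseudo-random markdown from a private, seeded RNG."""
    rng = random.Random(seed)  # never touch the global random state
    vocab = ["graph", "vault", "section", "link", "index", "query"]
    body = " ".join(rng.choice(vocab) for _ in range(words))
    return f"# Note {seed}\n\n{body}\n".encode("utf-8")


first = hashlib.sha256(generate_note(42)).hexdigest()
second = hashlib.sha256(generate_note(42)).hexdigest()
print(first == second)  # True
```

Comparing digests is a cheap way to confirm that a regenerated fixture matches a previous run.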
Contributing
Before opening a PR:
- Read AGENTS.md: design principles, project map, and the Don't list.
- Skim docs/requirements_v01_mvp.md: anything in there is normative. If your change conflicts with the spec, update the spec in the same PR.
- If you're changing the schema, update docs/data-model.md before the code.
- If you're changing Cypher, update docs/ingest-cypher.md or docs/retrieval-queries.md before the code; those are the source of truth.
- If you're changing CLI behavior, keep docs/requirements_v01_mvp.md, skills/ki/SKILL.md, and the implementation in lockstep; drift between those is the #1 source of agent-routing bugs.
Release flow
The release workflow is manual and lives at .github/workflows/release.yml. To cut a release:
- Bump version = "..." in pyproject.toml.
- Add a ## [X.Y.Z] — YYYY-MM-DD section to CHANGELOG.md. The heading format is load-bearing: the workflow awk-extracts the release notes by matching it exactly.
- Open a PR, merge to main.
- Actions tab → Release to PyPI → Run workflow → branch main.
The workflow refuses to re-release an existing tag (forces a version bump on re-run), builds, publishes via PyPI Trusted Publishing, creates the git tag, and cuts a GitHub Release with body extracted from CHANGELOG.md.
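To see why the changelog heading format matters, here is a Python approximation of heading-based extraction. The real workflow uses awk and its exact pattern is not shown in this README, so treat the regex, the function name, and the sample changelog (including its version and dates) as assumptions.

```python
import re


def extract_release_notes(changelog: str, version: str) -> str:
    """Return the text under '## [version] — YYYY-MM-DD' up to the next '## [' heading."""
    heading = re.compile(
        rf"^## \[{re.escape(version)}\] — \d{{4}}-\d{{2}}-\d{{2}}[ \t]*$",
        re.MULTILINE,
    )
    match = heading.search(changelog)
    if match is None:
        raise ValueError(f"no changelog heading for {version}")
    rest = changelog[match.end():]
    next_heading = re.search(r"^## \[", rest, re.MULTILINE)
    body = rest[: next_heading.start()] if next_heading else rest
    return body.strip()


# Made-up sample; versions and dates are placeholders.
sample = "## [1.2.3] — 2000-01-02\n\n- Fix search\n\n## [1.2.2] — 2000-01-01\n\n- Add vault\n"
print(extract_release_notes(sample, "1.2.3"))  # - Fix search
```

A heading that uses a plain hyphen or omits the brackets would not match, which is the kind of silent failure the exact-format rule is guarding against.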
Learn more
- docs/requirements_v01_mvp.md: full design spec (CLI shape, schema, scalability, auto-mode rules)
- docs/data-model.md: Neo4j schema (nodes, edges, properties)
- docs/ingest-cypher.md: what ki index writes
- docs/retrieval-queries.md: what ki search exposes (B.1–B.10)
- skills/ki/SKILL.md: agent routing rules (when an agent should invoke ki)
- AGENTS.md: for AI agents (or humans) contributing to the codebase
- CHANGELOG.md: release history
License
See LICENSE.