AI-native knowledge engine across the DIKW pyramid (Data → Information → Knowledge → Wisdom)

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

helebest

These details have not been verified by PyPI

Project description

dikw-core

AI-native knowledge engine that turns your documents into Data → Information → Knowledge → Wisdom.

Inspired by Karpathy's LLM Wiki pattern, extended end-to-end across the full DIKW pyramid. Where Karpathy's pattern stops at a compounding markdown knowledge base (the K layer), dikw-core adds a first-class Wisdom layer for human-authored principles, lessons, and patterns that apply beyond any single source.

Status: alpha. Under active construction; APIs, on-disk formats, database schema, and CLI will change.

What you get

A local-first knowledge base — the dikw base — where the on-disk layout is a plain markdown tree your editor (Obsidian, VS Code, …) can open directly.
Four explicit DIKW layers with their own operations:
- Data — raw sources you curate.
- Information — parsed, chunked, embedded, indexed (FTS5 + vectors).
- Knowledge — LLM-authored knowledge pages with [[wikilinks]], index.md, and an append-only log.md.
- Wisdom — hand-written markdown principles / lessons / patterns authored under wisdom/<author>/, indexed (chunked + embedded) so they surface in retrieve alongside K-layer pages.
Pluggable LLM providers (API-first): Anthropic + OpenAI-compatible (covers OpenAI, Azure, Ollama, DeepSeek, Gemini-compat).
Pluggable storage: SQLite+sqlite-vec (default), Postgres+pgvector (enterprise) — swap by config.
Client / server architecture. A long-lived dikw serve (FastAPI + NDJSON) hosts the engine; the dikw client … Typer CLI talks to it over HTTP, streams progress events for long ops, and supports cancel / resume.

Install & quick start

Requires Python 3.12+ and uv.

git clone https://github.com/OpenDIKW/dikw-core
cd dikw-core
uv sync

uv run dikw init my-base --description "my research base"
cd my-base
# Drop some markdown into sources/, then run any single command via
# `dikw client serve-and-run` — it spawns a local server, runs the
# inner command, and tears it down.
uv run dikw client serve-and-run -- ingest --no-embed
uv run dikw client serve-and-run -- retrieve "What does Karpathy mean by deterministic scoping?"

For interactive sessions or long iterations, run dikw serve once and keep using dikw client * against it:

uv run dikw serve --base .   # in one terminal
# in another:
uv run dikw client status
uv run dikw client synth               # K layer (needs ANTHROPIC_API_KEY or OpenAI-compat)
uv run dikw client retrieve "What does Karpathy mean by deterministic scoping?"

Every HTTP-bound command is spelled out as dikw client <verb>; there are no top-level short aliases. dikw-core no longer ships an in-engine answer-synthesis path — retrieve returns ranked chunks + page refs and the agent (Claude Code, ChatGPT, your own script) feeds them into its own LLM. See GUIDE_FOR_AGENTS.md.

Server deployment, security posture, and the wire contract live in docs/server.md. For container deployment, see examples/docker/ (Dockerfile + compose stack with pgvector/pgvector:0.8.2-pg18) and the long-form docs/deployment-docker.md.

End-to-end walkthrough: docs/getting-started.md. Architecture brief: docs/architecture.md. Approved design doc: docs/design.md.

Commands

Local-only commands run in this process:

command	does
`dikw version`	print the package version
`dikw init <path>`	scaffold a dikw base (sources / knowledge / wisdom / `.dikw/` + `dikw.yml`)
`dikw serve --base <path>`	start the FastAPI + NDJSON server bound to one base

Everything else lives under dikw client * and talks to a running server. There are no top-level short aliases — spelling out the client prefix keeps the local-vs-HTTP boundary unambiguous for both agents and humans:

command	does
`dikw client status`	counts across DIKW layers
`dikw client info`	raw `GET /v1/info` passthrough — version, storage backend, auth posture
`dikw client health`	server self-description (base, version, storage, providers) — the first call an agent makes
`dikw client check`	ping the configured LLM + embedding endpoints to verify `dikw.yml` + keys
`dikw client import <path>`	pre-flight + import local md packages (md + referenced assets) into the server's `sources/`
`dikw client ingest [--no-embed]`	parse + chunk + FTS-index + embed the server's `sources/` tree
`dikw client retrieve "<q>"`	hybrid search returning ranked chunks + page refs (no LLM call); agent supplies its own synthesis
`dikw client synth [--all]`	LLM turns source docs into K-layer knowledge pages; maintains `index.md`+`log.md`
`dikw client lint [propose\|proposals\|apply]`	report broken wikilinks / orphan pages / duplicate titles; propose + apply structured fixes
`dikw client pages {list,get,links,provenance}`	enumerate pages / read a page body + chunk anchors / walk the K-layer link graph / walk the K↔D provenance edge
`dikw client graph get`	fetch the whole base graph (nodes + edges + unresolved wikilinks) in one read
`dikw client assets get <id> --output <file>`	download a content-addressed asset by sha256 id
`dikw client eval [--dataset]`	run retrieval-quality evaluation against packaged or custom datasets
`dikw client tasks {list,status,events,wait,cancel}`	inspect running / past async tasks on the server
`dikw client serve-and-run -- <cmd>`	one-shot server + inner command + teardown (no long-lived `dikw serve` needed)

The dikw auth {login,import,status,list,logout} subgroup is local — it manages OAuth tokens in <base>/.dikw/auth.json without talking to a server (used by the openai_codex provider; see docs/providers.md).

Providers

Configured via dikw.yml:

provider:
  llm: anthropic_compat         # or: openai_compat
  llm_model: claude-sonnet-4-6
  llm_base_url: null            # set for any Anthropic-protocol-compatible endpoint
  embedding: openai_compat
  embedding_model: text-embedding-3-small
  embedding_base_url: https://api.openai.com/v1
  embedding_dim: 1536           # required: must match what the endpoint returns
  embedding_revision: ""        # bump to force re-embed when vendor refreshes weights silently
  embedding_normalize: true
  embedding_distance: cosine

llm names a wire protocol (which SDK to speak), not a vendor — the actual vendor is whatever llm_base_url points at.

anthropic_compat → uses the anthropic async SDK with cache_control on the system prompt, so repeated synth calls hit the prompt cache. Set llm_base_url to retarget the SDK at any Anthropic-protocol-compatible endpoint (e.g., MiniMax's https://api.minimaxi.com/anthropic); leave null for api.anthropic.com.
openai_compat → uses the openai async SDK against any base URL that speaks the OpenAI HTTP surface (Azure, Ollama, vLLM, DeepSeek, MiniMax, …).

Full vendor cookbook (MiniMax, GLM, Gemini, DeepSeek, Gitee AI, Ollama, …) and the production gotchas around batch size, embedding dimensions, and retry/caching live in docs/providers.md.

Using MiniMax LLM + Gitee AI embeddings

MiniMax has no embeddings endpoint — pair its Anthropic-compatible LLM surface with an OpenAI-compatible embedding vendor. The example below uses Gitee AI (Qwen3-Embedding-0.6B, 1024 native — the recommended default; swap in Qwen3-Embedding-8B with embedding_dim: 1024 matryoshka or 4096 native for higher-cost / marginal-quality runs). Fill the URLs in by hand — dikw-core never auto-detects vendor endpoints:

provider:
  llm: anthropic_compat
  llm_model: <MiniMax Anthropic-compatible model name>
  llm_base_url: https://api.minimaxi.com/anthropic
  embedding: openai_compat
  embedding_model: Qwen3-Embedding-0.6B
  embedding_base_url: https://ai.gitee.com/v1
  embedding_dim: 1024               # 0.6B native; locked at first ingest
  embedding_revision: ""            # bump to force re-embed when Qwen weights drift silently
  embedding_normalize: true
  embedding_distance: cosine
  embedding_batch_size: 16          # required: Gitee rejects batches >25
  embedding_provider_label: gitee-ai  # optional; shows up in `dikw client check`

A working reference copy lives at tests/fixtures/live-minimax-gitee.dikw.yml — drop it into a fresh base and fill in your two keys.

Two keys for two vendors — the embedding leg reads DIKW_EMBEDDING_API_KEY exclusively (no OPENAI_API_KEY fallback), so misconfigurations fail loudly rather than cross-wiring credentials:

export ANTHROPIC_API_KEY=<your-MiniMax-key>
export DIKW_EMBEDDING_API_KEY=<your-Gitee-key>

Verify connectivity before running ingest/synth. The two legs can be probed separately, which is useful when you set up one vendor first:

uv run dikw client check --llm-only     # just LLM — useful before Gitee is wired up
uv run dikw client check --embed-only   # just embedding
uv run dikw client check                # both

dikw client check pings each provider with one tiny request and prints a status table with endpoint, latency, and dim/tokens. Exit code is 0 on success, 1 on failure, 2 on flag misuse — scriptable in CI or a shell one-liner.

Source formats

Markdown ships out of the box. A new format is one SourceBackend subclass + a register() call away — see domains/data/backends/markdown.py for the reference impl.

Storage

Two backends ship, selected in dikw.yml:

storage:
  backend: sqlite          # sqlite | postgres

  # --- sqlite (default): single-user local ---
  path: .dikw/index.sqlite

  # --- postgres (enterprise): multi-user, pgvector + tsvector ---
  # backend: postgres
  # dsn: postgresql://user:pw@host:5432/dikw
  # schema: dikw
  # pool_size: 10

SQLite + sqlite-vec + FTS5 — the default. No extras required.
Postgres + pgvector — install via uv pip install dikw-core[postgres]. Requires the pg_trgm and vector extensions (standard on the pgvector/pgvector:0.8.2-pg18 Docker image). The adapter uses tsvector+GIN for FTS and vector(N) for embeddings; the vector dimension is set at first insert.

Engine code talks only to the Storage Protocol (storage/base.py); each adapter implements the same contract and is swappable by changing dikw.yml.

Releasing

Tagged pushes (vX.Y.Z) trigger .github/workflows/release.yml, which builds sdist + wheel, re-runs the full test gate, and publishes to PyPI via trusted publishing (no token in repo secrets). One-time setup on PyPI's side:

Create the dikw-core project on PyPI.
On the project's Publishing page, add a GitHub trusted publisher with:
- owner: OpenDIKW
- repository: dikw-core
- workflow: release.yml
- environment: pypi

After that, git tag vX.Y.Z && git push --tags is enough. The release workflow also opens a chore(docker): bump DIKW_VERSION to vX.Y.Z PR against main after a successful PyPI publish, keeping examples/docker/Dockerfile in lockstep with the latest published wheel; merge that chore PR to clear the post-release queue. The dockerfile-version-guard job in reusable-ci.yml enforces the invariant on every PR.

License

MIT — see LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

helebest

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.6.1

Jun 23, 2026

0.6.0

Jun 21, 2026

0.5.3

Jun 16, 2026

0.5.2

Jun 13, 2026

0.5.1

Jun 3, 2026

0.5.0

Jun 2, 2026

0.4.7

May 31, 2026

This version

0.4.6

May 30, 2026

0.4.5

May 29, 2026

0.4.0

May 28, 2026

0.3.6

May 27, 2026

0.3.5

May 26, 2026

0.3.0

May 26, 2026

0.2.7

May 23, 2026

0.2.6

May 23, 2026

0.2.5

May 21, 2026

0.2.0

May 20, 2026

0.1.0

May 18, 2026

0.0.2

May 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dikw_core-0.4.6.tar.gz (1.8 MB view details)

Uploaded May 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dikw_core-0.4.6-py3-none-any.whl (1.3 MB view details)

Uploaded May 30, 2026 Python 3

File details

Details for the file dikw_core-0.4.6.tar.gz.

File metadata

Download URL: dikw_core-0.4.6.tar.gz
Upload date: May 30, 2026
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for dikw_core-0.4.6.tar.gz
Algorithm	Hash digest
SHA256	`4f172321b9122dc91f1cf58784a789458e51b763c6bb11fa863481d6f8300b8e`
MD5	`422e60cbdb55348aacb2c4e1ea48c829`
BLAKE2b-256	`4e86f31ed40993a585702c0e6e437dc75551a55177a972559842a2099bc4707b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for dikw_core-0.4.6.tar.gz:

Publisher: release.yml on OpenDIKW/dikw-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dikw_core-0.4.6.tar.gz
- Subject digest: 4f172321b9122dc91f1cf58784a789458e51b763c6bb11fa863481d6f8300b8e
- Sigstore transparency entry: 1675632449
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: OpenDIKW/dikw-core@6cce7c9f6cb26ff958711912c6dd2323cc079a79
- Branch / Tag: refs/tags/v0.4.6
- Owner: https://github.com/OpenDIKW
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6cce7c9f6cb26ff958711912c6dd2323cc079a79
- Trigger Event: push

File details

Details for the file dikw_core-0.4.6-py3-none-any.whl.

File metadata

Download URL: dikw_core-0.4.6-py3-none-any.whl
Upload date: May 30, 2026
Size: 1.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for dikw_core-0.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`69f1cfea784bd8cbe91828406a85ab92be1d1694731354d31414ca2541beeb85`
MD5	`5d1dae9d2ee30782c0280bb5fc6d575f`
BLAKE2b-256	`c5ebaac93192d8026497900ef107fd7327db67c4aa10132fb0d856ba77eba260`

See more details on using hashes here.

Provenance

The following attestation bundles were made for dikw_core-0.4.6-py3-none-any.whl:

Publisher: release.yml on OpenDIKW/dikw-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dikw_core-0.4.6-py3-none-any.whl
- Subject digest: 69f1cfea784bd8cbe91828406a85ab92be1d1694731354d31414ca2541beeb85
- Sigstore transparency entry: 1675632493
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: OpenDIKW/dikw-core@6cce7c9f6cb26ff958711912c6dd2323cc079a79
- Branch / Tag: refs/tags/v0.4.6
- Owner: https://github.com/OpenDIKW
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6cce7c9f6cb26ff958711912c6dd2323cc079a79
- Trigger Event: push

dikw-core 0.4.6

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

dikw-core

What you get

Install & quick start

Commands

Providers

Using MiniMax LLM + Gitee AI embeddings

Source formats

Storage

Releasing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance