
Incrementally convert documents into knowledge organized as a wiki


cocoindex-wiki

ccwiki incrementally converts a collection of markdown documents into a structured, interconnected wiki using LLMs. It plans categories, extracts and deduplicates entities, produces curated knowledge entries, and keeps the wiki in sync with the source documents as they change — powered by CocoIndex for incremental processing and function memoization.

Use it from your coding agent via the included Claude Code skill, or from the CLI directly.

Inspired by Andrej Karpathy's note on wiki-style knowledge organization.

What it does

Given a directory of markdown files, ccwiki:

  1. Plans categories for the wiki based on the source documents.
  2. Extracts canonical entities (people, systems, events, concepts, ...) from each document.
  3. Resolves duplicates across documents (e.g., "Alice Chen" and "Alice" → one canonical entry) using embedding similarity plus LLM confirmation.
  4. Writes per-entity wiki entries by combining knowledge from every source that mentioned the entity, with markdown cross-references and source footnotes.
  5. Keeps everything incremental — edit a source doc and only affected entries get rebuilt.

The output is a set of markdown files organized by category, suitable for browsing in any markdown renderer (Obsidian, VS Code, GitHub, etc.).

Get Started

Install

Using pipx:

```bash
pipx install cocoindex-wiki    # first install
pipx upgrade cocoindex-wiki    # upgrade
```

Using uv:

```bash
uv tool install --upgrade cocoindex-wiki
```

Requires Python 3.11+. After installation, the ccwiki command is available globally.

Configure the LLM

Set your primary model and the matching provider API key:

```bash
export CCWIKI_LLM_MODEL="anthropic/claude-haiku-4-5-20251001"
export ANTHROPIC_API_KEY="..."

# Optional: a lighter/cheaper model for entity extraction and resolution
export CCWIKI_LLM_MODEL_LITE="anthropic/claude-haiku-4-5-20251001"
```

CCWIKI_LLM_MODEL accepts any LiteLLM-compatible model name (OpenAI, Anthropic, Google, and others). See tests/e2e/run_test.sh for working examples with Gemini, OpenAI, and Anthropic.

Coding Agent Integration

Skill (Recommended)

Install the ccwiki skill so your coding agent can set up and build your wiki interactively — it handles installation, category design, config writing, and indexing on its own:

```bash
npx skills add cocoindex-io/cocoindex-wiki
```

The skill teaches the agent to:

  • Install cocoindex-wiki and verify CCWIKI_LLM_MODEL + the matching provider API key are set.
  • Read your source docs first and propose category sets with explicit granularity rules.
  • Write @WIKI.md and each @WIKI_CATEGORY.md to disk, then iterate with you on boundaries and descriptions.
  • Run ccwiki index to build the wiki, show the result, and re-run incrementally as you edit sources or categories.

Just ask your agent something like "help me set up a ccwiki for this folder of notes" or "add a new category for research papers", or type /ccwiki to invoke the skill directly. Works with Claude Code and other skill-compatible agents.

Manual CLI Usage

You can also use the CLI directly — useful for scripted pipelines, CI jobs, or when you want full control without an agent in the loop.

1. Initialize a project

In the directory containing your markdown source files:

```bash
ccwiki init
```

This creates a @WIKI.md at the project root with default settings. Edit it to describe your project's purpose and adjust include_patterns / exclude_patterns / output_dir as needed.
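
For illustration only — the real file format is specified in skills/ccwiki/references/file_formats.md, and the layout sketched here is a guess — an edited @WIKI.md might carry a short project description plus the settings named above:

```
# Acme Notes Wiki

A knowledge wiki built from the Acme team's meeting notes and design docs.

include_patterns: ["docs/**/*.md"]
exclude_patterns: ["docs/drafts/**"]
output_dir: wiki
```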

2. Plan the wiki categories

```bash
ccwiki plan
```

This reads your source documents and asks the LLM to propose a category schema, then writes a @WIKI_CATEGORY.md in each category subdirectory. For finer control over categories, use the skill instead — it keeps a human in the loop during design.

3. Build the wiki

```bash
ccwiki index
```

This runs the full pipeline: entity extraction → resolution → knowledge extraction → combining → file output. The resulting wiki files appear under wiki/ (or your configured output_dir).

On subsequent runs, only affected entries are rebuilt (thanks to CocoIndex memoization) — edits to one source doc typically touch only the entries that mentioned the changed entities.

Verbose logging

Add -v to see entity extraction and deduplication logs:

```bash
ccwiki -v index
```

Project layout

A typical ccwiki project:

```
my-project/
├── @WIKI.md                       # Project-level config
├── docs/                          # Source markdown files
│   ├── overview.md
│   ├── team.md
│   └── ...
└── wiki/                          # Generated wiki (the output)
    ├── People/
    │   ├── @WIKI_CATEGORY.md      # Category-level config
    │   ├── Alice Chen.md
    │   └── Bob Martinez.md
    ├── Products/
    │   ├── @WIKI_CATEGORY.md
    │   └── ...
    └── ...
```

@WIKI.md describes the project. Each @WIKI_CATEGORY.md defines what kinds of entries belong in that category and how to write them. See skills/ccwiki/references/file_formats.md for the full spec.

How it works

ccwiki runs a multi-phase pipeline on CocoIndex v1:

  1. Phase 1 — Entity extraction (per source document, parallel). An LLM reads each raw doc and extracts canonical entities, classified by category. Entity names are sanitized for filesystem safety immediately after extraction.

  2. Phase 2.1 — Entity resolution (per category, parallel). For each category, entities are embedded with SentenceTransformer, similar ones are found via FAISS, and an LLM confirms or rejects matches. A stability rule prefers entities that already have an existing .md file, so canonical choices stay consistent across runs.

  3. Phase 2.2 — Knowledge extraction and combining (per canonical entity). For each entity, knowledge is extracted from each contributing source doc, then combined into a single entry via a stable K-ary combining tree. The tree structure is determined by stable file-path fingerprints, so when one source doc changes only log(N)/log(K) combining calls re-run. Final entries include markdown cross-links between related entities and numbered [^src1] footnotes pointing back to source documents.
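
The Phase 2.1 matching loop can be sketched in plain Python. Hand-made toy vectors and a bare cosine function stand in for SentenceTransformer and FAISS, and the LLM confirmation step is elided; the names, vectors, and 0.98 threshold are illustrative, not ccwiki's actual values:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors standing in for SentenceTransformer embeddings.
emb = {
    "Alice Chen": [0.90, 0.10, 0.00],
    "Alice":      [0.88, 0.12, 0.02],
    "Bob":        [0.00, 0.20, 0.95],
}
existing_files = {"Alice Chen"}  # entries that already have a .md file on disk

def resolve(names, threshold=0.98):
    """Map each extracted mention to one canonical entity name."""
    canonical = {}
    for name in names:
        match = next((c for c in set(canonical.values())
                      if cosine(emb[name], emb[c]) >= threshold), None)
        # (ccwiki would additionally ask an LLM to confirm the pair here.)
        if match is None:
            canonical[name] = name
        else:
            # Stability rule: prefer the variant that already has a wiki file.
            keep = name if name in existing_files else match
            canonical[name] = keep
            canonical[match] = keep
    return canonical

mapping = resolve(["Alice", "Alice Chen", "Bob"])
# "Alice" and "Alice Chen" collapse to the on-disk name "Alice Chen"
```

Because the on-disk name wins, a rerun after new documents mention "Alice" keeps pointing at the existing Alice Chen.md rather than churning the canonical choice.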

All LLM calls are memoized by CocoIndex, so re-running ccwiki index after small edits is fast and cheap. Wiki files are declared as CocoIndex target states, so entities that are no longer canonical (merged or removed) have their files automatically deleted.
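
The incremental combining behaviour can also be sketched in plain Python. A hash stands in for the LLM combining call and a dict stands in for CocoIndex's memoization cache; K = 4 and the doc names are illustrative:

```python
import hashlib

K = 4        # branching factor of the combining tree (illustrative)
_memo = {}   # stands in for CocoIndex's memoization cache
calls = 0    # counts combine() invocations that actually ran

def combine(parts):
    """Merge up to K knowledge fragments into one (stand-in for an LLM call)."""
    global calls
    key = tuple(parts)
    if key not in _memo:
        calls += 1
        _memo[key] = hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]
    return _memo[key]

def combine_tree(leaves):
    """Reduce leaves bottom-up in fixed K-sized groups. Grouping by position
    in a stably sorted list means editing one leaf only dirties the groups
    on its path to the root."""
    level = sorted(leaves)  # stable order, like ccwiki's file-path fingerprints
    while len(level) > 1:
        level = [combine(level[i:i + K]) for i in range(0, len(level), K)]
    return level[0]

docs = [f"doc{i:02d}" for i in range(64)]
combine_tree(docs)
first = calls              # 64 leaves at K=4: 16 + 4 + 1 = 21 combines
combine_tree(docs)
unchanged = calls - first  # fully memoized rerun: 0 new combines
docs[5] = "doc05-edited"
combine_tree(docs)
rerun = calls - first      # only the root path re-runs: 3 == log_4(64)
```

A flat single-shot combine would re-run one big call over all 64 sources for any edit; the tree bounds the rework to the handful of groups between the changed leaf and the root.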

Testing

End-to-end test scenarios live under tests/e2e/:

  • tests/e2e/solar_system/ — small dataset (8 docs about the Solar System)
  • tests/e2e/helios_labs/ — larger dataset (24 docs about a fictional AI startup)
  • tests/e2e/arxiv_papers/ — 50 real arXiv papers

Run a scenario against one or all LLM providers:

```bash
source ~/.env.llm  # sets OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY
bash tests/e2e/run_test.sh tests/e2e/solar_system              # all providers
bash tests/e2e/run_test.sh tests/e2e/solar_system anthropic    # single provider
```

Results are archived under archived_results/ in each scenario directory.

License

Apache 2.0 — see LICENSE.
