Git-native structured knowledge modules for LLM agent workflows via MCP.

Knowledge Manager

Turn messy notes into structured modules that LLMs can load on demand via MCP.

Knowledge Manager is a git-native knowledge system for agentic workflows. It stores knowledge as inspectable JSON modules, maintains a lightweight index, and serves both through a CLI and MCP server.

Who It’s For: Teams running AI-assisted engineering workflows who want reusable, reviewable project knowledge without operating a full RAG stack.

Structured modules, not chunks. Preserve intent with explicit sections (overview, details, examples, references, caveats).
Git-native JSON storage. Plain files, atomic writes, and easy review in pull requests.
MCP-ready retrieval. Expose index + module loading tools so clients can choose what to read at runtime.
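Atomic file writes usually mean writing to a temporary file in the same directory and renaming it over the target. The sketch below shows that pattern with the standard library. `atomic_write_json` is a hypothetical helper, not the storage.py API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


# Hypothetical helper illustrating write-then-rename; not the storage.py API.
def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2, sort_keys=True)
            tmp.write("\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # flush bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic on POSIX when source and target share a filesystem
    except BaseException:
        # Remove the partial temp file; the original target is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Sorted keys and fixed indentation are a deliberate choice here: they keep JSON diffs small and stable when modules are reviewed in pull requests.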

Why this approach?

For small-to-medium knowledge bases (up to roughly 1M words), structured modules are often simpler to operate than embedding-heavy pipelines.

Approach	Strength	Tradeoff
Knowledge Manager	Human-readable modules + deterministic file storage	Requires a review step during ingest
Classic RAG	Strong semantic recall at larger scale	More moving parts (chunking, embeddings, re-indexing)

Quick Start

1. Initialize a knowledge base

km init ./my_kb

2. Configure your provider

km --kb-path ./my_kb config set llm_providers.deepseek.api_key "sk-..."

3. Extract from notes

km --kb-path ./my_kb add notes.txt -c auth

4. Review staged modules

km --kb-path ./my_kb review

5. Serve through MCP

km --kb-path ./my_kb serve

Who should use this?

Teams that want inspectable, versioned knowledge artifacts in git.
Agent workflows that benefit from selective module loading via MCP.
Projects where maintainability and editorial control matter more than retrieval automation at massive scale.

Who should not use this?

Workloads that require large-scale semantic retrieval over tens of millions of words.
Systems already optimized around production embedding infrastructure.

Features

Keep knowledge Git-native and auditable: every approved module is JSON you can diff, review, and version with your repo.
Turn unstructured docs into reusable modules with clear sections (overview, details, examples, references, caveats).
Add a human checkpoint before publish: extract → staging → review → approve.
Use one workflow across models with provider support for DeepSeek (default model: deepseek-v4-pro), Claude, and OpenAI.
Expose reviewed knowledge to editors and agents over MCP through the knowledge://index resource and the load_module, search_modules, and list_categories tools.
Keep retrieval responsive for hot modules with a thread-safe LRU cache.
Process long documents reliably with chunked extraction (chunk_size, chunk_overlap).
Operate with visibility through verbose CLI logs for provider/model choice, chunking, and extraction progress.
Ship with confidence: 75 tests across schema, storage, cache, LLM clients, MCP server, CLI, and integration layers.

Typical Use Cases

Build a shared team knowledge layer from product docs, runbooks, and incident writeups, then expose it to coding agents via MCP.
Replace copy-pasted prompt context with reviewed, versioned modules that can be searched and loaded on demand.
Keep architecture decisions and operational caveats close to code so AI-assisted workflows stay accurate over time.

Installation

git clone <repo>
cd knowledge-manager
poetry install

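The km command is provided by the entry point declared in pyproject.toml. Because poetry install places it in the project's virtual environment, run it as poetry run km ... or from the activated environment.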

Detailed Setup

1. Initialize a knowledge base

km init ./my_kb

This creates:

my_kb/
├── index.json        # auto-maintained module index
├── config.json       # LLM provider + extraction config
└── .staging/         # pending modules awaiting review
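A minimal sketch of the layout `km init` sets up, assuming index.json starts as `{"modules": []}` and config.json starts from values like those under Configuration. `init_kb`, `DEFAULT_INDEX`, and `DEFAULT_CONFIG` are hypothetical names, and the real initial file contents may differ.

```python
import json
from pathlib import Path

# Hypothetical starting contents; the real defaults come from the project's schemas.
DEFAULT_INDEX = {"modules": []}
DEFAULT_CONFIG = {
    "llm_providers": {
        "deepseek": {"api_key": "", "model": "deepseek-v4-pro", "default": True}
    },
    "extraction": {"provider": "deepseek", "max_modules_per_extraction": 10},
    "cache": {"enabled": True, "max_modules": 50},
}


# Hypothetical stand-in for `km init`: creates index.json, config.json, and .staging/.
def init_kb(root: Path) -> None:
    root.mkdir(parents=True, exist_ok=True)
    (root / ".staging").mkdir(exist_ok=True)
    for name, payload in (("index.json", DEFAULT_INDEX), ("config.json", DEFAULT_CONFIG)):
        target = root / name
        if not target.exists():  # never overwrite files in an existing knowledge base
            target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
```

Skipping existing files makes the sketch safe to rerun; whether the real command behaves the same way is not stated here.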

2. Configure your LLM

Edit my_kb/config.json or use the CLI:

km --kb-path ./my_kb config set llm_providers.deepseek.api_key "sk-..."

The default provider is deepseek with model deepseek-v4-pro. Switch providers with:

km --kb-path ./my_kb config set extraction.provider claude
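`config set` takes a dotted key. One plausible reading, sketched below, is that each segment walks one level of nested JSON in config.json. `set_dotted` and `get_dotted` are hypothetical, and this sketch stores values as given, whereas the real CLI may coerce strings into numbers or booleans.

```python
from typing import Any


# Hypothetical helpers showing how a dotted key maps onto nested config.json objects.
def set_dotted(config: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Assign value at a path such as 'llm_providers.deepseek.api_key'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate objects on demand, e.g. a new provider block.
        child = node.setdefault(part, {})
        if not isinstance(child, dict):
            raise KeyError(f"{part!r} is not an object in {dotted_key!r}")
        node = child
    node[leaf] = value


def get_dotted(config: dict[str, Any], dotted_key: str) -> Any:
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if a segment is missing
    return node
```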

3. Extract modules from raw notes

km --kb-path ./my_kb add notes.txt -c auth

notes.txt is split into chunks when needed, each chunk is sent to the LLM, and up to max_modules_per_extraction structured modules are written to .staging/.
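Long notes are split before extraction using chunk_size and chunk_overlap. A sliding-window sketch, assuming both are measured in characters (the project may count tokens instead). `chunk_text` is a hypothetical name.

```python
# Hypothetical sliding-window chunker; sizes are in characters in this sketch.
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if len(text) <= chunk_size:
        return [text]  # short notes need only one LLM call
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap repeats a little context at each boundary so that a fact split across two chunks is still seen whole by at least one LLM call.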

4. Review staged modules

km --kb-path ./my_kb review

For each staged module:

a — approve (move to KB and update index)
r — reject (delete from staging)
s — skip (leave in staging for later)
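The three review actions map onto file operations. A sketch assuming staged files sit directly in .staging/, approved modules land at `<category>/<id>.json` as in the Architecture diagram, and index.json holds a `modules` list of id/category/title/summary entries. `review_decision` and that index shape are hypothetical.

```python
import json
from pathlib import Path


# Hypothetical handler for one review keystroke: "a" approve, "r" reject, "s" skip.
def review_decision(kb: Path, staged_file: Path, action: str) -> None:
    if action == "s":
        return  # leave the file in .staging/ for a later session
    if action == "r":
        staged_file.unlink()  # rejected modules are deleted
        return
    if action != "a":
        raise ValueError(f"unknown action {action!r}")

    module = json.loads(staged_file.read_text(encoding="utf-8"))
    target = kb / module["category"] / f"{module['id']}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    staged_file.replace(target)  # move into <category>/<id>.json

    # Update index.json, replacing any older entry for the same module (hypothetical shape).
    index_path = kb / "index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    key = (module["id"], module["category"])
    entries = [m for m in index["modules"] if (m["id"], m["category"]) != key]
    entries.append({k: module[k] for k in ("id", "category", "title", "summary")})
    index["modules"] = entries
    index_path.write_text(json.dumps(index, indent=2) + "\n", encoding="utf-8")
```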

5. Browse and search

km --kb-path ./my_kb list                    # all modules
km --kb-path ./my_kb list -c auth            # filter by category
km --kb-path ./my_kb search "jwt token"      # ranked keyword search
km --kb-path ./my_kb show auth-jwt -c auth   # full module JSON
km --kb-path ./my_kb stats                   # KB statistics

Search ranks exact word matches first, then English stem matches, with partial matching as a fallback for short queries.
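A toy version of that three-tier ranking: exact word hits score highest, stem hits next, and substring hits count only for short queries. The suffix-stripping `stem`, the weights 3/2/1, and the two-term cutoff are assumptions standing in for the project's actual stemmer and scoring.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
SUFFIXES = ("ing", "ed", "es", "s")  # toy stemmer, not the project's algorithm
SHORT_QUERY_MAX_TERMS = 2  # hypothetical cutoff for "short" queries


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def score(query: str, text: str) -> int:
    terms = WORD.findall(query.lower())
    words = set(WORD.findall(text.lower()))
    stems = {stem(w) for w in words}
    total = 0
    for term in terms:
        if term in words:
            total += 3  # exact word match ranks highest
        elif stem(term) in stems:
            total += 2  # same stem, e.g. "tokens" vs "token"
        elif len(terms) <= SHORT_QUERY_MAX_TERMS and any(term in w for w in words):
            total += 1  # substring fallback, short queries only
    return total


# Hypothetical search over {module_id: text}; best score first, ties broken by id.
def search(query: str, docs: dict[str, str]) -> list[str]:
    scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    return [doc_id for s, doc_id in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]
```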

6. Serve as MCP

km --kb-path ./my_kb serve

This launches an MCP server over stdio. MCP clients such as Claude Code see:

Resource knowledge://index — full index JSON
Tool load_module(module_id, category) — full module content
Tool search_modules(query) — ranked keyword search with exact, stem, and short-query partial matching
Tool list_categories() — categories with counts
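Under the hood, the resource and tools above travel as standard MCP JSON-RPC 2.0 messages (`resources/read`, `tools/call`). The sketch builds the request lines a client would write to the stdin of `km serve`, assuming the tool argument names listed above; `rpc_line` is hypothetical, and a real client first completes the initialize handshake, normally through an MCP SDK.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # request ids must not be reused within a session


# Hypothetical helper: one JSON-RPC 2.0 request as a single newline-terminated line.
def rpc_line(method: str, params: dict[str, Any]) -> str:
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    # The stdio transport is newline-delimited; json.dumps escapes newlines inside strings.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Requests sent after the initialize handshake (handshake not shown).
read_index = rpc_line("resources/read", {"uri": "knowledge://index"})
load_jwt = rpc_line(
    "tools/call",
    {"name": "load_module", "arguments": {"module_id": "auth-jwt", "category": "auth"}},
)
search_jwt = rpc_line(
    "tools/call",
    {"name": "search_modules", "arguments": {"query": "jwt token"}},
)
```

Clients such as Claude Code handle this framing themselves; the sketch only shows what crosses the pipe.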

Module schema

{
  "id": "auth-jwt",
  "category": "auth",
  "title": "JWT authentication in our API",
  "summary": "How JWT tokens are issued, signed (RS256), and validated.",
  "created_at": "2026-05-28T10:00:00Z",
  "updated_at": "2026-05-28T10:00:00Z",
  "content": {
    "overview": "...",
    "details": "...",
    "examples": "...",
    "references": "...",
    "caveats": "..."
  },
  "metadata": {
    "tags": ["auth", "jwt", "security"],
    "related_modules": ["auth/oauth-flow"],
    "confidence": "high",
    "source": "internal-runbook"
  }
}

The id field must match ^[a-z0-9-]+$ (lowercase letters, digits, and hyphens only). The full schema is in src/knowledge_manager/schemas.py.
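The id rule and the on-disk location can be checked with the standard library alone. This is a stdlib stand-in for the Pydantic model in schemas.py, not a copy of it. `ModuleRef` and `related_path` are hypothetical, and treating related_modules entries such as "auth/oauth-flow" as `<category>/<id>` is an inference from the example above.

```python
import re
from dataclasses import dataclass
from pathlib import Path

MODULE_ID = re.compile(r"^[a-z0-9-]+$")


# Hypothetical stdlib stand-in for the Pydantic Module model; validates the id rule only.
@dataclass(frozen=True)
class ModuleRef:
    id: str
    category: str

    def __post_init__(self) -> None:
        # fullmatch rejects a trailing newline that a bare `$` would otherwise allow.
        if not MODULE_ID.fullmatch(self.id):
            raise ValueError(f"invalid module id {self.id!r}: must match {MODULE_ID.pattern}")

    def path(self, kb: Path) -> Path:
        # Approved modules are stored as <category>/<id>.json.
        return kb / self.category / f"{self.id}.json"


def related_path(kb: Path, ref: str) -> Path:
    """Resolve a related_modules entry, assuming the '<category>/<id>' form."""
    category, sep, module_id = ref.partition("/")
    if not sep:
        raise ValueError(f"expected '<category>/<id>', got {ref!r}")
    return ModuleRef(id=module_id, category=category).path(kb)
```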

Architecture

┌────────────┐      ┌──────────────┐
│  raw text  │─────▶│  Extractor   │  (LLM call)
└────────────┘      └───────┬──────┘
                            ▼
                       .staging/*.json
                            │
                       human review
                            ▼
            ┌────────────────────────────┐
            │   <category>/<id>.json     │
            │   index.json (auto)        │
            └──────────────┬─────────────┘
                           │
                  ┌────────┴────────┐
                  │                 │
                  ▼                 ▼
              CLI (km)         MCP server
                                 │
                      Claude Code / clients
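`km rebuild` regenerates index.json from the approved files in the diagram. A sketch assuming one directory per category, hidden directories such as .staging/ ignored, and hypothetical id/category/title/summary index entries; `rebuild_index` is not the project's API.

```python
import json
from pathlib import Path


# Hypothetical index rebuild from approved <category>/<id>.json files.
def rebuild_index(kb: Path) -> dict:
    entries = []
    # "*/*.json" matches one directory level, so root files like index.json are skipped.
    for path in sorted(kb.glob("*/*.json")):
        if path.parent.name.startswith("."):
            continue  # skip .staging/ and any other hidden directory
        module = json.loads(path.read_text(encoding="utf-8"))
        entries.append({
            "id": module["id"],
            "category": module["category"],
            "title": module.get("title", ""),
            "summary": module.get("summary", ""),
        })
    return {"modules": entries}
```

Because the index is derived entirely from module files, it can be regenerated at any time, for example after a merge conflict in index.json.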

Module	Responsibility
`schemas.py`	Pydantic models (Module, Index, Config, ...)
`storage.py`	Atomic file I/O, CRUD, staging, index rebuild
`cache.py`	Thread-safe LRU module cache
`llm_clients.py`	DeepSeek / Claude / OpenAI async clients
`extractor.py`	LLM-powered raw-text → module extraction
`mcp_server.py`	FastMCP server (resource + 3 tools)
`cli.py`	Click CLI (10 top-level commands plus `config` subcommands)
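The thread-safe LRU cache listed for cache.py can be approximated with `OrderedDict` and a lock. `ModuleCache`, its `(category, module_id)` key, and the loader callback are assumptions; `max_modules` mirrors the cache.max_modules setting shown under Configuration.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable


# Hypothetical thread-safe LRU cache keyed by (category, module_id).
class ModuleCache:
    def __init__(self, max_modules: int = 50) -> None:
        self._max = max_modules
        self._data: OrderedDict[tuple[str, str], Any] = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: tuple[str, str], loader: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as most recently used
                return self._data[key]
        value = loader()  # read from disk outside the lock so other threads are not blocked
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._max:
                self._data.popitem(last=False)  # evict the least recently used module
        return value

    def invalidate(self, key: tuple[str, str]) -> None:
        with self._lock:
            self._data.pop(key, None)  # e.g. after a module is approved or deleted
```

Loading outside the lock means two threads that miss on the same key at once may both read the file; that duplicate read is harmless and keeps slow disk I/O from serializing every request.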

CLI reference

Command	Description
`km init [PATH]`	Initialize a knowledge base
`km list [-c CAT]`	List modules (optionally by category)
`km stats`	Show KB statistics
`km search QUERY`	Keyword search across modules
`km show ID -c CAT`	Show full module JSON
`km add FILE [-c CAT]`	Extract modules from FILE into staging
`km review`	Interactive review of staged modules
`km delete ID -c CAT [--yes]`	Delete a module
`km rebuild`	Rebuild `index.json` from on-disk modules
`km config {set,get,list}`	Manage `config.json`
`km serve`	Run MCP server over stdio

All commands accept a global --kb-path PATH option, placed before the subcommand (default: the current working directory).

Configuration

config.json example:

{
  "llm_providers": {
    "deepseek": {
      "api_key": "sk-...",
      "model": "deepseek-v4-pro",
      "base_url": "https://api.deepseek.com",
      "default": true,
      "temperature": 0.3,
      "max_tokens": 4096
    },
    "claude": {
      "api_key": "sk-ant-...",
      "model": "claude-sonnet-4-6",
      "default": false
    }
  },
  "extraction": {
    "provider": "deepseek",
    "max_modules_per_extraction": 10
  },
  "cache": {
    "enabled": true,
    "max_modules": 50
  }
}

config.json is gitignored — never commit API keys.
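The example sets both extraction.provider and a per-provider "default" flag, and this README does not say how the two interact. The sketch assumes extraction.provider wins and the default flag is only a fallback; `resolve_provider` is hypothetical.

```python
from typing import Any


# Hypothetical provider selection under an assumed precedence.
def resolve_provider(config: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    providers = config.get("llm_providers", {})
    # Assumption: extraction.provider first, then the entry marked "default": true.
    name = config.get("extraction", {}).get("provider")
    if name is None:
        name = next((n for n, p in providers.items() if p.get("default")), None)
    if name is None or name not in providers:
        raise KeyError(f"no configured provider named {name!r}")
    settings = providers[name]
    if not settings.get("api_key"):
        raise ValueError(f"provider {name!r} has no api_key; set llm_providers.{name}.api_key")
    return name, settings
```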

Logging

Use km --verbose ... to enable operational logging during CLI runs. Verbose logs include metadata such as provider name, model, chunk counts, module counts, and payload sizes, but they intentionally exclude raw note content, full prompts, LLM responses, API keys, and local file paths.
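Logging metadata without content comes down to formatting only counts and identifiers into each message. A sketch with the standard logging module; the logger name, `log_extraction`, and the message format are hypothetical, not what km actually emits.

```python
import logging

logger = logging.getLogger("knowledge_manager.extract")  # hypothetical logger name


# Hypothetical logging call: counts and sizes only, never note text, prompts, keys, or paths.
def log_extraction(provider: str, model: str, chunks: list[str], modules: int) -> None:
    payload_bytes = sum(len(c.encode("utf-8")) for c in chunks)
    logger.info(
        "extraction provider=%s model=%s chunks=%d modules=%d payload_bytes=%d",
        provider, model, len(chunks), modules, payload_bytes,
    )
```

A --verbose flag would then only need to raise the log level (for example to logging.INFO); the chunk text itself never reaches a log record.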

Development

poetry run pytest               # 75 tests
poetry run black src tests      # format
poetry run mypy src             # type check

Current validation artifacts are checked into test-results/ and docs/validation-report-2026-05-29.md. They cover MCP protocol compliance, retrieval behavior, and an end-to-end extract -> review -> serve run against a real sample knowledge base.

Example knowledge base

See examples/sample_knowledge_base/ for a small working KB you can copy as a starting point.

License

MIT

Release history

0.5.2 (Jun 10, 2026)
0.5.1 (Jun 10, 2026)
0.5.0 (Jun 9, 2026, this version)

Download files

Source Distribution

knowledge_manager-0.5.0.tar.gz (51.6 kB)

Uploaded Jun 9, 2026 Source

Built Distribution

knowledge_manager-0.5.0-py3-none-any.whl (53.5 kB)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file knowledge_manager-0.5.0.tar.gz.

File metadata

Download URL: knowledge_manager-0.5.0.tar.gz
Upload date: Jun 9, 2026
Size: 51.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.14.3 Windows/11

File hashes

Hashes for knowledge_manager-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`058513ee1c1bdde2fe426201ce39dd611d8f84b1ce00caf678560f16a9f43868`
MD5	`3fe2b76ca9640d2c9baf0848922eeeb9`
BLAKE2b-256	`7196cd7889bd57865413623d18acbacb6f5ee7661b92a879c6bf8d7e21dc4790`

File details

Details for the file knowledge_manager-0.5.0-py3-none-any.whl.

File metadata

Download URL: knowledge_manager-0.5.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 53.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.14.3 Windows/11

File hashes

Hashes for knowledge_manager-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`803a433f5d0b7ad027246a917c4e00d221ad971a6e24f5c22ddea92bc7e235b5`
MD5	`9748ad0f45b503be42546644f3fc2f39`
BLAKE2b-256	`3053aac2466f88e744c3154d9cca33ae1ccad1aef2728dfebfdf08939b1b5d3d`
