A CLI that builds a knowledge graph from markdown files and exposes it via MCP
kgmd
A CLI that builds a knowledge graph from a directory of markdown files and exposes it via MCP.
- Extracts entities and relations using any LLM (via litellm)
- Resolves duplicate entities using local embeddings + LLM verification
- Induces a typed schema from the extracted data
- Stores everything in a single SQLite file (powered by sqlite-vec)
- Exposes the graph via CLI queries and an MCP server
Install
pip install kgmd
Or with uv:
uv tool install kgmd
Requirements
- Python 3.10+
- An API key for any LLM provider supported by litellm (OpenRouter, OpenAI, Anthropic, etc.)
- Embeddings run locally by default via fastembed (no API key needed)
Quickstart
# Initialize a corpus
cd my-notes/
kgmd init
# Set your LLM API key
export OPENROUTER_API_KEY="sk-..."
# Build the knowledge graph (extract -> resolve -> induce)
kgmd build
# Query
kgmd entities
kgmd relations
kgmd find "machine learning"
kgmd entity "Brian Anderson"
kgmd neighbors "Brian Anderson" --depth 2
kgmd path "Brian Anderson" "Acme Corp"
# Export
kgmd export --format graphml --output graph.graphml
# View induced schema
kgmd schema
# Corpus statistics
kgmd stats
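The `neighbors --depth` and `path` queries above can be pictured as breadth-first traversals of the entity graph. The following is a minimal sketch of that idea, not kgmd's implementation: the toy `GRAPH`, the function names, and the choice to treat relations as undirected are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy graph: undirected adjacency sets keyed by entity name.
GRAPH = {
    "Brian Anderson": {"Acme Corp", "Project X"},
    "Acme Corp": {"Brian Anderson", "Jane Doe"},
    "Project X": {"Brian Anderson"},
    "Jane Doe": {"Acme Corp", "Widget Inc"},
    "Widget Inc": {"Jane Doe"},
}


def neighbors(graph, start, depth):
    """Return {entity: hop distance} for entities within `depth` hops of `start`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen


def shortest_path(graph, source, target):
    """Return one shortest path as a list of names, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Because breadth-first search visits entities in order of hop count, the first time it reaches the target it has found a path with the fewest relations.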
How it works
kgmd build runs three stages:
- Extract -- Each markdown file is chunked and sent to an LLM, which returns structured JSON with entities (people, organizations, projects, etc.) and relations between them.
- Resolve -- Entity mentions are embedded locally, clustered by cosine similarity, and duplicate clusters are verified by the LLM before merging.
- Induce -- Aggregate statistics about entity types and relation predicates are sent to the LLM, which produces a typed YAML schema with hierarchies.
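To make the Resolve stage more concrete, here is a rough sketch of threshold-based clustering over mention embeddings. It assumes single-link clustering (two mentions share a cluster if a chain of pairs with similarity at or above the threshold connects them); the README does not say which clustering method kgmd uses, and the LLM verification step is left out. The function names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_mentions(vectors, threshold=0.85):
    """Group mention indices into candidate duplicate clusters (size > 1 only)."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return [members for members in clusters.values() if len(members) > 1]


# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
mentions = ["Brian Anderson", "B. Anderson", "Acme Corp"]
vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0]]
candidates = cluster_mentions(vectors)
```

In this toy data only the first two mentions land in a shared cluster, so only that pair would be passed on for verification.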
All state lives in .kgmd/graph.db, a single SQLite file. Re-running kgmd build is incremental -- unchanged files are skipped.
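One common way to implement this kind of incremental skip is to compare content hashes against a manifest saved from the previous run. The sketch below illustrates that approach only; how kgmd actually detects unchanged files is not described here, and `file_digest`, `changed_files`, and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(root, previous):
    """Return (changed_paths, new_manifest) for markdown files under `root`.

    `previous` maps relative POSIX paths to digests from the last run.
    Files removed since the last run are not reported by this sketch.
    """
    root = Path(root)
    manifest = {}
    changed = []
    for path in sorted(root.rglob("*.md")):
        key = path.relative_to(root).as_posix()
        manifest[key] = file_digest(path)
        if previous.get(key) != manifest[key]:
            changed.append(key)  # new file or modified content
    return changed, manifest
```

Only the paths in `changed_paths` would need to go through extraction again; everything else can reuse stored results.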
MCP Server
kgmd mcp launches an MCP server over stdio, exposing 7 tools:
| Tool | Description |
|---|---|
| search | Semantic search over chunks |
| get_entity | Full entity record with mentions and relations |
| list_entities | List entities, optionally filtered by type |
| get_neighbors | Subgraph traversal around an entity |
| find_path | Shortest path between two entities |
| list_relations | List relations with optional filters |
| get_schema | The current induced schema |
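MCP clients invoke tools like these with JSON-RPC 2.0 `tools/call` requests, sent over stdio as one JSON message per line after an initialization handshake. The sketch below only builds such a request; the `query` argument name is an assumption, since the tools' input schemas are not listed here, and `tool_call_request` is a hypothetical helper.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Serialize an MCP `tools/call` request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a guessed parameter name; ask the server (tools/list) for the real schema.
line = tool_call_request(1, "search", {"query": "machine learning"})
```

In practice an MCP client library handles the handshake and framing, so hand-built requests like this are mainly useful for debugging.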
Claude Desktop setup
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "kgmd": {
      "command": "kgmd",
      "args": ["mcp"],
      "cwd": "/path/to/your/corpus"
    }
  }
}
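If the config file already lists other MCP servers, the kgmd entry needs to be merged in rather than pasted over them. A small sketch of doing that from Python follows; `add_kgmd_server` and `update_config_file` are hypothetical helpers, not part of kgmd.

```python
import json
from pathlib import Path


def add_kgmd_server(config, corpus_dir):
    """Return a copy of `config` with a kgmd entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["kgmd"] = {
        "command": "kgmd",
        "args": ["mcp"],
        "cwd": str(corpus_dir),
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path, corpus_dir):
    """Read the JSON config (if present), add the kgmd entry, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    updated = add_kgmd_server(config, corpus_dir)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Back up the existing file before running something like this, and restart Claude Desktop afterwards so it rereads the config.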
Configuration
Per-corpus config lives in .kgmd/config.yaml. Global defaults live in ~/.config/kgmd/config.yaml (or the platform equivalent). Corpus config overrides global defaults.
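One plausible reading of "corpus overrides global" is a recursive, key-by-key merge, so a corpus file only has to mention the settings it changes. The sketch below assumes that behavior; whether kgmd merges per key or replaces whole sections is not stated, and `merge_config` is a hypothetical name.

```python
def merge_config(base, override):
    """Recursively overlay `override` on `base` without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


global_cfg = {"llm": {"model": "openrouter/anthropic/claude-sonnet-4-5", "temperature": 0.0}}
corpus_cfg = {"llm": {"temperature": 0.2}}
effective = merge_config(global_cfg, corpus_cfg)
```

Under this assumption the corpus keeps the global model but runs at its own temperature. An example configuration follows: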
embedding:
  backend: fastembed  # or "litellm" for API embeddings
  model: BAAI/bge-small-en-v1.5
llm:
  model: openrouter/anthropic/claude-sonnet-4-5
  temperature: 0.0
  max_tokens: 4096
  timeout_seconds: 120
chunking:
  max_chars: 4000
  overlap_chars: 200
  split_on: paragraph  # or "heading", "fixed"
extraction:
  max_entities_per_chunk: 30
  max_relations_per_chunk: 30
  retry_on_parse_failure: 2
resolution:
  similarity_threshold: 0.85
  llm_verify_clusters: true
  max_cluster_size: 10
induction:
  include_attribute_summary: true
  hierarchy_depth: 3
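To show how the `chunking` settings might interact, here is a sketch of greedy paragraph packing with a character overlap. It is an illustration under assumptions, not kgmd's chunker: it treats blank lines as paragraph boundaries, carries the last `overlap_chars` characters of one chunk into the next, and hard-splits any single paragraph longer than `max_chars`.

```python
def chunk_paragraphs(text, max_chars=4000, overlap_chars=200):
    """Pack paragraphs into chunks of at most `max_chars` characters."""
    if not 0 <= overlap_chars < max_chars:
        raise ValueError("overlap_chars must be >= 0 and smaller than max_chars")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap.
            tail = current[-overlap_chars:] if overlap_chars else ""
            candidate = f"{tail}\n\n{para}" if tail else para
        # Hard-split anything still too long (e.g. one oversized paragraph).
        while len(candidate) > max_chars:
            chunks.append(candidate[:max_chars])
            candidate = candidate[max_chars - overlap_chars:]
        current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The overlap gives the LLM some shared context across a chunk boundary, at the cost of sending a few characters twice.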
Export formats
kgmd export --format jsonld # JSON-LD with schema.org context
kgmd export --format cypher # Cypher CREATE statements (Neo4j)
kgmd export --format graphml # GraphML (Gephi, yEd)
Development
git clone https://github.com/2lines/kgmd.git
cd kgmd
pip install -e .
make test # run tests
make lint # ruff check
make format # ruff format
Note: Your Python must be built with SQLite extension loading enabled, because sqlite-vec is loaded as a SQLite extension. If you use pyenv on macOS with Homebrew:
LDFLAGS="-L$(brew --prefix sqlite)/lib" \
CPPFLAGS="-I$(brew --prefix sqlite)/include -DSQLITE_ENABLE_LOAD_EXTENSION" \
PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" \
pyenv install 3.12
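A quick way to check whether an existing interpreter already meets this requirement is to try toggling extension loading on an in-memory database. This is a small diagnostic sketch using only the standard library; `sqlite_extensions_supported` is a hypothetical helper name.

```python
import sqlite3


def sqlite_extensions_supported():
    """Return True if this interpreter's sqlite3 module can load extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # Builds without loadable-extension support omit this method entirely.
        if not hasattr(conn, "enable_load_extension"):
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (sqlite3.OperationalError, sqlite3.NotSupportedError):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("extension loading available:", sqlite_extensions_supported())
```

If this reports that extension loading is unavailable, rebuilding Python with the flags above is one way to fix it.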
License