kgmd

A CLI that builds a knowledge graph from a directory of markdown files and exposes it via MCP.

  • Extracts entities and relations using any LLM supported by litellm
  • Resolves duplicate entities using local embeddings + LLM verification
  • Induces a typed schema from the extracted data
  • Stores everything in a single SQLite file (powered by sqlite-vec)
  • Exposes the graph via CLI queries and an MCP server

Install

pip install kgmd

Or with uv:

uv tool install kgmd

Requirements

  • Python 3.10+
  • An API key for any LLM provider supported by litellm (OpenRouter, OpenAI, Anthropic, etc.)
  • Embeddings run locally by default via fastembed (no API key needed)

Quickstart

# Initialize a corpus
cd my-notes/
kgmd init

# Set your LLM API key
export OPENROUTER_API_KEY="sk-..."

# Build the knowledge graph (extract -> resolve -> induce)
kgmd build

# Query
kgmd entities
kgmd relations
kgmd find "machine learning"
kgmd entity "Brian Anderson"
kgmd neighbors "Brian Anderson" --depth 2
kgmd path "Brian Anderson" "Acme Corp"

# Export
kgmd export --format graphml --output graph.graphml

# View induced schema
kgmd schema

# Corpus statistics
kgmd stats

How it works

kgmd build runs three stages:

  1. Extract -- Each markdown file is chunked and sent to an LLM, which returns structured JSON with entities (people, organizations, projects, etc.) and relations between them.
  2. Resolve -- Entity mentions are embedded locally, clustered by cosine similarity, and duplicate clusters are verified by the LLM before merging.
  3. Induce -- Aggregate statistics about entity types and relation predicates are sent to the LLM, which produces a typed YAML schema with hierarchies.
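The clustering half of the Resolve stage can be pictured with a minimal sketch. It assumes mention embeddings are already computed (kgmd uses fastembed for this by default) and links mentions whose cosine similarity reaches `similarity_threshold`, refusing merges that would grow a cluster past `max_cluster_size` (names borrowed from the resolution settings in the configuration section). The single-link union-find grouping, the `cluster_mentions` helper, and the "B. Anderson" mention are illustrative assumptions rather than kgmd's actual code, and the LLM verification that follows clustering is left out.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_mentions(
    embeddings: dict[str, list[float]],
    threshold: float = 0.85,
    max_cluster_size: int = 10,
) -> list[list[str]]:
    """Single-link clustering of mentions via union-find.

    Two mentions are linked when their cosine similarity is >= threshold;
    merges that would grow a cluster past max_cluster_size are skipped.
    Returns only clusters with two or more members (merge candidates).
    """
    parent = {name: name for name in embeddings}
    size = dict.fromkeys(embeddings, 1)

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) < threshold:
            continue
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_cluster_size:
            parent[rb] = ra
            size[ra] += size[rb]

    clusters: dict[str, list[str]] = {}
    for name in embeddings:
        clusters.setdefault(find(name), []).append(name)
    return [members for members in clusters.values() if len(members) > 1]


# Toy 3-dimensional vectors standing in for real embeddings.
mentions = {
    "Brian Anderson": [0.9, 0.1, 0.0],
    "B. Anderson": [0.88, 0.12, 0.01],
    "Acme Corp": [0.0, 0.2, 0.95],
}
print(cluster_mentions(mentions))  # [['Brian Anderson', 'B. Anderson']]
```

Each multi-member cluster returned here would then be the unit handed to the LLM for a merge/no-merge decision.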

All state lives in .kgmd/graph.db, a single SQLite file. Re-running kgmd build is incremental -- unchanged files are skipped.
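One common way to make rebuilds incremental is to record a content hash per file and reprocess only files whose hash has changed. The sketch below assumes that approach; the README does not say whether kgmd compares hashes or modification times, and the `files_to_process` helper and the in-memory `seen` dict (standing in for state kept in .kgmd/graph.db) are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; any content change yields a new digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_process(corpus: Path, seen: dict[str, str]) -> list[tuple[str, str]]:
    """Return (relative_path, digest) for markdown files that are new or changed.

    `seen` maps relative paths to the digest recorded at the last successful
    build; the caller should update it only after a file is processed.
    """
    changed = []
    for path in sorted(corpus.rglob("*.md")):
        rel = path.relative_to(corpus).as_posix()
        digest = file_digest(path)
        if seen.get(rel) != digest:
            changed.append((rel, digest))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "a.md").write_text("# A\n")
    (corpus / "b.md").write_text("# B\n")
    seen: dict[str, str] = {}
    first = files_to_process(corpus, seen)
    seen.update(first)  # pretend extraction succeeded for every file
    (corpus / "b.md").write_text("# B, edited\n")
    second = files_to_process(corpus, seen)
    print([rel for rel, _ in first], [rel for rel, _ in second])  # ['a.md', 'b.md'] ['b.md']
```

Hashing contents rather than trusting timestamps means a file that is touched but not edited is still skipped; handling deleted files would need a separate pass over `seen`.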

MCP Server

kgmd mcp launches an MCP server over stdio, exposing 7 tools:

Tool            Description
search          Semantic search over chunks
get_entity      Full entity record with mentions and relations
list_entities   List entities, optionally filtered by type
get_neighbors   Subgraph traversal around an entity
find_path       Shortest path between two entities
list_relations  List relations with optional filters
get_schema      The current induced schema

Claude Desktop setup

Add the following to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "kgmd": {
      "command": "kgmd",
      "args": ["mcp"],
      "cwd": "/path/to/your/corpus"
    }
  }
}
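If the config file already lists other MCP servers, the kgmd entry has to be merged into the existing `mcpServers` object rather than overwriting the file. A small standard-library sketch of that merge; `add_kgmd_server` is a hypothetical helper, and the demo writes to a temporary file rather than your real config.

```python
import json
import tempfile
from pathlib import Path


def add_kgmd_server(config_path: Path, corpus_dir: str) -> dict:
    """Add or replace the "kgmd" MCP server entry, preserving everything else."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["kgmd"] = {"command": "kgmd", "args": ["mcp"], "cwd": corpus_dir}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a temporary file that already lists another server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    result = add_kgmd_server(path, "/path/to/your/corpus")
    print(sorted(result["mcpServers"]))  # ['kgmd', 'other']
```

To update the real file on macOS you would pass `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"` along with your corpus path, then restart Claude Desktop so it rereads the config.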

Configuration

Per-corpus config lives in .kgmd/config.yaml, and global defaults live in ~/.config/kgmd/config.yaml (or the platform equivalent). Settings in the corpus config override the global ones.

embedding:
  backend: fastembed                    # or "litellm" for API embeddings
  model: BAAI/bge-small-en-v1.5

llm:
  model: openrouter/anthropic/claude-sonnet-4-5
  temperature: 0.0
  max_tokens: 4096
  timeout_seconds: 120

chunking:
  max_chars: 4000
  overlap_chars: 200
  split_on: paragraph                   # or "heading", "fixed"

extraction:
  max_entities_per_chunk: 30
  max_relations_per_chunk: 30
  retry_on_parse_failure: 2

resolution:
  similarity_threshold: 0.85
  llm_verify_clusters: true
  max_cluster_size: 10

induction:
  include_attribute_summary: true
  hierarchy_depth: 3
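The `chunking` settings point to paragraph-based splitting with a character budget and an overlap between consecutive chunks. The sketch below shows one plausible way `split_on: paragraph`, `max_chars`, and `overlap_chars` could interact; it is an assumption about the behavior, not kgmd's chunker, and `chunk_paragraphs` is a hypothetical name.

```python
def chunk_paragraphs(
    text: str, max_chars: int = 4000, overlap_chars: int = 200
) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Each new chunk starts with the tail of the previous chunk (up to
    overlap_chars) so context spanning a boundary is not lost.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split paragraphs that are longer than a whole chunk.
        pieces = [para[i : i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
                continue
            chunks.append(current)
            # Carry an overlap tail, shortened if needed so the chunk still fits.
            keep = min(overlap_chars, max_chars - len(piece) - 2)
            current = f"{current[-keep:]}\n\n{piece}" if keep > 0 else piece
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["a" * 30, "b" * 30, "c" * 30])
print([len(c) for c in chunk_paragraphs(sample, max_chars=70, overlap_chars=10)])  # [62, 42]
```

Shrinking the overlap instead of exceeding the budget keeps every chunk within the limit the LLM call was sized for, at the cost of less shared context after very long paragraphs.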

Export formats

kgmd export --format jsonld   # JSON-LD with schema.org context
kgmd export --format cypher   # Cypher CREATE statements (Neo4j)
kgmd export --format graphml  # GraphML (Gephi, yEd)
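For a sense of what a GraphML export contains, here is a minimal standard-library sketch that writes entities as nodes (with `name` and `type` attributes) and relations as directed edges carrying a `predicate` attribute. The attribute names, the generated node ids (`n0`, `n1`, ...), and the sample types and predicate are assumptions for illustration; kgmd's exporter may lay the file out differently. Generated ids keep spaces out of the `id` attribute, with the human-readable name stored as data.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)  # serialize GraphML as the default namespace


def _q(tag: str) -> str:
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def to_graphml(entities: dict[str, str], relations: list[tuple[str, str, str]]) -> str:
    """Serialize entities (name -> type) and (subject, predicate, object) triples."""
    root = ET.Element(_q("graphml"))
    # <key> declarations must precede the <graph> element.
    for key_id, domain in (("name", "node"), ("type", "node"), ("predicate", "edge")):
        ET.SubElement(
            root,
            _q("key"),
            {"id": key_id, "for": domain, "attr.name": key_id, "attr.type": "string"},
        )
    graph = ET.SubElement(root, _q("graph"), {"id": "G", "edgedefault": "directed"})
    ids: dict[str, str] = {}
    for i, (name, etype) in enumerate(entities.items()):
        ids[name] = f"n{i}"
        node = ET.SubElement(graph, _q("node"), {"id": ids[name]})
        ET.SubElement(node, _q("data"), {"key": "name"}).text = name
        ET.SubElement(node, _q("data"), {"key": "type"}).text = etype
    for subj, pred, obj in relations:
        edge = ET.SubElement(graph, _q("edge"), {"source": ids[subj], "target": ids[obj]})
        ET.SubElement(edge, _q("data"), {"key": "predicate"}).text = pred
    return ET.tostring(root, encoding="unicode")


print(to_graphml(
    {"Brian Anderson": "Person", "Acme Corp": "Organization"},
    [("Brian Anderson", "works_at", "Acme Corp")],
))
```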

Development

git clone https://github.com/2lines/kgmd.git
cd kgmd
pip install -e .
make test    # run tests
make lint    # ruff check
make format  # ruff format

Note: sqlite-vec is loaded as a SQLite extension, so your Python must be built with SQLite extension loading enabled. If you use pyenv with Homebrew's SQLite:

LDFLAGS="-L$(brew --prefix sqlite)/lib" \
CPPFLAGS="-I$(brew --prefix sqlite)/include -DSQLITE_ENABLE_LOAD_EXTENSION" \
PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" \
pyenv install 3.12
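To check whether an interpreter qualifies, you can probe the `sqlite3` module directly. In CPython, `Connection.enable_load_extension` only exists when Python was configured with `--enable-loadable-sqlite-extensions`, and it raises a `sqlite3` error if the underlying SQLite library refuses. The helper name below is hypothetical.

```python
import sqlite3
import sys


def supports_extension_loading() -> bool:
    """Return True if this interpreter's sqlite3 module can enable extension loading."""
    # The method is missing entirely when CPython was built without
    # --enable-loadable-sqlite-extensions.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)  # raises if the SQLite library refuses
        conn.enable_load_extension(False)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


print(f"Python {sys.version.split()[0]} (3.10+: {sys.version_info >= (3, 10)})")
print(f"SQLite {sqlite3.sqlite_version}")
print("extension loading:", "ok" if supports_extension_loading() else "NOT available")
```

Once this reports ok, the sqlite-vec Python package's documented pattern is `conn.enable_load_extension(True)`, `sqlite_vec.load(conn)`, then `conn.enable_load_extension(False)`.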

License

MIT

Download files

Source Distribution

kgmd-0.1.0.tar.gz (37.6 kB)

Uploaded Source

Built Distribution

kgmd-0.1.0-py3-none-any.whl (36.9 kB)

Uploaded Python 3

File details

Details for the file kgmd-0.1.0.tar.gz.

File metadata

  • Download URL: kgmd-0.1.0.tar.gz
  • Upload date:
  • Size: 37.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kgmd-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       dabc4ccf6d69a1c1d289bba7a0e43aa5b2323d683ec37fa5af977d305d504f80
MD5          f1fe34e1da4c16e5d6b0e7e8e81631d3
BLAKE2b-256  defc75b28da9db680f98c4759359decdc76327b7f22de3c0d2aab46c76926f22

Provenance

The following attestation bundles were made for kgmd-0.1.0.tar.gz:

Publisher: publish.yml on johncarpenter/kgmd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kgmd-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kgmd-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 36.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kgmd-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       a4efd71b871065ea5e6613a81d22ee54356265aeed939ab64a5b21d004a46e76
MD5          9f77e4acfe61330c74e125b4db4a33cb
BLAKE2b-256  9341db677df0ba5434d9b7ed5f9b8270280f773e67977c5db388dd190e1d8540

Provenance

The following attestation bundles were made for kgmd-0.1.0-py3-none-any.whl:

Publisher: publish.yml on johncarpenter/kgmd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
