Config-driven knowledge graph framework for extracting structured knowledge from unstructured data

Maintainers

elandesberg

Project description

kvault

Agent-first knowledge graph framework. Build knowledge graphs from unstructured data using intelligent agents.

Philosophy

The agent IS the pipeline. Claude (or another LLM) does extraction, research, decisions, and propagation. kvault provides tools, not workflows.

┌─────────────────────────────────────────────────────────────┐
│  EntityIndex    MatchStrategies    ObservabilityLogger      │
│  (fast lookup)  (fuzzy, alias)     (debug & improve)        │
│                                                             │
│  SimpleStorage  (YAML frontmatter in _summary.md preferred) │
└─────────────────────────────────────────────────────────────┘

Agent (Claude) does:
  - Read input
  - Research (using EntityIndex + MatchStrategies)
  - Decide (using its reasoning)
  - Write (using SimpleStorage)
  - Propagate (update parent summaries)
  - Log (using ObservabilityLogger)

Getting Started with Claude Code

The fastest way to get a personal knowledge base running with Claude Code:

# 1. Install kvault with MCP support
pip install "kvault[mcp]"  # quoted so shells such as zsh do not expand the brackets

# 2. Initialize a new knowledge base
kvault init my_kb --name "Your Name"

# 3. Verify it's clean
kvault check --kb-root my_kb

Then add the MCP server to .claude/settings.json:

{
  "mcpServers": {
    "kvault": {
      "command": "kvault-mcp",
      "env": {}
    }
  }
}

And add the integrity hook (catches stale summaries before each prompt):

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "type": "command",
        "command": "kvault check --kb-root /absolute/path/to/my_kb"
      }
    ]
  }
}
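
Both fragments go into the same .claude/settings.json, so they have to end up as one JSON object rather than two objects pasted one after the other. A minimal sketch of that merge using only the standard library; the `merge_settings` helper is hypothetical and not part of kvault, and the hook path is the placeholder from the example above:

```python
import json


def merge_settings(existing: dict, fragment: dict) -> dict:
    """Recursively merge `fragment` into a copy of `existing`.

    Nested dicts are merged key by key, lists are concatenated, and any
    other value in `fragment` replaces the one in `existing`.
    """
    merged = dict(existing)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        elif isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


# The two fragments shown above, as Python dicts.
mcp_fragment = {"mcpServers": {"kvault": {"command": "kvault-mcp", "env": {}}}}
hook_fragment = {
    "hooks": {
        "UserPromptSubmit": [
            {
                "type": "command",
                "command": "kvault check --kb-root /absolute/path/to/my_kb",
            }
        ]
    }
}

settings = merge_settings(mcp_fragment, hook_fragment)
print(json.dumps(settings, indent=2))
```

The same helper can merge the result into an existing settings file loaded with `json.load`, which keeps any unrelated keys already present.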

Customize the generated CLAUDE.md with your personal details, then start adding entities.

Installation

pip install kvault

Or install from source:

git clone https://github.com/cimo-labs/kvault
cd kvault
pip install -e .

Quick Start

from pathlib import Path
from kvault import (
    EntityIndex,
    SimpleStorage,
    ObservabilityLogger,
    EntityResearcher
)

# Initialize
kg_root = Path("my_knowledge_base")
index = EntityIndex(kg_root / ".kvault" / "index.db")
storage = SimpleStorage(kg_root)
logger = ObservabilityLogger(kg_root / ".kvault" / "logs.db")
researcher = EntityResearcher(index)

# 1. Research - find existing entities
matches = researcher.research("Alice Smith", email="alice@anthropic.com")
action, target, confidence = researcher.suggest_action("Alice Smith")
logger.log_research("Alice Smith", "alice smith",
                    [m.__dict__ for m in matches], action)

# 2. Decide - agent determines what to do
if action == "create":
    entity_path = "people/collaborators/alice_smith"
    logger.log_decide("Alice Smith", "create",
                      "No existing match found", confidence)

    # 3. Write - create the entity
    # (steps 3-5 stay inside the branch: entity_path is only defined here)
    storage.create_entity(entity_path, {
        "created": "2026-01-05",
        "updated": "2026-01-05",
        "source": "email:123",
        "aliases": ["Alice", "alice@anthropic.com"]
    }, summary="# Alice Smith\n\nResearch scientist at Anthropic.")
    logger.log_write(entity_path, "create", "Created new entity")

    # 4. Update index
    index.add(entity_path, "Alice Smith",
              ["Alice", "alice@anthropic.com"], "people")

    # 5. Propagate - update parent summaries
    ancestors = storage.get_ancestors(entity_path)
    logger.log_propagate(entity_path, ancestors)

Core Components

EntityIndex

SQLite-backed entity index with full-text search for fast lookups.

from pathlib import Path
from kvault import EntityIndex

index = EntityIndex(Path("index.db"))

# Add entity
index.add("people/alice", "Alice Smith",
          aliases=["Alice", "alice@example.com"],
          category="people")

# Search
results = index.search("Alice")

# Find by alias
entry = index.find_by_alias("alice@example.com")

# Find by email domain
entries = index.find_by_email_domain("example.com")

# Rebuild from filesystem
count = index.rebuild(Path("knowledge_graph"))
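
To make the idea of a SQLite-backed index concrete, here is a small standalone sketch. It is not kvault's implementation: the table layout and function names are assumptions made for illustration, and it uses a plain `LIKE` substring query as a stand-in for real full-text search.

```python
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) tables: one row per entity, one per alias."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entities (
            path TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT
        );
        CREATE TABLE IF NOT EXISTS aliases (
            alias TEXT NOT NULL,
            path TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_alias ON aliases(alias);
    """)
    return conn


def add(conn, path, name, aliases, category):
    """Insert or replace an entity and its lower-cased aliases."""
    conn.execute("INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
                 (path, name, category))
    conn.execute("DELETE FROM aliases WHERE path = ?", (path,))
    conn.executemany("INSERT INTO aliases VALUES (?, ?)",
                     [(a.lower(), path) for a in aliases])
    conn.commit()


def find_by_alias(conn, alias):
    """Exact, case-insensitive alias lookup; returns a path or None."""
    row = conn.execute("SELECT path FROM aliases WHERE alias = ?",
                       (alias.lower(),)).fetchone()
    return row[0] if row else None


def find_by_email_domain(conn, domain):
    """Paths of entities that have an alias ending in '@<domain>'."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM aliases WHERE alias LIKE ? ORDER BY path",
        ("%@" + domain.lower(),)).fetchall()
    return [r[0] for r in rows]


def search(conn, query):
    """Substring match against names and aliases (not true full-text search)."""
    pattern = f"%{query.lower()}%"
    rows = conn.execute(
        "SELECT DISTINCT e.path FROM entities e "
        "LEFT JOIN aliases a ON a.path = e.path "
        "WHERE lower(e.name) LIKE ? OR a.alias LIKE ? ORDER BY e.path",
        (pattern, pattern)).fetchall()
    return [r[0] for r in rows]


conn = open_index()
add(conn, "people/alice", "Alice Smith",
    ["Alice", "alice@example.com"], "people")
print(find_by_alias(conn, "alice@example.com"))
print(find_by_email_domain(conn, "example.com"))
print(search(conn, "smith"))
```

Keeping aliases in their own indexed table is what makes the alias lookup cheap; email addresses stored as aliases then double as the source for the domain query.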

SimpleStorage

Filesystem storage with a minimal four-field schema (created, updated, source, aliases).

from pathlib import Path
from kvault import SimpleStorage

storage = SimpleStorage(Path("knowledge_graph"))

# Create entity
storage.create_entity("people/alice", {
    "created": "2026-01-05",
    "updated": "2026-01-05",
    "source": "manual",
    "aliases": ["Alice"]
}, summary="# Alice\n\nDescription here.")

# Update entity
storage.update_entity("people/alice",
                      meta={"source": "email:123"},
                      summary="# Alice\n\nUpdated description.")

# Read
meta = storage.read_meta("people/alice")
summary = storage.read_summary("people/alice")

# Navigate hierarchy
ancestors = storage.get_ancestors("people/collaborators/alice")
# Returns: ["people/collaborators", "people"]
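
The ancestor list is what drives propagation: each returned path is a parent whose summary may need updating. A minimal standalone sketch of the same path arithmetic, assuming entity paths are "/"-separated and relative to the knowledge base root (the function below is illustrative, not kvault's code):

```python
from pathlib import PurePosixPath


def get_ancestors(entity_path: str) -> list[str]:
    """Return parent paths from nearest to top-most, excluding the root."""
    parents = PurePosixPath(entity_path).parents
    return [str(p) for p in parents if str(p) != "."]


print(get_ancestors("people/collaborators/alice"))
# ['people/collaborators', 'people']
```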

ObservabilityLogger

Phase-based logging for debugging and system improvement.

from pathlib import Path
from kvault import ObservabilityLogger

logger = ObservabilityLogger(Path("logs.db"))

# Log phases
matches = []  # candidate matches from the research phase (none in this example)
logger.log_input([{"name": "Alice"}], source="email")
logger.log_research("Alice", "alice", matches, "create")
logger.log_decide("Alice", "create", "No match found", confidence=0.95)
logger.log_write("people/alice", "create", "Created entity")
logger.log_propagate("people/alice", ["people"])
logger.log_error("validation_failed", entity="Alice",
                 details={"field": "email"})

# Query logs
errors = logger.get_errors()
decisions = logger.get_decisions(action="create")
low_conf = logger.get_low_confidence(threshold=0.7)
summary = logger.get_session_summary()
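
For readers curious how phase-based logging with queries like these can work, here is a rough standalone sketch on top of `sqlite3`. The `PhaseLog` class, its table layout, and its method names are all hypothetical; kvault's actual schema is not described in this README.

```python
import json
import sqlite3
import time


class PhaseLog:
    """Tiny phase-tagged log: one row per event, queryable by phase."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS log ("
            "id INTEGER PRIMARY KEY, ts REAL, phase TEXT, entity TEXT, "
            "confidence REAL, payload TEXT)"
        )

    def log(self, phase, entity, confidence=None, **payload):
        """Record one event; extra keyword arguments are stored as JSON."""
        self.conn.execute(
            "INSERT INTO log (ts, phase, entity, confidence, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (time.time(), phase, entity, confidence, json.dumps(payload)),
        )
        self.conn.commit()

    def low_confidence(self, threshold: float = 0.7):
        """Decisions whose confidence fell below the threshold."""
        rows = self.conn.execute(
            "SELECT entity, confidence FROM log "
            "WHERE phase = 'decide' AND confidence < ? ORDER BY id",
            (threshold,),
        )
        return rows.fetchall()

    def summary(self):
        """Event count per phase."""
        rows = self.conn.execute(
            "SELECT phase, COUNT(*) FROM log GROUP BY phase")
        return dict(rows.fetchall())


log = PhaseLog()
log.log("research", "Alice", action="create")
log.log("decide", "Alice", confidence=0.95, reasoning="No match found")
log.log("decide", "Bob", confidence=0.55, reasoning="Ambiguous alias")
print(log.low_confidence())
print(log.summary())
```

Storing the phase as a column is the design point: reviewing only the uncertain decisions becomes a single query instead of a scan through free-form log text.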

EntityResearcher

Research existing entities before creating new ones.

from pathlib import Path
from kvault import EntityResearcher, EntityIndex

index = EntityIndex(Path("index.db"))
researcher = EntityResearcher(index)

# Find matches
matches = researcher.research("Alice Smith", email="alice@example.com")

# Get suggestion
action, path, confidence = researcher.suggest_action("Alice Smith")
# Returns: ("create", None, 0.95)  or  ("update", "people/alice", 0.90)

# Quick checks
exists = researcher.exists("Alice Smith", threshold=0.9)
best = researcher.best_match("Alice Smith")
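
The create-or-update suggestion can be pictured as a threshold over scored candidates. The sketch below is one plausible reading, not kvault's logic: the function is standalone, the 0.9 threshold is borrowed from the `exists` call above, and reporting `1 - best score` as the confidence of a "create" is purely an assumption of this example.

```python
from typing import Optional


def suggest_action(
    candidates: list[tuple[str, float]], threshold: float = 0.9
) -> tuple[str, Optional[str], float]:
    """Suggest "update" when the best candidate clears the threshold.

    `candidates` holds (entity_path, score) pairs with scores in [0, 1].
    """
    if not candidates:
        return ("create", None, 1.0)
    path, score = max(candidates, key=lambda c: c[1])
    if score >= threshold:
        return ("update", path, score)
    # Assumption: a weak best match means high confidence in creating.
    return ("create", None, round(1.0 - score, 2))


print(suggest_action([("people/alice", 0.95), ("people/alicia", 0.6)]))
print(suggest_action([("people/bob", 0.4)]))
```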

Matching Strategies

Pluggable strategies for entity deduplication.

from kvault import (
    AliasMatchStrategy,
    FuzzyNameMatchStrategy,
    EmailDomainMatchStrategy
)

# Alias matching - exact match (score: 1.0)
alias_strategy = AliasMatchStrategy()

# Fuzzy name matching (score: 0.85-0.99)
fuzzy_strategy = FuzzyNameMatchStrategy(threshold=0.85)

# Email domain matching (score: 0.85-0.95)
domain_strategy = EmailDomainMatchStrategy()
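
As a rough illustration of what the first two strategies compute, the sketch below scores a query against an entity using only the standard library. It uses `difflib.SequenceMatcher` as a stand-in similarity measure; the README does not say which measure kvault uses, and the function names here are hypothetical.

```python
from difflib import SequenceMatcher


def alias_score(query: str, aliases: list[str]) -> float:
    """Exact, case-insensitive alias match scores 1.0, otherwise 0.0."""
    q = query.strip().lower()
    return 1.0 if any(q == a.strip().lower() for a in aliases) else 0.0


def fuzzy_name_score(query: str, name: str, threshold: float = 0.85) -> float:
    """Similarity ratio in [0, 1]; anything below the threshold counts as 0.0."""
    ratio = SequenceMatcher(None, query.lower(), name.lower()).ratio()
    return ratio if ratio >= threshold else 0.0


def email_domain(address: str) -> str:
    """Lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


print(alias_score("alice@example.com", ["Alice", "alice@example.com"]))
print(fuzzy_name_score("Alice Smyth", "Alice Smith"))
print(fuzzy_name_score("Bob Jones", "Alice Smith"))
print(email_domain("alice@Example.com"))
```

Cutting fuzzy scores off at the threshold keeps weak lookalikes from ever competing with an exact alias hit when results from several strategies are combined.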

Storage Format

YAML Frontmatter (Preferred)

Entities are stored as a single _summary.md file with YAML frontmatter:

---
created: 2026-01-05
updated: 2026-01-05
source: email:123
aliases: [Alice, alice@anthropic.com, "+14155551234"]
phone: "+14155551234"
email: alice@anthropic.com
relationship_type: colleague
context: Met at NeurIPS 2024
---

# Alice Smith

Research scientist at Anthropic working on causal discovery.

## Background
Collaborator on interpretability project.

## Interactions
- 2026-01-05: Initial contact logged

## Notes
- Interested in causal representation learning

Required fields: created, updated, source, aliases

Optional fields: phone, email, relationship_type, context, related_to, last_interaction, status
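
To show how such a file separates into metadata and body, here is a deliberately simplified sketch. It only understands flat `key: value` lines and keeps every value as a raw string, so it is no substitute for a YAML parser; the helper names are made up for this example and are not kvault functions.

```python
REQUIRED_FIELDS = ("created", "updated", "source", "aliases")


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a _summary.md document into (raw metadata, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so 'source: email:123' survives.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


def missing_required(meta: dict) -> list[str]:
    """Required field names that are absent from the metadata."""
    return [field for field in REQUIRED_FIELDS if field not in meta]


doc = """---
created: 2026-01-05
updated: 2026-01-05
source: email:123
---

# Alice Smith
"""

meta, body = split_frontmatter(doc)
print(meta["source"])
print(missing_required(meta))
```

A check like `missing_required` is the kind of test an integrity pass could run over every entity file before the index is rebuilt.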

Legacy Format (_meta.json)

Separate _meta.json files are still supported for backward compatibility:

{
  "created": "2026-01-05",
  "last_updated": "2026-01-05",
  "sources": ["email:123"],
  "aliases": ["Alice", "alice@anthropic.com"]
}

Note: New entities should use YAML frontmatter. The index rebuilder supports both formats.
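
Comparing the two examples suggests a field mapping between the formats: `last_updated` becomes `updated` and the `sources` list becomes a single `source` value. The sketch below applies that inferred mapping; it is not a documented kvault migration, and joining several sources with a comma is an assumption of this example.

```python
def legacy_to_frontmatter_meta(legacy: dict) -> dict:
    """Map legacy _meta.json keys onto the four required frontmatter fields."""
    sources = legacy.get("sources", [])
    return {
        "created": legacy.get("created"),
        "updated": legacy.get("last_updated"),
        # Assumption: multiple legacy sources collapse into one string.
        "source": ", ".join(sources),
        "aliases": list(legacy.get("aliases", [])),
    }


legacy = {
    "created": "2026-01-05",
    "last_updated": "2026-01-05",
    "sources": ["email:123"],
    "aliases": ["Alice", "alice@anthropic.com"],
}
print(legacy_to_frontmatter_meta(legacy))
```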

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black kvault/

# Type check
mypy kvault/

MCP Server (Claude Code Integration)

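The kvault MCP server gives Claude Code direct tool access, so the agent can run the six-step workflow (read, research, decide, write, propagate, log) through structured tool calls instead of parsing CLI output from a subprocess.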

Installation

pip install "kvault[mcp]"  # Install with MCP support

Configuration

Add to .claude/settings.json:

{
  "mcpServers": {
    "kvault": {
      "command": "kvault-mcp",
      "env": {}
    }
  }
}

Available Tools

Category	Tools
Init	`kvault_init`, `kvault_status`
Index	`kvault_search`, `kvault_find_by_alias`, `kvault_find_by_email_domain`, `kvault_rebuild_index`
Entity	`kvault_read_entity`, `kvault_write_entity`, `kvault_list_entities`, `kvault_delete_entity`, `kvault_move_entity`
Summary	`kvault_read_summary`, `kvault_write_summary`, `kvault_get_parent_summaries`
Research	`kvault_research`
Workflow	`kvault_log_phase`, `kvault_write_journal`, `kvault_validate_transition`

Example Workflow

1. kvault_init(kg_root="/path/to/kb")
2. kvault_research(name="John Doe", phone="+14155551234")
3. kvault_write_entity(path="people/contacts/john_doe", meta={...}, content="...", create=true)
4. kvault_get_parent_summaries(path="people/contacts/john_doe")
5. kvault_write_summary(path="people/contacts", content="...")
6. kvault_write_journal(actions=[...], source="manual")
7. kvault_rebuild_index()

Benefits

- Structured JSON responses: no regex parsing of CLI output
- Direct control: each tool call is explicit and debuggable
- Session state: track workflow progress across calls
- No timeouts: individual tools complete quickly

CLI Usage

pip install -e ".[dev]"

# Initialize a new KB
kvault init my_kb --name "Alice"

# Check KB integrity (propagation, journal, index, frontmatter, branching)
kvault check --kb-root my_kb
kvault check                      # Auto-detects KB root from cwd

# Process a corpus
kvault process --corpus /path/to/corpus --kg-root /path/to/kg --dry-run
kvault process --corpus /path/to/corpus --kg-root /path/to/kg --apply

# Rebuild and search the index
kvault index rebuild --kg-root /path/to/kg
kvault index search --db /path/to/kg/.kvault/index.db --query "Acme"

# Session summary (observability)
kvault log summary --db /path/to/kg/.kvault/logs.db
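
Auto-detecting the KB root from the current directory can be pictured as a walk up the directory tree. The sketch below assumes the root is the nearest directory containing a `.kvault` folder, which matches the `.kvault/index.db` and `.kvault/logs.db` paths used above; the actual detection rule of `kvault check` is not documented here, and `find_kb_root` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_kb_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` holding `.kvault`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".kvault").is_dir():
            return candidate
    return None


# Prints the detected root, or None when run outside a knowledge base.
print(find_kb_root(Path.cwd()))
```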

License

MIT


Release history

0.10.0 (May 3, 2026)
0.9.0 (May 3, 2026)
0.8.0 (May 2, 2026)
0.7.1 (Feb 28, 2026)
0.7.0 (Feb 26, 2026)
0.6.2 (Feb 17, 2026)
0.6.0 (Feb 14, 2026)
0.5.0 (Feb 7, 2026)
0.4.0 (Feb 7, 2026)
0.3.0 (Feb 6, 2026, this version)

Download files

Source Distribution

knowledgevault-0.3.0.tar.gz (92.2 kB)

Uploaded Feb 6, 2026 Source

Built Distribution

knowledgevault-0.3.0-py3-none-any.whl (85.4 kB)

Uploaded Feb 6, 2026 Python 3

File details

Details for the file knowledgevault-0.3.0.tar.gz.

File metadata

Download URL: knowledgevault-0.3.0.tar.gz
Upload date: Feb 6, 2026
Size: 92.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for knowledgevault-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`40cc1194143d6048e5edfce75af84cba7f758aab2f7e13042da3c9e9c4f6776d`
MD5	`30efbb73474b48a71fef2f7a839aed90`
BLAKE2b-256	`e901aa4caa4446fe4f41d579ac1662b73be4691d8edb5d58d74fafcd18ae6235`

Provenance

The following attestation bundles were made for knowledgevault-0.3.0.tar.gz:

Publisher: publish.yml on cimo-labs/kvault

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: knowledgevault-0.3.0.tar.gz
- Subject digest: 40cc1194143d6048e5edfce75af84cba7f758aab2f7e13042da3c9e9c4f6776d
- Sigstore transparency entry: 925521196
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: cimo-labs/kvault@649de75b9dd721d4643701c8381385fdc225f430
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cimo-labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@649de75b9dd721d4643701c8381385fdc225f430
- Trigger Event: workflow_dispatch

File details

Details for the file knowledgevault-0.3.0-py3-none-any.whl.

File metadata

Download URL: knowledgevault-0.3.0-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 85.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for knowledgevault-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6b7a817f2b8460c7d051de776eb97f244999c0eb381feffafd68b7b8a68a12f7`
MD5	`fd93eb7bb3c160aef57a0f75d49321e2`
BLAKE2b-256	`ae0bed99b7eafa7c67510fd3b1fbf12c41bf1db6da7af765e2048e168e3c3658`

Provenance

The following attestation bundles were made for knowledgevault-0.3.0-py3-none-any.whl:

Publisher: publish.yml on cimo-labs/kvault

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: knowledgevault-0.3.0-py3-none-any.whl
- Subject digest: 6b7a817f2b8460c7d051de776eb97f244999c0eb381feffafd68b7b8a68a12f7
- Sigstore transparency entry: 925521198
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: cimo-labs/kvault@649de75b9dd721d4643701c8381385fdc225f430
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cimo-labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@649de75b9dd721d4643701c8381385fdc225f430
- Trigger Event: workflow_dispatch
