Config-driven knowledge graph framework for extracting structured knowledge from unstructured data
kvault
Agent-first knowledge graph framework. Build knowledge graphs from unstructured data using intelligent agents.
Philosophy
The agent IS the pipeline. Claude (or another LLM) does extraction, research, decisions, and propagation. kvault provides tools, not workflows.
┌─────────────────────────────────────────────────────────────┐
│ EntityIndex        MatchStrategies     ObservabilityLogger  │
│ (fast lookup)      (fuzzy, alias)      (debug & improve)    │
│                                                             │
│ SimpleStorage (YAML frontmatter in _summary.md preferred)   │
└─────────────────────────────────────────────────────────────┘
Agent (Claude) does:
- Read input
- Research (using EntityIndex + MatchStrategies)
- Decide (using its reasoning)
- Write (using SimpleStorage)
- Propagate (update parent summaries)
- Log (using ObservabilityLogger)
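The six steps above can be pictured as one small loop. The sketch below is a toy, in-memory illustration only: `ToyKB` and `process` are hypothetical names that are not part of kvault, and the real decision step is made by the agent rather than by an exact-name lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKB:
    """In-memory stand-in for index + storage + log (hypothetical, not kvault)."""
    entities: dict = field(default_factory=dict)  # path -> summary text
    aliases: dict = field(default_factory=dict)   # lowercased name -> path
    log: list = field(default_factory=list)       # (phase, detail) tuples


def process(kb: ToyKB, name: str, category: str, note: str) -> str:
    """Run one input through read, research, decide, write, propagate, log."""
    kb.log.append(("input", name))                       # 1. read input (6. log)
    path = kb.aliases.get(name.lower())                  # 2. research
    action = "update" if path else "create"              # 3. decide
    kb.log.append(("decide", action))
    if path is None:
        path = f"{category}/{name.lower().replace(' ', '_')}"
        kb.aliases[name.lower()] = path
    kb.entities[path] = kb.entities.get(path, "") + note + "\n"  # 4. write
    parts = path.split("/")[:-1]                         # 5. propagate
    ancestors = ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    kb.log.append(("propagate", ancestors))
    return action
```

Calling `process` twice with the same name creates the entity on the first call and updates it on the second.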
Getting Started with Claude Code
The fastest way to get a personal knowledge base running with Claude Code:
# 1. Install kvault with MCP support
pip install "kvault[mcp]"
# 2. Initialize a new knowledge base
kvault init my_kb --name "Your Name"
# 3. Verify it's clean
kvault check --kb-root my_kb
Then add the MCP server to .claude/settings.json:
{
  "mcpServers": {
    "kvault": {
      "command": "kvault-mcp",
      "env": {}
    }
  }
}
And add the integrity hook (catches stale summaries before each prompt):
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "type": "command",
        "command": "kvault check --kb-root /absolute/path/to/my_kb"
      }
    ]
  }
}
Customize the generated CLAUDE.md with your personal details, then start adding entities.
Installation
pip install kvault
Or install from source:
git clone https://github.com/cimo-labs/kvault
cd kvault
pip install -e .
Quick Start
from pathlib import Path
from kvault import (
    EntityIndex,
    SimpleStorage,
    ObservabilityLogger,
    EntityResearcher,
)
# Initialize
kg_root = Path("my_knowledge_base")
index = EntityIndex(kg_root / ".kvault" / "index.db")
storage = SimpleStorage(kg_root)
logger = ObservabilityLogger(kg_root / ".kvault" / "logs.db")
researcher = EntityResearcher(index)
# 1. Research - find existing entities
matches = researcher.research("Alice Smith", email="alice@anthropic.com")
action, target, confidence = researcher.suggest_action("Alice Smith")
logger.log_research("Alice Smith", "alice smith",
                    [m.__dict__ for m in matches], action)
# 2. Decide - agent determines what to do
# (for an "update" suggestion, target holds the path of the existing entity)
if action == "create":
    entity_path = "people/collaborators/alice_smith"
    logger.log_decide("Alice Smith", "create",
                      "No existing match found", confidence)
    # 3. Write - create the entity
    storage.create_entity(entity_path, {
        "created": "2026-01-05",
        "updated": "2026-01-05",
        "source": "email:123",
        "aliases": ["Alice", "alice@anthropic.com"]
    }, summary="# Alice Smith\n\nResearch scientist at Anthropic.")
    logger.log_write(entity_path, "create", "Created new entity")
    # 4. Update index
    index.add(entity_path, "Alice Smith",
              ["Alice", "alice@anthropic.com"], "people")
    # 5. Propagate - update parent summaries
    ancestors = storage.get_ancestors(entity_path)
    logger.log_propagate(entity_path, ancestors)
Core Components
EntityIndex
SQLite-backed entity index with full-text search for fast lookups.
from pathlib import Path
from kvault import EntityIndex
index = EntityIndex(Path("index.db"))
# Add entity
index.add("people/alice", "Alice Smith",
          aliases=["Alice", "alice@example.com"],
          category="people")
# Search
results = index.search("Alice")
# Find by alias
entry = index.find_by_alias("alice@example.com")
# Find by email domain
entries = index.find_by_email_domain("example.com")
# Rebuild from filesystem
count = index.rebuild(Path("knowledge_graph"))
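To make the alias and email-domain lookups concrete, here is a minimal sketch of how such an index could be laid out with the standard-library `sqlite3` module. The table names and the helpers `open_index`, `add`, `find_by_alias`, and `find_by_email_domain` are assumptions for illustration, not kvault's actual schema, and the sketch uses plain tables rather than full-text search.

```python
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the two illustrative tables if they do not exist yet."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entities (
            path TEXT PRIMARY KEY, name TEXT NOT NULL, category TEXT);
        CREATE TABLE IF NOT EXISTS aliases (
            alias TEXT NOT NULL, path TEXT NOT NULL,
            PRIMARY KEY (alias, path));
    """)
    return conn


def add(conn, path, name, aliases, category):
    """Register an entity; the name itself is stored as an alias too."""
    conn.execute("INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
                 (path, name, category))
    conn.executemany("INSERT OR IGNORE INTO aliases VALUES (?, ?)",
                     [(a.lower(), path) for a in [name, *aliases]])
    conn.commit()


def find_by_alias(conn, alias):
    """Exact, case-insensitive alias lookup; returns a path or None."""
    row = conn.execute("SELECT path FROM aliases WHERE alias = ?",
                       (alias.lower(),)).fetchone()
    return row[0] if row else None


def find_by_email_domain(conn, domain):
    """Return paths of entities that have an alias ending in @domain."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM aliases WHERE alias LIKE ? ORDER BY path",
        ("%@" + domain.lower(),)).fetchall()
    return [r[0] for r in rows]
```

Storing aliases lowercased keeps lookups simple, at the cost of losing the original casing.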
SimpleStorage
Filesystem storage with minimal 4-field schema.
from pathlib import Path
from kvault import SimpleStorage
storage = SimpleStorage(Path("knowledge_graph"))
# Create entity
storage.create_entity("people/alice", {
    "created": "2026-01-05",
    "updated": "2026-01-05",
    "source": "manual",
    "aliases": ["Alice"]
}, summary="# Alice\n\nDescription here.")
# Update entity
storage.update_entity("people/alice",
                      meta={"source": "email:123"},
                      summary="# Alice\n\nUpdated description.")
# Read
meta = storage.read_meta("people/alice")
summary = storage.read_summary("people/alice")
# Navigate hierarchy
ancestors = storage.get_ancestors("people/collaborators/alice")
# Returns: ["people/collaborators", "people"]
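The ancestor list drives propagation: each returned path is a parent summary that may need updating, nearest parent first. As a rough sketch of the idea (a standalone function written for illustration, not kvault's implementation), the same ordering can be derived with `pathlib`:

```python
from pathlib import PurePosixPath


def get_ancestors(entity_path: str) -> list[str]:
    """List parent paths from nearest to root, excluding the KB root itself."""
    parents = PurePosixPath(entity_path).parents
    return [str(p) for p in parents if str(p) != "."]
```

A top-level entity such as `people` has no ancestors under this definition, so nothing would be propagated for it.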
ObservabilityLogger
Phase-based logging for debugging and system improvement.
from pathlib import Path
from kvault import ObservabilityLogger
logger = ObservabilityLogger(Path("logs.db"))
# Candidate matches normally come from EntityResearcher.research()
matches = []
# Log phases
logger.log_input([{"name": "Alice"}], source="email")
logger.log_research("Alice", "alice", matches, "create")
logger.log_decide("Alice", "create", "No match found", confidence=0.95)
logger.log_write("people/alice", "create", "Created entity")
logger.log_propagate("people/alice", ["people"])
logger.log_error("validation_failed", entity="Alice",
                 details={"field": "email"})
# Query logs
errors = logger.get_errors()
decisions = logger.get_decisions(action="create")
low_conf = logger.get_low_confidence(threshold=0.7)
summary = logger.get_session_summary()
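Phase-based logging makes it possible to ask questions such as "which decisions were made with low confidence?" after a session. The sketch below shows one way to store and query such events with `sqlite3`; the `events` table and the helpers `open_log`, `log_event`, and `low_confidence` are hypothetical and do not describe kvault's real log schema.

```python
import json
import sqlite3
import time


def open_log(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create a single illustrative events table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "ts REAL, phase TEXT, entity TEXT, confidence REAL, details TEXT)")
    return conn


def log_event(conn, phase, entity, confidence=None, **details):
    """Append one event; extra keyword arguments are stored as JSON."""
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)",
                 (time.time(), phase, entity, confidence, json.dumps(details)))
    conn.commit()


def low_confidence(conn, threshold=0.7):
    """Return (entity, confidence) for decisions below the threshold."""
    return conn.execute(
        "SELECT entity, confidence FROM events "
        "WHERE phase = 'decide' AND confidence < ? ORDER BY confidence",
        (threshold,)).fetchall()
```

Reviewing the low-confidence decisions first is a cheap way to find entities that may have been merged or created incorrectly.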
EntityResearcher
Research existing entities before creating new ones.
from pathlib import Path
from kvault import EntityResearcher, EntityIndex
index = EntityIndex(Path("index.db"))
researcher = EntityResearcher(index)
# Find matches
matches = researcher.research("Alice Smith", email="alice@example.com")
# Get suggestion
action, path, confidence = researcher.suggest_action("Alice Smith")
# Returns: ("create", None, 0.95) or ("update", "people/alice", 0.90)
# Quick checks
exists = researcher.exists("Alice Smith", threshold=0.9)
best = researcher.best_match("Alice Smith")
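One plausible way to turn match scores into a suggestion of the shape shown above is a single threshold on the best score. This is only a sketch: the function below is a standalone illustration, and both the 0.85 cut-off and the choice of `1 - best score` as the confidence of a "create" suggestion are assumptions, not kvault's documented behavior.

```python
from typing import Optional


def suggest_action(
    scored: list[tuple[str, float]],
    update_threshold: float = 0.85,
) -> tuple[str, Optional[str], float]:
    """Map (path, score) candidates to an (action, path, confidence) triple."""
    if not scored:
        # Nothing resembles the input, so creating is the only option.
        return ("create", None, 1.0)
    path, score = max(scored, key=lambda item: item[1])
    if score >= update_threshold:
        return ("update", path, score)
    # Assumed convention: the weaker the best match, the safer a create.
    return ("create", None, 1.0 - score)
```

An agent would typically treat the result as advice and still apply its own reasoning before writing.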
Matching Strategies
Pluggable strategies for entity deduplication.
from kvault import (
    AliasMatchStrategy,
    FuzzyNameMatchStrategy,
    EmailDomainMatchStrategy,
)
# Alias matching - exact match (score: 1.0)
alias_strategy = AliasMatchStrategy()
# Fuzzy name matching (score: 0.85-0.99)
fuzzy_strategy = FuzzyNameMatchStrategy(threshold=0.85)
# Email domain matching (score: 0.85-0.95)
domain_strategy = EmailDomainMatchStrategy()
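To illustrate the difference between exact alias matching and thresholded fuzzy matching, here is a small sketch built on the standard-library `difflib`. The functions `alias_score`, `fuzzy_score`, and `best_score` are hypothetical, and `SequenceMatcher` is used as a stand-in similarity measure; kvault's strategies may score names differently.

```python
from difflib import SequenceMatcher


def alias_score(query: str, aliases: list[str]) -> float:
    """Exact match after trimming and lowercasing scores 1.0, otherwise 0.0."""
    q = query.strip().lower()
    return 1.0 if any(q == a.strip().lower() for a in aliases) else 0.0


def fuzzy_score(query: str, name: str, threshold: float = 0.85) -> float:
    """Similarity ratio in [0, 1]; values below the threshold count as no match."""
    ratio = SequenceMatcher(None, query.lower(), name.lower()).ratio()
    return ratio if ratio >= threshold else 0.0


def best_score(query: str, name: str, aliases: list[str]) -> float:
    """Combine strategies by taking the strongest signal."""
    return max(alias_score(query, aliases), fuzzy_score(query, name))
```

Taking the maximum lets an exact alias hit win outright while near-miss spellings such as "Alice Smyth" still surface as candidates.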
Storage Format
YAML Frontmatter (Preferred)
Entities are stored as a single _summary.md file with YAML frontmatter:
---
created: 2026-01-05
updated: 2026-01-05
source: email:123
aliases: [Alice, alice@anthropic.com, "+14155551234"]
phone: "+14155551234"
email: alice@anthropic.com
relationship_type: colleague
context: Met at NeurIPS 2024
---
# Alice Smith
Research scientist at Anthropic working on causal discovery.
## Background
Collaborator on interpretability project.
## Interactions
- 2026-01-05: Initial contact logged
## Notes
- Interested in causal representation learning
Required fields: created, updated, source, aliases
Optional fields: phone, email, relationship_type, context, related_to, last_interaction, status
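A quick way to check a `_summary.md` file against the required fields is to split off the frontmatter and look at its top-level keys. The sketch below uses only the standard library and deliberately does not parse YAML values; `split_frontmatter`, `top_level_keys`, and `missing_required` are hypothetical helpers, and a real YAML parser is the better choice for anything beyond a key check.

```python
REQUIRED_FIELDS = ("created", "updated", "source", "aliases")


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body); frontmatter is '' when absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


def top_level_keys(frontmatter: str) -> set[str]:
    """Collect keys from unindented 'key: value' lines (no YAML parsing)."""
    keys = set()
    for line in frontmatter.splitlines():
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_required(text: str) -> list[str]:
    """List required fields that the frontmatter does not define."""
    frontmatter, _ = split_frontmatter(text)
    present = top_level_keys(frontmatter)
    return [name for name in REQUIRED_FIELDS if name not in present]
```

An empty result from `missing_required` means all four required fields are present.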
Legacy Format (_meta.json)
Separate _meta.json files are still supported for backward compatibility:
{
  "created": "2026-01-05",
  "last_updated": "2026-01-05",
  "sources": ["email:123"],
  "aliases": ["Alice", "alice@anthropic.com"]
}
Note: New entities should use YAML frontmatter. The index rebuilder supports both formats.
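The two formats use different field names (`last_updated` and `sources` in the legacy file, `updated` and `source` in the frontmatter). A migration could map them roughly as sketched below. The function name is hypothetical, and keeping only the first entry of `sources` is an assumption made for illustration; kvault does not document how multiple sources should be folded into one.

```python
import json


def legacy_to_frontmatter_fields(meta_json: str) -> dict:
    """Map legacy _meta.json keys onto the four required frontmatter fields."""
    legacy = json.loads(meta_json)
    sources = legacy.get("sources") or []
    return {
        "created": legacy.get("created"),
        "updated": legacy.get("last_updated"),
        # Assumption: keep the first recorded source as the single source.
        "source": sources[0] if sources else None,
        "aliases": list(legacy.get("aliases", [])),
    }
```

Fields that are missing from the legacy file come back as `None` (or an empty alias list) so they can be filled in by hand.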
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black kvault/
# Type check
mypy kvault/
MCP Server (Claude Code Integration)
The kvault MCP server provides direct tool access for Claude Code, enabling the 6-step workflow without subprocess parsing.
Installation
pip install "kvault[mcp]" # Install with MCP support
Configuration
Add to .claude/settings.json:
{
  "mcpServers": {
    "kvault": {
      "command": "kvault-mcp",
      "env": {}
    }
  }
}
Available Tools
| Category | Tools |
|---|---|
| Init | kvault_init, kvault_status |
| Index | kvault_search, kvault_find_by_alias, kvault_find_by_email_domain, kvault_rebuild_index |
| Entity | kvault_read_entity, kvault_write_entity, kvault_list_entities, kvault_delete_entity, kvault_move_entity |
| Summary | kvault_read_summary, kvault_write_summary, kvault_get_parent_summaries |
| Research | kvault_research |
| Workflow | kvault_log_phase, kvault_write_journal, kvault_validate_transition |
Example Workflow
1. kvault_init(kg_root="/path/to/kb")
2. kvault_research(name="John Doe", phone="+14155551234")
3. kvault_write_entity(path="people/contacts/john_doe", meta={...}, content="...", create=true)
4. kvault_get_parent_summaries(path="people/contacts/john_doe")
5. kvault_write_summary(path="people/contacts", content="...")
6. kvault_write_journal(actions=[...], source="manual")
7. kvault_rebuild_index()
Benefits
- Structured JSON responses - No regex parsing of CLI output
- Direct control - Each tool call is explicit and debuggable
- Session state - Track workflow progress across calls
- No timeouts - Individual tools complete quickly
CLI Usage
pip install -e ".[dev]"
# Initialize a new KB
kvault init my_kb --name "Alice"
# Check KB integrity (propagation, journal, index, frontmatter, branching)
kvault check --kb-root my_kb
kvault check # Auto-detects KB root from cwd
# Process a corpus
kvault process --corpus /path/to/corpus --kg-root /path/to/kg --dry-run
kvault process --corpus /path/to/corpus --kg-root /path/to/kg --apply
# Rebuild and search the index
kvault index rebuild --kg-root /path/to/kg
kvault index search --db /path/to/kg/.kvault/index.db --query "Acme"
# Session summary (observability)
kvault log summary --db /path/to/kg/.kvault/logs.db
License
MIT