
Shared types, graph store, semantic index, and pipeline base for the KGModule SDK

kgmodule-utils

kgmodule-utils — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.

Author: Eric G. Suchanek, PhD

Flux-Frontiers, Liberty TWP, OH


Overview

kgmodule-utils is the shared SDK layer for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.

Every KGModule implementation (PyCodeKG, DocKG, and others) subclasses KGModule from kg_utils.pipeline and implements exactly three methods: make_extractor(), kind(), and analyze().


Features

  • kg_utils.specs: NodeSpec, EdgeSpec, BuildStats, QueryResult, and SnippetPack dataclasses
  • kg_utils.extractor: KGExtractor ABC with extract(), node_kinds(), edge_kinds(), and coverage_metric()
  • kg_utils.store: GraphStore, a SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
  • kg_utils.semantic: SemanticIndex (LanceDB), SentenceTransformerEmbedder, SeedHit, model registry, resolve_model_path()
  • kg_utils.pipeline: KGModule, the full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
  • kg_utils.embedder: get_embedder(), wrap_embedder(), and load_sentence_transformer() factory functions
  • kg_utils.embed: Embedder protocol, DEFAULT_MODEL, KNOWN_MODELS, resolve_model_path()
  • kg_utils.snapshots: Snapshot, SnapshotManager, and SnapshotManifest for temporal metric tracking

Installation

Requirements: Python ≥ 3.12, < 3.14

Core only (stdlib, no optional deps)

pip install kgmodule-utils

With semantic search (LanceDB + sentence-transformers)

pip install 'kgmodule-utils[semantic]'

In a Poetry project

[tool.poetry.dependencies]
kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }

Quick Start

Build a domain KG module

from collections.abc import Iterator
from pathlib import Path

from kg_utils.extractor import KGExtractor
from kg_utils.pipeline import KGModule
from kg_utils.specs import EdgeSpec, NodeSpec


class MyExtractor(KGExtractor):
    def node_kinds(self) -> list[str]:
        return ["document", "section"]

    def edge_kinds(self) -> list[str]:
        return ["CONTAINS"]

    def meaningful_node_kinds(self) -> list[str]:
        return ["section"]

    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
        for doc in self.repo_path.glob("**/*.md"):
            doc_id = f"document:{doc}"
            yield NodeSpec(node_id=doc_id, kind="document",
                           name=doc.stem, qualname=doc.stem,
                           source_path=str(doc))
            # … yield sections and CONTAINS edges


class MyKG(KGModule):
    _default_dir = ".mykg"

    def make_extractor(self) -> KGExtractor:
        return MyExtractor(self.repo_root)

    def kind(self) -> str:
        return "my"

    def analyze(self) -> str:
        s = self.stats()
        return f"# MyKG\nnodes={s['total_nodes']}"


# Build and query
kg = MyKG("/path/to/repo")
kg.build(wipe=True)

result = kg.query("authentication flow", k=8, hop=1)
pack   = kg.pack("error handling", max_nodes=10)
print(pack.to_markdown())

Track metrics over time

from kg_utils.snapshots import SnapshotManager

mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")

snapshot = mgr.capture(
    version="1.0.0",
    branch="main",
    graph_stats_dict=kg.stats(),
)
mgr.save_snapshot(snapshot)

snaps = mgr.list_snapshots(limit=5)
delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])

API Reference

kg_utils.specs

| Class | Description |
|---|---|
| NodeSpec | Graph node: node_id, kind, name, qualname, source_path, lineno, end_lineno, docstring, metadata |
| EdgeSpec | Graph edge: source_id, target_id, relation, weight, metadata |
| BuildStats | Build result: node/edge counts, indexed rows, embedding dim |
| QueryResult | Query result: nodes, edges, seeds, hop, relevance metadata |
| SnippetPack | Pack result: nodes with snippets, to_markdown(), to_json(), save() |
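
To make the node and edge shapes concrete, here is a minimal sketch of how a Markdown file might be turned into one document node, one section node per heading, and CONTAINS edges between them. The NodeSpec and EdgeSpec classes below are local stand-ins that reuse only field names listed in the table; their types, defaults, and field order are assumptions, as are the extract_sections() helper and the ID scheme. They are not the kg_utils.specs classes.

```python
from __future__ import annotations

import re
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class NodeSpec:  # stand-in for illustration, NOT kg_utils.specs.NodeSpec
    node_id: str
    kind: str
    name: str
    qualname: str
    source_path: str
    lineno: int | None = None
    metadata: dict = field(default_factory=dict)


@dataclass
class EdgeSpec:  # stand-in for illustration, NOT kg_utils.specs.EdgeSpec
    source_id: str
    target_id: str
    relation: str
    weight: float = 1.0


# ATX-style Markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def extract_sections(path: str, text: str) -> Iterator[NodeSpec | EdgeSpec]:
    """Yield a document node, then a section node and CONTAINS edge per heading."""
    doc_id = f"document:{path}"
    stem = path.rsplit("/", 1)[-1].removesuffix(".md")
    yield NodeSpec(doc_id, "document", stem, stem, path)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match is None:
            continue
        title = match.group(2)
        sec_id = f"section:{path}:{lineno}"  # hypothetical ID scheme
        yield NodeSpec(sec_id, "section", title, f"{stem}.{title}", path,
                       lineno=lineno, metadata={"level": len(match.group(1))})
        yield EdgeSpec(doc_id, sec_id, "CONTAINS")
```

The sketch treats every line that starts with '#' as a heading, so a '#' comment inside a fenced code block would be picked up as well; a real extractor would need to track fences.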

kg_utils.extractor

| Class | Description |
|---|---|
| KGExtractor | ABC: implement node_kinds(), edge_kinds(), extract() |

kg_utils.store

| Class | Description |
|---|---|
| GraphStore | SQLite persistence: write(), expand(), query_nodes(), resolve_symbols(), callers_of(), stats() |
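
As a rough illustration of what BFS expansion over a SQLite-backed edge table can look like, the sketch below uses only the standard-library sqlite3 module. The table layout, the make_store() helper, and this expand() signature are assumptions made for the example; they do not describe GraphStore's actual schema or API. Edges are followed in both directions, which is also an assumption.

```python
import sqlite3
from collections import deque


def make_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical nodes/edges schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT);
        CREATE TABLE edges (src TEXT, dst TEXT, rel TEXT);
        CREATE INDEX idx_edges_src ON edges (src);
        CREATE INDEX idx_edges_dst ON edges (dst);
        """
    )
    return con


def expand(con: sqlite3.Connection, seeds: list[str], hop: int = 1) -> dict[str, int]:
    """Return {node_id: distance} for every node within `hop` edges of a seed."""
    dist = {seed: 0 for seed in seeds}
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        if dist[node] >= hop:
            continue  # already at the hop limit; do not expand further
        rows = con.execute(
            "SELECT dst FROM edges WHERE src = ? "
            "UNION SELECT src FROM edges WHERE dst = ?",
            (node, node),
        )
        for (neighbour,) in rows:
            if neighbour not in dist:  # first visit is the shortest path in BFS
                dist[neighbour] = dist[node] + 1
                frontier.append(neighbour)
    return dist
```

Issuing one query per visited node keeps the sketch short; for large graphs, expanding a whole frontier per query or using a recursive CTE would reduce round trips.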

kg_utils.semantic

| Class / function | Description |
|---|---|
| SemanticIndex | LanceDB vector index: build(), search() |
| SentenceTransformerEmbedder | Local embedding via sentence-transformers |
| resolve_model_path() | Resolve model name / alias to local cache path |
| suppress_ingestion_logging() | Silence verbose HF / tqdm output during ingestion |
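
The idea behind pairing an embedder with a vector index can be sketched without any optional dependencies. The example below defines a hypothetical Embedder protocol, a toy letter-count embedder, and a brute-force cosine search. None of these names or signatures come from kg_utils; the real package uses sentence-transformers for embeddings and LanceDB for the index, and its Embedder protocol may look quite different.

```python
import math
from typing import Protocol


class Embedder(Protocol):  # hypothetical shape, not the kg_utils.embed protocol
    dim: int

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class LetterCountEmbedder:
    """Toy embedder: a unit-length vector of a-z letter counts."""

    dim = 26

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vec[ord(ch) - ord("a")] += 1.0
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
            vectors.append([x / norm for x in vec])
        return vectors


def search(embedder: Embedder, corpus: dict[str, str], query: str,
           k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k (id, cosine similarity) pairs by exhaustive comparison."""
    ids = list(corpus)
    doc_vecs = embedder.embed([corpus[i] for i in ids])
    (query_vec,) = embedder.embed([query])
    # Vectors are unit length, so the dot product equals the cosine similarity.
    scored = [(i, sum(a * b for a, b in zip(query_vec, vec)))
              for i, vec in zip(ids, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Because search() depends only on the protocol, a real embedding model could be swapped in for the toy one without touching the search code, which is the point of keeping the embedder behind an interface.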

kg_utils.pipeline

| Class | Description |
|---|---|
| KGModule | Concrete base: implement make_extractor(), kind(), and analyze(); build(), query(), pack(), and stats() are inherited |
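
The pipeline is described as using hybrid semantic + lexical reranking, but this README does not say how the two signals are combined. One common approach is a weighted sum of the vector-similarity score and a token-overlap score, sketched below. The lexical_score() and hybrid_rerank() functions, the alpha weight, and the shape of the hit tuples are all assumptions for illustration and may not match what KGModule does.

```python
def lexical_score(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens)


def hybrid_rerank(query: str, hits: list[tuple[str, str, float]],
                  alpha: float = 0.7) -> list[tuple[str, float]]:
    """Rerank (node_id, text, semantic_score) hits by a weighted sum.

    alpha weights the semantic score; (1 - alpha) weights the lexical overlap.
    """
    scored = [
        (node_id, alpha * semantic + (1 - alpha) * lexical_score(query, text))
        for node_id, text, semantic in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With alpha=1.0 the ranking is purely semantic; lowering alpha lets an exact keyword match overtake a hit that is only slightly closer in embedding space.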

kg_utils.snapshots

| Class | Description |
|---|---|
| Snapshot | Temporal snapshot keyed by git tree hash with metrics and deltas |
| SnapshotManager | Capture, persist, load, list, diff, and prune snapshots |
| SnapshotManifest | Fast-lookup index with format versioning |
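
Conceptually, diffing two snapshots comes down to comparing their metric dictionaries. The sketch below shows that idea with a hypothetical MetricSnapshot dataclass and diff_metrics() function; the field names, the example keys, and the rule of treating a missing metric as 0 are assumptions, and the real Snapshot model and SnapshotManager.diff_snapshots() output may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricSnapshot:  # hypothetical stand-in, not kg_utils.snapshots.Snapshot
    key: str       # e.g. a git tree hash
    version: str
    metrics: dict[str, int] = field(default_factory=dict)


def diff_metrics(old: MetricSnapshot, new: MetricSnapshot) -> dict[str, int]:
    """Return {metric: new - old} for every metric whose value changed.

    A metric missing from one snapshot is treated as 0 (an assumption).
    """
    deltas: dict[str, int] = {}
    for name in sorted(set(old.metrics) | set(new.metrics)):
        change = new.metrics.get(name, 0) - old.metrics.get(name, 0)
        if change:
            deltas[name] = change
    return deltas
```

Reporting only the changed metrics keeps the delta small enough to scan when most counts are stable between builds.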

Project Structure

KG_utils/
├── pyproject.toml
├── src/
│   └── kg_utils/
│       ├── __init__.py
│       ├── specs.py          # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
│       ├── extractor.py      # KGExtractor ABC
│       ├── store.py          # GraphStore (SQLite)
│       ├── semantic.py       # SemanticIndex, SentenceTransformerEmbedder, SeedHit
│       ├── pipeline.py       # KGModule concrete base class
│       ├── module.py         # Re-export shim
│       ├── embed.py          # Embedder protocol, model registry
│       ├── embedder.py       # SentenceTransformerEmbedder factory functions
│       └── snapshots/
│           ├── __init__.py
│           ├── models.py     # Snapshot, SnapshotManifest, PruneResult
│           └── manager.py    # SnapshotManager
└── tests/
    ├── test_store.py          # GraphStore unit tests
    ├── test_pipeline_utils.py # Pipeline utility function tests
    ├── test_pipeline_module.py # End-to-end integration tests (--integration)
    ├── test_types.py          # Spec dataclass and KGExtractor tests
    ├── test_snapshots.py      # Snapshot lifecycle tests
    └── test_integration.py    # Cross-module integration tests

Development

git clone https://github.com/Flux-Frontiers/KG_utils.git
cd KG_utils
poetry install --with dev

Run the fast test suite (no model downloads):

poetry run pytest -m "not integration"

Run all tests including semantic/integration (requires [semantic] extra):

poetry run pytest

License

Elastic License 2.0 — see LICENSE.

Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Internal commercial use is permitted.
