Shared types, graph store, semantic index, and pipeline base for the KGModule SDK

kgmodule-utils

kgmodule-utils — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.

Author: Eric G. Suchanek, PhD

Flux-Frontiers, Liberty TWP, OH


Overview

kgmodule-utils is the shared SDK layer for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.

Every KGModule implementation — PyCodeKG, DocKG, and others — subclasses KGModule from here and implements exactly three methods: make_extractor(), kind(), and analyze().


Features

  • kg_utils.specs: NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack dataclasses
  • kg_utils.extractor: KGExtractor ABC with extract(), node_kinds(), edge_kinds(), coverage_metric()
  • kg_utils.store: GraphStore, a SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
  • kg_utils.semantic: SemanticIndex (LanceDB), SentenceTransformerEmbedder, SeedHit, model registry, resolve_model_path()
  • kg_utils.pipeline: KGModule, the full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
  • kg_utils.embedder: get_embedder(), wrap_embedder(), load_sentence_transformer() factory functions
  • kg_utils.embed: Embedder protocol, DEFAULT_MODEL, KNOWN_MODELS, resolve_model_path()
  • kg_utils.snapshots: Snapshot, SnapshotManager, SnapshotManifest for temporal metric tracking
  • kg_utils.synthesis: unified text + image synthesis with oMLX, Ollama, and OpenAI text backends; mflux-local, mflux-serve, and DALL-E image backends; all configurable through environment variables

Installation

Requirements: Python ≥ 3.12, < 3.14

Core only (stdlib, no optional deps)

pip install kgmodule-utils

With semantic search (LanceDB + sentence-transformers)

pip install 'kgmodule-utils[semantic]'

With text + image synthesis (oMLX / Ollama / OpenAI / mflux-serve)

pip install 'kgmodule-utils[synthesis]'

With local mflux image generation (Apple Silicon, includes synthesis)

pip install 'kgmodule-utils[synthesis-mflux]'

In a Poetry project

[tool.poetry.dependencies]
kgmodule-utils = { version = ">=0.4.0", extras = ["semantic", "synthesis"] }
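
Because the extras are optional, calling code may want to check whether their packages are present before importing them. The following is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper, not part of kgmodule-utils, and `lancedb` and `sentence_transformers` are assumed to be the import names of the packages behind the [semantic] extra.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages behind the [semantic] extra.
SEMANTIC_MODULES = ["lancedb", "sentence_transformers"]

if __name__ == "__main__":
    missing = missing_modules(SEMANTIC_MODULES)
    if missing:
        print("Semantic search unavailable; missing:", ", ".join(missing))
    else:
        print("Semantic search dependencies found.")
```

The check only looks up top-level names, so it does not import the (potentially heavy) packages themselves.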

Quick Start

Build a domain KG module

from collections.abc import Iterator
from pathlib import Path

from kg_utils.extractor import KGExtractor
from kg_utils.pipeline import KGModule
from kg_utils.specs import EdgeSpec, NodeSpec


class MyExtractor(KGExtractor):
    def node_kinds(self) -> list[str]:
        return ["document", "section"]

    def edge_kinds(self) -> list[str]:
        return ["CONTAINS"]

    def meaningful_node_kinds(self) -> list[str]:
        return ["section"]

    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
        for doc in self.repo_path.glob("**/*.md"):
            doc_id = f"document:{doc}"
            yield NodeSpec(node_id=doc_id, kind="document",
                           name=doc.stem, qualname=doc.stem,
                           source_path=str(doc))
            # … yield sections and CONTAINS edges


class MyKG(KGModule):
    _default_dir = ".mykg"

    def make_extractor(self) -> KGExtractor:
        return MyExtractor(self.repo_root)

    def kind(self) -> str:
        return "my"

    def analyze(self) -> str:
        s = self.stats()
        return f"# MyKG\nnodes={s['total_nodes']}"


# Build and query
kg = MyKG("/path/to/repo")
kg.build(wipe=True)

result = kg.query("authentication flow", k=8, hop=1)
pack   = kg.pack("error handling", max_nodes=10)
print(pack.to_markdown())

Track metrics over time

from kg_utils.snapshots import SnapshotManager

mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")

snapshot = mgr.capture(
    version="1.0.0",
    branch="main",
    graph_stats_dict=kg.stats(),
)
mgr.save_snapshot(snapshot)

snaps = mgr.list_snapshots(limit=5)
delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
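
To make the idea of a snapshot delta concrete, here is a rough sketch of a per-metric difference between two stats dictionaries. It is not how `diff_snapshots()` is implemented; `diff_stats` is a hypothetical helper, and apart from `total_nodes` the metric names are made up for the example.

```python
def diff_stats(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Return new - old for every key in either dict; a missing key counts as 0."""
    keys = sorted(set(old) | set(new))
    return {key: new.get(key, 0) - old.get(key, 0) for key in keys}


# Illustrative values only; "total_edges" and "meaningful_nodes" are assumed keys.
older = {"total_nodes": 120, "total_edges": 300}
newer = {"total_nodes": 150, "total_edges": 290, "meaningful_nodes": 40}

print(diff_stats(older, newer))
# {'meaningful_nodes': 40, 'total_edges': -10, 'total_nodes': 30}
```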

API Reference

kg_utils.specs

| Class | Description |
| --- | --- |
| NodeSpec | Graph node: node_id, kind, name, qualname, source_path, lineno, end_lineno, docstring, metadata |
| EdgeSpec | Graph edge: source_id, target_id, relation, weight, metadata |
| BuildStats | Build result: node/edge counts, indexed rows, embedding dim |
| QueryResult | Query result: nodes, edges, seeds, hop, relevance metadata |
| SnippetPack | Pack result: nodes with snippets, to_markdown(), to_json(), save() |

kg_utils.extractor

| Class | Description |
| --- | --- |
| KGExtractor | ABC; implement node_kinds(), edge_kinds(), extract() |

kg_utils.store

| Class | Description |
| --- | --- |
| GraphStore | SQLite persistence: write(), expand(), query_nodes(), resolve_symbols(), callers_of(), stats() |
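
BFS expansion over a SQLite edge table can be pictured with the sketch below. It is an illustration of the technique, not GraphStore's actual code: the single `edges` table, its column layout, and this standalone `expand` function are assumptions, although the column names follow the EdgeSpec fields.

```python
import sqlite3


def expand(conn: sqlite3.Connection, seeds: list[str], hop: int) -> set[str]:
    """Return all node ids reachable from seeds within `hop` edges, in either direction."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hop):
        if not frontier:
            break
        ids = list(frontier)
        marks = ",".join("?" * len(ids))  # one placeholder per frontier node
        rows = conn.execute(
            f"SELECT source_id, target_id FROM edges "
            f"WHERE source_id IN ({marks}) OR target_id IN ({marks})",
            ids + ids,
        ).fetchall()
        neighbours = {node for row in rows for node in row}
        frontier = neighbours - seen  # only nodes not visited yet
        seen |= frontier
    return seen


# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source_id TEXT, target_id TEXT, relation TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("a", "b", "CONTAINS"), ("b", "c", "CONTAINS"), ("c", "d", "CONTAINS")],
)
print(sorted(expand(conn, ["a"], hop=2)))  # ['a', 'b', 'c']
```

For very large frontiers the placeholder list would need batching, since SQLite caps the number of bound variables per statement.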

kg_utils.semantic

| Class / function | Description |
| --- | --- |
| SemanticIndex | LanceDB vector index: build(), search() |
| SentenceTransformerEmbedder | Local embedding via sentence-transformers |
| resolve_model_path() | Resolve model name / alias to local cache path |
| suppress_ingestion_logging() | Silence verbose HF / tqdm output during ingestion |
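
Conceptually, a semantic index embeds texts into vectors and returns the nearest ones to an embedded query. The sketch below shows that idea with a brute-force cosine search in pure Python. The `Embedder` protocol here, including its `embed()` method name and signature, is an assumption for illustration and may not match the protocol in kg_utils.embed; `CharCountEmbedder` is a toy stand-in for a real model.

```python
import math
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical protocol: any object with a matching embed() method qualifies."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class CharCountEmbedder:
    """Toy embedder: counts of the letters a, b, c. For illustration only."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(t.lower().count(ch)) for ch in "abc"] for t in texts]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def search(embedder: Embedder, docs: list[str], query: str, k: int) -> list[tuple[str, float]]:
    """Brute-force top-k by cosine similarity; a vector index does this at scale."""
    doc_vecs = embedder.embed(docs)
    query_vec = embedder.embed([query])[0]
    scored = [(doc, cosine(vec, query_vec)) for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


hits = search(CharCountEmbedder(), ["aaa", "abc", "ccc"], "aab", k=2)
print([doc for doc, _ in hits])  # ['aaa', 'abc']
```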

kg_utils.pipeline

| Class | Description |
| --- | --- |
| KGModule | Concrete base class: implement make_extractor(), kind(), analyze(); get build(), query(), pack(), stats() for free |
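
The pipeline is described as using hybrid semantic + lexical reranking. One common way to combine the two signals is a weighted sum, sketched below. The token-overlap scoring, the `alpha` weight, and both function names are assumptions chosen for the example; the scoring KGModule actually uses may differ.

```python
def lexical_score(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also occur in the text."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens)


def hybrid_rerank(
    query: str, candidates: list[tuple[str, float]], alpha: float = 0.7
) -> list[tuple[str, float]]:
    """Sort (text, semantic_score) pairs by alpha*semantic + (1 - alpha)*lexical."""
    scored = [
        (text, alpha * semantic + (1 - alpha) * lexical_score(query, text))
        for text, semantic in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Semantic scores are made-up inputs, as if returned by a vector search.
candidates = [
    ("login handler for users", 0.60),
    ("authentication flow entry point", 0.55),
]
ranked = hybrid_rerank("authentication flow", candidates, alpha=0.5)
print(ranked[0][0])  # authentication flow entry point
```

In this example the second candidate has the lower semantic score but contains both query tokens, so the lexical term lifts it to the top.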

kg_utils.snapshots

| Class | Description |
| --- | --- |
| Snapshot | Temporal snapshot keyed by git tree hash with metrics and deltas |
| SnapshotManager | Capture, persist, load, list, diff, and prune snapshots |
| SnapshotManifest | Fast-lookup index with format versioning |

kg_utils.synthesis

Full reference: docs/synthesis.md

| Class / function | Description |
| --- | --- |
| TextBackend | Enum: omlx, ollama, openai |
| ImageBackend | Enum: mflux-local, mflux-serve, openai |
| TextConfig | Backend config dataclass with resolved_endpoint() / resolved_model() |
| ImageConfig | Backend config dataclass with resolved_server_url() / resolved_model() |
| TextSynthesizer | list_models(), synthesize_rag(), rewrite_for_image() |
| ImageSynthesizer | generate() → PIL Image, generate_b64() → base64 PNG |
| text_config_from_env() | Build TextConfig from SYNTH_* env vars |
| image_config_from_env() | Build ImageConfig from IMAGE_* env vars |
| text_synthesizer_from_env() | Convenience: config + synthesizer in one call |
| image_synthesizer_from_env() | Convenience: config + synthesizer in one call |

Project Structure

KG_utils/
├── pyproject.toml
├── docs/
│   └── synthesis.md          # Synthesis sub-package reference
├── src/
│   └── kg_utils/
│       ├── __init__.py
│       ├── specs.py          # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
│       ├── extractor.py      # KGExtractor ABC
│       ├── store.py          # GraphStore (SQLite)
│       ├── semantic.py       # SemanticIndex, SentenceTransformerEmbedder, SeedHit
│       ├── pipeline.py       # KGModule concrete base class
│       ├── module.py         # Re-export shim
│       ├── embed.py          # Embedder protocol, model registry
│       ├── embedder.py       # SentenceTransformerEmbedder factory functions
│       ├── snapshots/
│       │   ├── __init__.py
│       │   ├── models.py     # Snapshot, SnapshotManifest, PruneResult
│       │   └── manager.py    # SnapshotManager
│       └── synthesis/
│           ├── __init__.py   # Public API + factory functions
│           ├── _config.py    # TextBackend, ImageBackend, TextConfig, ImageConfig, env factories
│           ├── _text.py      # TextSynthesizer
│           └── _image.py     # ImageSynthesizer
└── tests/
    ├── test_store.py               # GraphStore unit tests
    ├── test_pipeline_utils.py      # Pipeline utility function tests
    ├── test_pipeline_module.py     # End-to-end integration tests (--integration)
    ├── test_types.py               # Spec dataclass and KGExtractor tests
    ├── test_snapshots.py           # Snapshot lifecycle tests
    ├── test_integration.py         # Cross-module integration tests
    ├── test_synthesis_config.py    # Config defaults and env-var priority chains (44 tests)
    ├── test_synthesis_text.py      # TextSynthesizer with mocked openai client (38 tests)
    └── test_synthesis_image.py     # ImageSynthesizer with mocked backends (34 tests)

Development

git clone https://github.com/Flux-Frontiers/KG_utils.git
cd KG_utils
poetry install --with dev

Run the fast test suite (no model downloads):

poetry run pytest -m "not integration"

Run all tests including semantic/integration (requires [semantic] extra):

poetry run pytest

License

Elastic License 2.0 — see LICENSE.

Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
